1. Introduction
Fingerprint recognition systems have become a cornerstone of contemporary biometric authentication due to their uniqueness, ease of use, and broad applicability in secure access control. Nevertheless, these systems are susceptible to presentation attacks (PAs) [1], where fake replicas such as silicone or gelatin fingerprints are used to deceive the sensor. To address this, fingerprint spoof detection, also known as presentation attack detection (PAD) [2], has emerged as a vital area of research aimed at distinguishing real (live) fingerprints from fake ones. With the growing sophistication of spoof materials (Figure 1) and the need for real-time, robust authentication, this work studies the development of fingerprint spoof detection techniques, from conventional machine learning to cutting-edge deep learning and unified models, concentrating on their implementation, adaptability, and generalization to unknown attacks [3].
Fingerprint liveness detection has undergone a considerable transition over the past decade, evolving from handcrafted and ensemble-based approaches to unified, adaptive, and transformer-driven solutions. Early efforts primarily relied on traditional machine learning methods, notably ensemble and incremental learning. Kho et al. [4] proposed an early method based on incremental learning employing support vector machine (SVM) ensembles. Their strategy introduced expert classifiers incrementally trained on new spoof types using the Learn++.NC algorithm, which provides scalability without retraining the whole system. Building on this, Agarwal and Chowdary [5] presented adaptive ensemble techniques, A-Stacking and A-Bagging, which formed disjoint experts tailored to distinctive data subsets, enhancing robustness on imbalanced datasets. With the growing popularity of deep learning, Jung et al. [6] proposed a dual-CNN architecture that combines template and probe fingerprints, thereby improving liveness detection while incurring some computational burden. Alshdadi et al. [7] later advanced feature engineering by integrating Level-1 (ridge orientation) and Level-3 (ridge contour) features through a novel descriptor called Quantized Fundamental Fingerprint Features (Q-FFF), resulting in decreased error rates on LivDet datasets. Meanwhile, Sharma and Selwal [8] contributed an exhaustive review of fingerprint PAD techniques, discussing the development from classical hardware-based approaches to current deep learning-based techniques, along with datasets, protocols, and open challenges.
To further reduce computational complexity while preserving high accuracy, Zhang et al. [9] proposed FLDNet, a lightweight dense CNN with attention pooling. It exhibited robust performance across cross-material, intra-sensor, and cross-sensor evaluations. Nevertheless, generalization to unknown spoof materials remained a persistent challenge. To address this, Chugh and Jain [10] presented the Universal Material Generator, a style-transfer-based approach that enhanced generalization by synthesizing new spoof variations. Moreover, González-Soler et al. [11] concentrated on encoding both local and global features into a common space, attaining state-of-the-art performance across the LivDet 2011–2019 datasets, especially in unknown-attack scenarios. Beyond traditional classification, the field began to investigate few-shot and one-shot learning paradigms to manage data scarcity. Tian et al. [12] proposed the Coupled Patch Similarity Network, which extracted fine-grained, part-level similarities employing patch-wise attention, laying the groundwork for fine-grained spoof detection with very little data. Tang et al. [13] expanded this with a bidirectional pyramidal attention mechanism for few-shot fine-grained recognition, further improving model sensitivity to subtle differences.
As the demand for adaptability became more critical, Agarwala et al. [14] developed A-iLearn, an adaptive incremental learning model that preserves knowledge across learning phases without retraining, achieving substantial improvements on evolving spoof materials. Rattani and Ross [15] presented a novel material detection module that automatically retrains the liveness detector when encountering unknown spoof images, yielding a 46% performance improvement over non-adaptive strategies. Recent research has shifted towards integrating spoof detection directly within recognition systems. A unified model proposed in [16] demonstrated that spoof detection and fingerprint recognition can be performed jointly without degrading performance, reducing memory and computational overhead by up to 50%. This approach was further examined in [17], which presented a simulation framework for studying PAD integration in verification systems under diverse operating conditions. Similarly, ref. [18] highlighted the constraints of binary evaluation metrics for PAD and proposed modified protocols that reflect the pseudo-ternary nature of spoof detection in real-world biometric systems. The most recent advance in this line of research is ViT Unified [19], which employs a Vision Transformer-based architecture to jointly perform fingerprint recognition and spoof detection. This unified model attains high accuracy while greatly reducing latency and parameter count, marking a milestone in the design of efficient and secure biometric systems. The Squeeze-and-Excitation (SE) attention mechanism offers better computational efficiency than Vision Transformer models, which demand high memory and computation due to self-attention across all input patches. SE operates at the channel level, recalibrating feature maps to concentrate on the most informative channels and thereby lowering computational overhead. Unlike ViT, which requires large datasets and substantial resources, SE can be integrated into existing networks such as EfficientNetB0, allowing faster convergence and efficient inference, making TL-Efficient-SE more suitable for real-time fingerprint liveness detection in resource-constrained environments.
Recent studies highlight the effectiveness of attention mechanisms [20] in advancing fingerprint and biometric recognition. Query2Set presents a single-to-multiple partial fingerprint matching technique that adaptively combines features from distinct partial prints, surpassing traditional fusion and mosaicking techniques [21]. AFR-Net integrates vision transformers with CNN embeddings, attaining superior recognition across intra-sensor, cross-sensor, and latent-to-rolled datasets, even outperforming commercial systems [22]. For reconstruction, attention-based and multi-kernel autoencoders restore damaged or incomplete fingerprints with high accuracy (93.81%), enhancing biometric reliability in experimental settings [23]. Advancing to multimodal systems, an Attention-Based Multimodal Biometric Recognition (AMBR) framework with Federated Learning ensures privacy-preserving training while achieving low error rates on benchmark datasets [24]. Collectively, these works show that attention-driven models improve recognition, reconstruction, and secure multimodal authentication. Prior techniques such as Slim-ResCNN [25] and HyFiPAD [26] suffer from limitations that restrict their effectiveness in real-world applications. Slim-ResCNN struggles with cross-sensor generalization because it is trained on a specific sensor, making it poorly adaptable to data from other sensors and restricting its deployment in diverse environments. Furthermore, HyFiPAD depends on manually crafted local binary features, potentially limiting its ability to capture complex fingerprint textures and adapt to new spoof materials, thereby decreasing its generalization across evolving spoofing methods. These restrictions are overcome by the TL-Efficient-SE framework, which uses EfficientNetB0 for robust feature extraction and incorporates the Squeeze-and-Excitation mechanism to enhance adaptability and generalization across various spoof materials and sensors. The comparison of the proposed TL-Efficient-SE with notable existing works is given in
Table 1. Some studies highlight the effectiveness of Vision Transformers in fingerprint recognition, such as [27], which achieves high accuracy in contactless fingerprint classification. The work in [28] introduces the Finger Recovery Transformer (FingerRT) for recovering incomplete fingerprints, improving recognition accuracy. The authors in [29] focus on adversarial attacks in multimodal biometric systems, showing that input fusion offers better security. The authors in [30] review unimodal and multimodal fingerprint systems, emphasizing fusion and template protection. Previous studies predominantly utilized conventional machine learning techniques; contemporary approaches, however, have transitioned to deep learning and attention mechanisms. This development highlights the increasing need for adaptive models that can handle various spoofing materials and sensor discrepancies. In contrast, our proposed TL-Efficient-SE framework combines transfer learning with the Squeeze-and-Excitation attention mechanism, providing superior feature extraction and cross-sensor generalization that distinguishes it from conventional techniques. This work proposes a deep learning framework integrating EfficientNetB0 with a Squeeze-and-Excitation (SE) attention mechanism to improve fingerprint liveness detection by enhancing feature extraction and focusing on key fingerprint areas. The major contributions of the proposed work are as follows.
The work proposes a robust deep learning framework integrating EfficientNetB0 and a Squeeze-and-Excitation (SE) attention mechanism to improve feature extraction, enhancing the capability to discriminate between live and spoofed fingerprints. The attention mechanism effectively highlights significant fingerprint areas, thereby augmenting classification accuracy.
By using transfer learning with a pre-trained EfficientNetB0 model, the proposed approach accelerates convergence and effectively extracts features from limited training data, providing high performance without extensive retraining.
The proposed model is tested on the LivDet 2015 dataset across distinct sensors, including Green Bit, CrossMatch, and HiScan, consistently delivering high accuracy and adaptability, making it feasible for real-world applications where sensor diversity is common.
Achieving greater than 98.50% accuracy, high AUC, and perfect recall, the model provides accurate liveness detection with few false positives or negatives, which is essential for preserving secure and seamless user experiences in biometric systems.
The SE attention mechanism enhances feature discrimination by concentrating on essential fingerprint details and lessening the influence of less relevant features, thereby greatly enhancing the model's robustness against spoof attacks.
2. Datasets Description
The LivDet 2015 dataset [31] is an extensively utilized benchmark for assessing fingerprint liveness detection algorithms and biometric security systems. It comprises live and spoof fingerprint images captured using four different optical fingerprint scanners: Green Bit, Biometrika, Digital Persona, and Crossmatch. The dataset is split into two primary parts:
Algorithm Testing, which evaluates the performance of software-based liveness detection models, and
System Testing, which assesses complete hardware-integrated fingerprint recognition systems. Each sensor-specific subset comprises over 4000 images, featuring spoof fingerprints constructed from diverse materials, including Ecoflex, Latex, Play-Doh, Gelatine, and Wood Glue (Figure 2). Moreover, unknown spoof materials were included in the test set to evaluate the generalization capability of detection models. Live fingerprint images were acquired from multiple subjects under differing conditions, including normal, wet, dry, high-pressure, and low-pressure settings.
The dataset is designed to reproduce realistic fingerprint spoofing conditions, providing a rigorous evaluation of anti-spoofing methods. The performance assessment is based on key parameters, including correct classification rates for live and spoof fingerprints (Fcorrlive and Fcorrfake), false classification rates (Ferrlive and Ferrfake), and the failure-to-enroll rate (Frej). The classification threshold for distinguishing between live and spoof fingerprints was fixed at 50 out of 100. This fixed criterion was selected for uniformity; however, it may not be ideal in every instance. Future work will investigate adaptive thresholding, which dynamically adjusts according to prediction scores or sensor attributes, to enhance performance and resilience in practical applications. The System Testing dataset contains fingerprint images from 51 human subjects and spoof attempts employing five distinct spoof materials. The LivDet 2015 competition results exhibit considerable advancement in biometric security, emphasizing both the strengths and limitations of diverse liveness detection techniques.
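To make the decision rule concrete, the following is a minimal Python sketch of how a model's live-class probability could be mapped onto the 0–100 liveness score scale and thresholded at 50/100; the function name and score convention (higher means more live) are illustrative assumptions, not part of the LivDet protocol text.

```python
def livdet_decision(live_probability: float, threshold: int = 50) -> str:
    """Map a model's live-class probability onto a 0-100 liveness score
    and apply the fixed decision threshold (assumed: higher = more live)."""
    liveness_score = round(100 * live_probability)   # e.g., 0.42 -> 42
    return "live" if liveness_score >= threshold else "spoof"

# Example: a prediction of 0.42 yields a score of 42 and is classified as spoof.
print(livdet_decision(0.42))  # -> "spoof"
```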
Table 2 outlines the fundamental features of the dataset.
The LivDet 2015 dataset is highly suitable for assessing multi-sensor spoof detection due to its inclusion of fingerprint images from diverse optical sensors, such as Green Bit, CrossMatch, and HiScan, captured under various conditions. It includes both live and spoofed fingerprints, with the spoof images fabricated from materials like Play-Doh, Ecoflex, and gelatin, representing various spoofing strategies. This diversity makes the dataset an ideal benchmark for evaluating a model's capability to generalize across multiple sensors and spoof materials, which is critical for real-world applications in multi-sensor biometric systems.
3. Proposed EfficientNetB0 with Attention Mechanism
The proposed framework (Figure 3) involves data preprocessing (resizing images to 224 × 224, normalization, and 80:20, 70:15:15, and 70:20:10 splits). Feature extraction employs EfficientNetB0 with frozen layers and global average pooling. The Squeeze-and-Excitation mechanism applies channel attention through two fully connected layers. Fully connected layers with batch normalization and dropout are followed by a sigmoid output layer for binary classification. The model is trained with the Adam optimizer and binary cross-entropy loss and is evaluated using metrics such as accuracy, AUC, precision, recall, and F1-score.
3.1. Data Preprocessing and Augmentation
The fingerprint dataset employed in this analysis is sourced from the LivDet 2015 competition, specifically the CrossMatch sensor, which includes both live and spoofed fingerprint images fabricated using diverse materials, including Ecoflex, Body Double, and Playdoh.
Each fingerprint image is first loaded and resized to a fixed resolution of 224 × 224 pixels to match the input size of EfficientNetB0. Mathematically, let an input fingerprint image be defined as follows:

\[
I \in \mathbb{R}^{H \times W \times C} \qquad (1)
\]

where H = 224, W = 224, and C = 3 (RGB channels). The choice to resize fingerprint images to 224 × 224 pixels is predicated on compatibility with pre-trained models such as EfficientNetB0, which was trained on the ImageNet dataset with this particular input dimension. Resizing optimizes feature extraction while reducing computational complexity and memory consumption. High-resolution sensors, such as the Biometrika HiScan-PRO (1000 dpi), record intricate fingerprints, but scaling to 224 × 224 pixels enables the model to concentrate on essential features like ridges and minutiae. While certain fine details may be diminished during resizing, the Squeeze-and-Excitation (SE) attention mechanism within the TL-Efficient-SE framework mitigates this by augmenting the model's capacity to concentrate on essential features, thereby ensuring resilient performance even with resized images from high-resolution sensors. All images are then normalized employing EfficientNetB0's preprocessing function:

\[
I_{\text{norm}} = \frac{I - \mu}{\sigma} \qquad (2)
\]

where μ and σ are the channel-wise mean and standard deviation values from the ImageNet dataset.
To guarantee a balanced dataset, stratified splitting is used:

\[
\mathcal{D} = \mathcal{D}_{\text{train}} \cup \mathcal{D}_{\text{test}}, \qquad \frac{|\mathcal{D}_{\text{train}}^{(k)}|}{|\mathcal{D}_{\text{train}}|} \approx \frac{|\mathcal{D}^{(k)}|}{|\mathcal{D}|} \;\; \forall \, k \in \{\text{live}, \text{spoof}\} \qquad (3)
\]

where the class distribution remains uniform across both the training and testing datasets.
Equations (1), (2), and (3) describe the input formatting, normalization, and dataset splitting strategy, respectively.
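For concreteness, the following is a minimal sketch of this preprocessing pipeline in Python, assuming TensorFlow/Keras and scikit-learn; the helper names (load_and_preprocess, build_split), the file-path handling, and the 80:20 ratio are illustrative assumptions rather than the exact implementation used in this work.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.efficientnet import preprocess_input
from sklearn.model_selection import train_test_split

IMG_SIZE = (224, 224)  # EfficientNetB0 input resolution

def load_and_preprocess(path: str) -> np.ndarray:
    """Load a fingerprint image, resize it to 224x224, and apply
    EfficientNetB0's preprocessing function (the normalization step of Equation (2))."""
    img = tf.keras.utils.load_img(path, target_size=IMG_SIZE)   # resize (Equation (1))
    arr = tf.keras.utils.img_to_array(img)                      # H x W x 3 float array
    return preprocess_input(arr)

def build_split(image_paths, labels, test_size=0.20):
    """image_paths: list of file paths; labels: 0 = live, 1 = spoof (assumed given)."""
    X = np.stack([load_and_preprocess(p) for p in image_paths])
    y = np.asarray(labels)
    # Stratified split keeps the live/spoof ratio uniform across subsets (Equation (3)).
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=42)
```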
3.2. Pre-Trained Model for Feature Extraction
The architecture is based on EfficientNetB0, a highly optimized convolutional neural network for image feature extraction, as shown in Figure 4. The base feature extractor is initialized with pre-trained ImageNet weights:

\[
F = f_{\text{EffB0}}(I_{\text{norm}}) \in \mathbb{R}^{H' \times W' \times D}
\]

where f_EffB0(·) denotes the convolutional layers of EfficientNetB0, and F is the extracted feature map with depth D.
To improve the discriminative ability of the architecture, we incorporate an attention mechanism using Squeeze-and-Excitation blocks. The attention mechanism refines the feature representation by computing channel-wise importance scores as follows:

\[
z_c = \frac{1}{H' W'} \sum_{i=1}^{H'} \sum_{j=1}^{W'} F_{i,j,c}
\]

where z_c denotes the global average-pooled feature for channel c.
This vector is then passed through a fully connected bottleneck transformation:

\[
s = \sigma\!\left( W_2 \, \delta\!\left( W_1 z \right) \right)
\]

where W1 ∈ ℝ^{(D/r)×D} and W2 ∈ ℝ^{D×(D/r)} are learnable weights, δ is the ReLU activation, r is the reduction ratio (set to 2 in our case), and σ is the sigmoid activation function.
Finally, the re-weighted feature maps are obtained by element-wise multiplication:

\[
\tilde{F}_c = s_c \cdot F_c, \qquad c = 1, \dots, D
\]

where F̃ represents the enhanced feature representation with improved discriminatory power. The pseudocode of Deep Feature Extraction (EffB0-Feature) is given in Algorithm 1.
Algorithm 1 Deep Feature Extraction (EffB0-Feature)
Require: Raw fingerprint image set X
Ensure: Feature map set F
1: Define backbone model: EffNetB0 pre-trained on ImageNet
2: Initialize the full model
3: Remove top (dense) layers from the model
4: Freeze all convolutional layers in the model to preserve pre-trained weights
5: for each image x ∈ X do
6:    Resize x to 224 × 224
7:    Apply normalization using the ImageNet mean μ and standard deviation σ
8:    Extract feature map: F_x = f_EffB0(x)
9:    Append F_x to feature map set F
10: end for
11: return F
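As a minimal TensorFlow/Keras sketch of Algorithm 1 (the function names and configuration below are assumptions, not the authors' exact code), the frozen EfficientNetB0 backbone can be used as a standalone feature extractor:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB0

def build_feature_extractor():
    # EfficientNetB0 pre-trained on ImageNet, with the top (dense) layers removed.
    backbone = EfficientNetB0(include_top=False, weights="imagenet",
                              input_shape=(224, 224, 3))
    backbone.trainable = False  # freeze convolutional layers to preserve pre-trained weights
    return backbone

def extract_feature_maps(backbone, images: np.ndarray) -> np.ndarray:
    """images: preprocessed array of shape (N, 224, 224, 3); returns feature
    maps of shape (N, 7, 7, 1280) for EfficientNetB0 at this input size."""
    return backbone.predict(images, verbose=0)
```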
3.3. Squeeze-and-Excitation (SE) Block–Attention Mechanism
A Squeeze-and-Excitation (SE) block is incorporated into the architecture to augment the discriminative capacity of the derived features. This block executes adaptive recalibration of channel-specific feature responses by explicitly modelling inter-channel interdependencies. The objective is to highlight informative features and diminish less important ones.
The SE block improves feature discrimination by adaptively recalibrating channel-wise feature responses. The process is as follows.
3.3.1. Squeeze Operation (Global Average Pooling)
Global average pooling is applied to the feature map F to yield a channel descriptor z_c for each channel c:

\[
z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} F_{i,j,c}
\]

where H and W denote the spatial dimensions of the feature map, and F_{i,j,c} denotes the feature map values.
3.3.2. Excitation Phase (Fully Connected Layers)
The channel descriptor z is passed through two pointwise convolutions (fully connected layers). The first layer reduces dimensionality and applies a ReLU activation:

\[
e = \delta\!\left( W_1 z + b_1 \right)
\]

where W1 denotes the learnable weight matrix, δ denotes the ReLU activation function, and b1 is the bias term.
The second convolution restores dimensionality and applies a sigmoid activation function to produce attention weights:

\[
s = \sigma\!\left( W_2 e + b_2 \right)
\]

where W2 is another learnable weight matrix, σ is the sigmoid activation function, and b2 is the bias term.
In the proposed TL-Efficient-SE framework, the SE block enhances the model's capability to focus on essential fingerprint features like ridges and minutiae, improving both accuracy and robustness, especially in handling diverse spoof materials and sensor types.
3.3.3. Recalibration (Element-Wise Multiplication)
The attention weights s are applied to the original feature map F through element-wise multiplication to recalibrate the features:

\[
\tilde{F}_c = s_c \odot F_c
\]

where F̃ represents the recalibrated feature map with improved discriminatory power.
This mechanism helps the model focus on important features while suppressing less relevant ones, thereby improving feature representation and classification accuracy.
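The SE block described above can be expressed compactly in Keras. The snippet below is a minimal sketch; the reduction ratio r = 2 follows the earlier description, while the layer arrangement and function name are assumptions rather than the exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def squeeze_excitation_block(feature_map, reduction_ratio=2):
    """Channel attention: squeeze (global average pooling), excitation
    (two dense layers), and recalibration (channel-wise multiplication)."""
    channels = feature_map.shape[-1]
    # Squeeze: one descriptor per channel.
    z = layers.GlobalAveragePooling2D()(feature_map)
    # Excitation: bottleneck FC layers with ReLU then sigmoid.
    e = layers.Dense(channels // reduction_ratio, activation="relu")(z)
    s = layers.Dense(channels, activation="sigmoid")(e)
    # Recalibration: broadcast the attention weights and rescale each channel.
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([feature_map, s])
```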
Consider the feature map output from EfficientNetB0 as a 3D tensor specified as follows:

\[
F \in \mathbb{R}^{H' \times W' \times D}
\]

The squeeze operation employs global average pooling to reduce the spatial dimensions and generate a channel descriptor:

\[
z_c = \frac{1}{H' W'} \sum_{i=1}^{H'} \sum_{j=1}^{W'} F_{i,j,c}, \qquad z \in \mathbb{R}^{D}
\]

The excitation phase employs two fully connected layers to model channel-wise relationships. The first layer reduces dimensionality and applies a ReLU activation:

\[
e = \delta\!\left( W_1 z + b_1 \right), \qquad W_1 \in \mathbb{R}^{(D/r) \times D}
\]

The second layer restores dimensionality and applies a sigmoid activation function to produce attention weights:

\[
s = \sigma\!\left( W_2 e + b_2 \right), \qquad W_2 \in \mathbb{R}^{D \times (D/r)}
\]

The weights are subsequently employed to rescale the original feature map by channel-wise multiplication:

\[
\tilde{F} = s \odot F
\]

where ⊙ represents element-wise multiplication applied across the channels of the feature map. The detailed pseudocode of EfficientNetB0 with Squeeze-and-Excitation Attention (EffB0-SE) is given in Algorithm 2.
Algorithm 2 EfficientNetB0 with Squeeze-and-Excitation Attention (EffB0-SE)
Require: Fingerprint images X, binary labels C
Ensure: Trained classifier M
1: Define backbone model: EffNetB0 pre-trained on ImageNet with no top layers
2: Initialize the full model
3: Freeze all convolutional layers in EfficientNetB0
4: for each image x ∈ X do
5:    Resize x to 224 × 224, apply normalization
6:    Extract features using the EfficientNetB0 backbone: F = f_EffB0(x)
7:    Apply Global Average Pooling: z = GAP(F)
8:    Apply Squeeze-and-Excitation block:
9:       e = δ(W1 z + b1)
10:      s = σ(W2 e + b2)
11:      F̃ = s ⊙ F        ▹ Attention-weighted feature
12: end for
13: Define classification head:
14:    Dense (ReLU) → Batch Normalization → Dropout (0.4)
15:    Dense (ReLU) → Batch Normalization → Dropout (0.4)
16: Output layer: ŷ = σ(w⊤h + b)
17: Compile model M with Binary Cross-Entropy loss and Adam optimizer
18: Train model M on (X, C) with an 80:20 split for N epochs
19: Evaluate using Accuracy, Precision, Recall, F1-score, and ROC AUC
20: return trained classifier M
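Putting the steps of Algorithm 2 together, a Keras sketch of the full TL-Efficient-SE model might look as follows. The dropout rate of 0.4 and the sigmoid output follow the description in Algorithm 2 and Section 3.4, while the hidden-layer widths (256 and 128) are illustrative assumptions only.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB0

def build_tl_efficient_se(input_shape=(224, 224, 3), reduction_ratio=2):
    inputs = layers.Input(shape=input_shape)

    # Frozen EfficientNetB0 backbone (transfer learning).
    backbone = EfficientNetB0(include_top=False, weights="imagenet", input_tensor=inputs)
    backbone.trainable = False
    features = backbone.output                          # shape (7, 7, 1280)

    # Squeeze-and-Excitation attention on the backbone features.
    channels = features.shape[-1]
    z = layers.GlobalAveragePooling2D()(features)
    e = layers.Dense(channels // reduction_ratio, activation="relu")(z)
    s = layers.Dense(channels, activation="sigmoid")(e)
    s = layers.Reshape((1, 1, channels))(s)
    recalibrated = layers.Multiply()([features, s])

    # Classification head: GAP -> Dense/BatchNorm/Dropout blocks -> sigmoid output.
    x = layers.GlobalAveragePooling2D()(recalibrated)
    x = layers.Dense(256, activation="relu")(x)         # width 256 is an assumption
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Dense(128, activation="relu")(x)         # width 128 is an assumption
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.4)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # probability of spoof

    return models.Model(inputs, outputs, name="tl_efficient_se")
```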
3.4. Fully Connected Layers and Classification
Subsequent to attention-based recalibration, the enhanced feature map is subjected to global average pooling to reduce its spatial dimensions:

\[
v = \mathrm{GAP}(\tilde{F})
\]

This feature vector v is passed through a series of fully connected layers. The first layer applies a ReLU activation:

\[
h_1 = \delta\!\left( W_3 v + b_3 \right)
\]

This is succeeded by another ReLU-activated dense layer:

\[
h_2 = \delta\!\left( W_4 h_1 + b_4 \right)
\]

To enhance generalization and avoid overfitting, batch normalization and dropout (with a rate of 0.4) are employed:

\[
h_2' = \mathrm{Dropout}_{0.4}\!\left( \mathrm{BN}(h_2) \right)
\]

The final classification is performed employing a single sigmoid neuron:

\[
\hat{y} = \sigma\!\left( w^{\top} h_2' + b \right)
\]

where ŷ is the predicted probability of the fingerprint being spoofed. The detailed model architecture summary is given in Table 3, and the training configuration summary is given in Table 4.
3.5. Loss Function and Optimization
The model is trained with the Binary Cross-Entropy loss, defined as follows:

\[
\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log \hat{y}_i + \left(1 - y_i\right) \log\!\left(1 - \hat{y}_i\right) \Big]
\]

where y_i is the ground truth label, ŷ_i is the predicted probability, and N is the number of samples.
The Adam optimizer is used for optimization, as it adapts the learning rate for each parameter dynamically; the initial learning rate is listed in the training configuration summary (Table 4).
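As a brief Keras sketch of this training setup (the learning rate of 1e-3 shown below is Adam's default and only an assumption, as are the epoch and batch-size values; the actual configuration is given in Table 4):

```python
import tensorflow as tf

model = build_tl_efficient_se()  # model builder from the sketch above

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # assumed; see Table 4
    loss="binary_crossentropy",
    metrics=["accuracy",
             tf.keras.metrics.AUC(name="auc"),
             tf.keras.metrics.Precision(name="precision"),
             tf.keras.metrics.Recall(name="recall")],
)

# X_train, y_train, X_test, y_test come from the stratified 80:20 split sketched earlier.
history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=30, batch_size=32)  # illustrative values only
```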
Figure 4 presents an overview of the proposed fingerprint liveness detection methodology, integrating an EfficientNet-inspired structure with Squeeze-and-Excitation (SE) modules and Mobile Inverted Bottleneck Convolution (MBConv) blocks for effective and discriminative feature extraction. The pipeline begins with a fingerprint image of size 224 × 224 × 3, which passes through preliminary convolutional layers that integrate batch normalization and Swish activation to augment feature expressiveness.
The image subsequently passes through a series of MBConv blocks, each containing depthwise separable convolutions to reduce computational complexity, along with a Squeeze-and-Excitation block that captures inter-channel dependencies. The SE block uses Global Average Pooling to reduce the spatial dimensions, followed by two pointwise convolutions activated by ReLU and sigmoid functions to generate channel attention weights. These weights are then applied to the feature maps through channel-wise multiplication, thereby intensifying principal features and attenuating noise.
The second MBConv block features an inverted residual connection, facilitating gradient flow and enabling feature reuse. The collected and weighted feature maps are subsequently refined and processed through a deep classification network comprising fully connected layers, batch normalization, and dropout regularization. The ultimate sigmoid-activated output categorizes the fingerprint as either genuine (live) or manipulated (spoof), showcasing a resilient end-to-end solution for fingerprint spoof detection.
The proposed TL-Efficient-SE framework diverges from conventional CNN designs by incorporating EfficientNetB0 alongside a Squeeze-and-Excitation attention mechanism. In contrast to conventional CNNs that depend exclusively on convolutional layers, the SE mechanism adaptively recalibrates channel-wise feature responses, enabling the model to concentrate on more salient information. This modification augments discriminative capability, enhances generalization across various sensors and spoof materials, and bolsters robustness, rendering the model more adept for practical fingerprint liveness detection applications.