Lightweight Network for Spoof Fingerprint Detection by Attention-Aggregated Receptive Field-Wise Feature

Al Amin, Md; Reza, Naim; Jung, Ho Yub

doi:10.3390/electronics14091823

Open AccessArticle

Lightweight Network for Spoof Fingerprint Detection by Attention-Aggregated Receptive Field-Wise Feature

by

Md Al Amin

,

Naim Reza

and

Ho Yub Jung

^*

Department of Computer Engineering, Chosun University, Gwangju 61452, Republic of Korea

^*

Author to whom correspondence should be addressed.

Electronics 2025, 14(9), 1823; https://doi.org/10.3390/electronics14091823

Submission received: 30 March 2025 / Revised: 24 April 2025 / Accepted: 28 April 2025 / Published: 29 April 2025

(This article belongs to the Special Issue Real-Time Audio, Video and Image Processing: Latest Advances and Prospects)

Download

Browse Figures

Versions Notes

Abstract

The spread of biometric systems utilizing fingerprints has increased the need for advanced spoof detection techniques, but training convolutional neural networks (CNNs) with the limited number of images available in fingerprint datasets poses significant challenges. In this paper, we propose a lightweight network architecture which addresses the challenges inherent in small fingerprint datasets by employing a moderately deep network architecture which is sufficient for extracting essential features from fingerprint images. We apply a hyperbolic tangent activation to the final feature map, which has features from local receptive fields, and average the responses into a single value. Thus, our architecture reduces overfitting by increasing the number of effective labels during training. Additionally, the incorporation of the spatial attention module enhances feature representation, culminating in improved accuracy. The evaluation results show that the proposed model, with only 0.14 million parameters, outperforms existing techniques including lightweight models and transfer-learning-based models, achieving superior average test accuracies of 98.30% and 95.57% on the LivDet-2015 and -2017 datasets, respectively. It also delivers state-of-the-art cross-material performance, with corresponding average classification error values of 0.81% and 1.91%, making it highly effective for on-device fingerprint authentication.

Keywords:

spoof fingerprint detection; receptive field-wise feature; lightweight network architecture; attention mechanism; LivDet-2015; LivDet-2017

1. Introduction

The increasing reliance on biometric authentication systems, particularly those utilizing fingerprints [1,2], underscores the critical need for advanced spoof detection and prevention mechanisms. This necessity arises from the fact that commonly available materials such as gelatin, silicone, and Play-Doh can be utilized to create spoof fingerprints capable of bypassing fingerprint recognition systems [3,4], with a reported success rate exceeding

70 %

[5]. To counteract such vulnerabilities, a variety of anti-spoofing techniques have been proposed, which can be broadly classified into hardware-based and software-based solutions [6,7]. Although hardware-based solutions are highly effective [8], they are also prohibitively expensive due to the need for multiple sensors. In contrast, software-based solutions offer a cost-effective alternative, achieving comparable efficacy by extracting features from fingerprint images captured by sensors, thereby distinguishing between live and spoof fingerprints without incurring additional hardware costs.

The application of CNNs has recently gained prominence in addressing the problem of spoof fingerprint detection. CNN-based approaches have consistently produced top-performing algorithms in various competitions, such as the Liveness Detection (LivDet) competition [9], over the past several years. To enhance accuracy and prevent overfitting on small datasets, pre-trained models are commonly used [10,11]. Additionally, patch extraction is often employed as a strategy to expand the training dataset, mitigating overfitting by increasing the number of training samples [12]. Furthermore, to improve model generalization, generative adversarial networks (GANs) have been used to generate synthetic fingerprint data, enriching the diversity of training samples [10,13].

CNN-based methods identify local patterns and structures in fingerprint images using convolutional filters [14]. However, one of their inherent limitations is the lack of spatial awareness, meaning they do not explicitly consider how these local features relate to each other across the entire image; this can hinder accuracy in fingerprint liveness detection, particularly in cases where the relationship between different fingerprint features plays a crucial role [15,16]. To overcome this challenge, attention-based approaches have been introduced, enabling models to capture and utilize spatial dependencies more effectively. By incorporating attention mechanisms, these methods enhance feature representation and improve overall detection performance. Attention-based methods have demonstrated high accuracy in fingerprint liveness detection by learning the global context of fingerprint images and directing focus toward important regions [17,18,19]. However, most existing approaches require significant computational resources or have long processing times due to their large number of parameters or intricate preprocessing steps. For example, some methods [15,20] rely on complex preprocessing techniques, such as patch extraction, minutiae detection, and synthetic fingerprint generation, which add computational overhead. While these methods perform well in controlled environments with high-end hardware, their reliance on large-scale models makes them impractical for resource-constrained devices [16,21,22]. Fingerprint liveness detection plays an essential role in on-device authentication, where both accuracy and speed are essential for a seamless user experience [19,23]. This creates a need for optimized fingerprint liveness detection techniques that maintain a balance between model accuracy and number of parameters. While some lightweight models with fewer parameters have made advancements in minimizing computational demands [13,19], further research is needed to provide lightweight models while enhancing accuracy and generalization.

In this work, we propose a lightweight model with a parameter count of only 0.14 million (M) for spoof fingerprint detection on LivDet-2015 [24] and LivDet-2017 [25] by guiding attention to essential features across different levels of detail. In our work, we did not use pre-trained models, trained on natural images, rather we provided an architecture tailored specifically for fingerprint data. In our proposed model, we avoid a fully connected prediction layer, as such a layer with our architecture may lead to overfitting on these small fingerprint datasets. In contrast, in our proposed model, the final feature map is averaged to a single value for classification. This architectural choice highlights the importance of spatial discriminability and motivates the integration of a spatial attention module [26] to guide the network’s focus toward the most informative regions. A spatial attention module integrated before the final convolutional layers ensures the preservation of critical spatial relationships, preventing the loss of intricate dependencies essential for effective spoof detection. Unlike conventional CNN architectures that assign uniform importance to all features, this attention module dynamically emphasizes spatial regions based on their discriminative relevance. Specifically, it directs the model’s focus towards informative regions containing high-quality ridge structures and subtle textural cues, significantly improving the model’s ability to differentiate between live and spoof fingerprints, even under challenging imaging conditions or when presented with previously unseen spoof materials. This targeted attention helps the network ignore irrelevant regions, improving generalization by guiding learning toward consistently discriminative spatial features. The initial convolutional layer employs a large

7 \times 7

kernel to capture broader contextual information, ensuring that each pixel in the feature map corresponds to a larger receptive field. To enhance robustness, the hyperbolic tangent activation applied to the final feature map prevents any single feature from dominating. The averaging operation leverages information from all regions of the fingerprint image, integrating subtle spatial patterns to improve classification accuracy. Furthermore, error calculation on the averaged CNN response effectively incorporates receptive field-wise labels, allowing the model to be trained with fewer parameters, thereby helping to prevent overfitting while maximizing accuracy.

This task-specific approach effectively mitigates the shortcomings commonly associated with excessive parameter counts and model complexity, which often compromise efficiency in resource-constrained deployments. This compactness makes our model exceptionally suitable for real-time and resource-constrained deployments. In contrast, existing methods such as FLDNet [13] and LFLDNet [19] utilize approximately 0.48 M and 0.83 M parameters, respectively, while even more resource demanding architectures like Slim-ResCNN [17] and ALDRN [27] employ over 2 million parameters. The substantial computational overhead of these methods significantly limits their practical deployment, especially in environments with strict computational constraints. Our model achieves superior average classification accuracy compared to many existing methods, such as [10,11], which rely on pre-trained models. Specifically, in challenging cross-material evaluations, our model significantly outperforms state-of-the-art methods such as Fingerprint Spoof Buster (FSB) [28], LivDet15 winner [29], CFD-PAD [30], RTK-PAD [30], and Siamese Attention Res-CNN (SA-R-CNN) [31], achieving ACE values of 0.81% on LivDet-2015 and 1.91% on LivDet-2017. Thus, our results demonstrate both superior accuracy and enhanced generalization capability while having significantly fewer parameters, highlighting our architecture’s efficiency for resource-constrained devices. The primary two contributions of this paper can be summarized as follows:

Proposing a lightweight fingerprint liveness detection approach that guides attention to essential features across multiple levels of detail, reducing model complexity while maintaining high accuracy.
Validating the generalization capability of the proposed method through comprehensive testing to ensure robustness.

2. Related Work

In software-based liveness detection, the distinction between live and spoof fingerprints depends heavily on effective feature extraction. The texture properties of live and spoof fingerprints differ significantly, particularly in terms of continuity and sharpness. These variations result from the inherent material and structural differences between authentic skin and synthetic materials, which influence the ridge patterns and fine details of fingerprints. As a result, texture-based feature extraction techniques have become widely used in static feature-based approaches. These techniques also highlight the use of traditional machine learning methods to enhance classification performance.

2.1. Static Feature-Based Approaches

Sharma et al. [32] introduced a texture-based fingerprint liveness detection method using handcrafted features like ridge-width smoothness, valley-width smoothness, and ridge–valley clarity to capture differences between live and spoof fingerprints. Feature selection techniques are employed to optimize extracted features, which are classified using traditional machine learning models such as SVM and Random Forest. The local phase quantization descriptor, which utilizes the short-time Fourier transform, was introduced as a method to capture the information loss that occurs during the fabrication process of spoof fingerprints, enabling the distinction between live and spoof fingerprints [33]. Gragnaniello et al. [34] explored Weber local descriptors to mitigate spoofing attacks on fingerprint sensors, leveraging their ability to extract local texture information to improve detection accuracy. Building on this, Gragnaniello et al. [35] developed the local contrastive phase descriptor, which enhances spoof detection by integrating gradient information with local phase data, achieving commendable accuracy. Xia et al. [36] proposed the Weber local binary descriptor, which introduced a novel approach to analyzing texture features and explored the benefits of feature fusion techniques, combining multiple features to improve performance and robustness in fingerprint liveness detection.

Although these methods deliver satisfactory classification accuracy, their generalization and robustness remain limited, especially when dealing with images captured in complex or challenging conditions, such as those affected by excessive noise, variations in sensor quality, or disruptions caused by inconsistent pressure during fingerprint acquisition. To overcome these limitations, recent research has increasingly focused on deep learning techniques, with CNNs such as DenseNet, MobileNet, ResNet, and VGG. These methods have achieved significant attention as they eliminate the need for extensive manual feature design, enabling models to autonomously learn patterns and features directly from the data while improving classification accuracy.

2.2. CNN-Based Approaches

Building upon this, and shifting away from manually extracted features to learning-based approaches, recent studies have proposed various CNN-based architectures for fingerprint liveness detection. Nogueira et al. [37] introduced one of the earliest CNN-based approaches for fingerprint liveness detection by leveraging a pre-trained VGG network. Their method achieved state-of-the-art performance and was recognized as the winning solution in the 2015 Fingerprint Liveness Detection Competition, marking a significant shift toward deep learning in the field. Rai et al. [10] proposed a presentation attack detection method based on an Open Patch Generator (OPG) that uses GANs to create synthetic spoof fingerprint patches. These synthetic patches are used to augment the training dataset, while DenseNet classifiers are trained to distinguish between live and spoof fingerprints, showing strong performance in generalization across datasets. Similarly, the MoSFPAD model by Rai et al. [11] combines MobileNet for feature extraction with an SVM to deliver a hybrid architecture. This model focuses on patch-based feature extraction for detecting spoof attacks. To enhance efficiency, Slim-ResCNN [17] takes an approach of using improved residual blocks with dropout layers to prevent overfitting. Sharma et al. [38] introduced the HyFiPAD model, which combines multiple texture analysis methods, including local adaptive binary pattern, complete local adaptive binary pattern, and binarized statistical image features. By using a combination of SVM and CNNs with majority voting, this method improves classification accuracy, particularly in identifying texture details in fingerprint images. The Adaptive-Learning-Based Deep Residual Network (ALDRN) proposed by Yuan et al. [27] addresses the challenge of fingerprint liveness detection by utilizing region-of-interest (ROI) extraction to remove noise and isolate key minutiae. Additionally, the use of deep residual networks mitigates degradation issues through shortcut connections.

CNN-based methods lack spatial awareness, as they do not explicitly capture relationships between local features across the entire image. This limitation can affect fingerprint liveness detection, particularly when the spatial dependencies between fingerprint features are crucial. To address this, attention-based approaches have been introduced, allowing models to effectively learn and utilize these dependencies [17].

2.3. Attention-Based Approaches

In an effort to enhance feature discrimination and reduce noise in fingerprint spoof detection, various attention-based methods have been incorporated, including Convolutional Block Attention Module (CBAM)-based [39,40], multi-head attention-based [19], and self-attention-based approaches [41]. To enhance feature discrimination in fingerprint spoof detection, Aluvala et al. [40] proposed a CNN-based architecture augmented with a CBAM. Their work leverages both channel and spatial attention mechanisms to enable the model to adaptively emphasize the most informative fingerprint regions while suppressing irrelevant patterns. To strengthen its attention capacity, the architecture applies CBAM at multiple layers, using local binary pattern descriptors as input features. The channel attention mechanism focuses on inter-channel relationships, while the spatial attention module learns spatially salient regions. This dual-attention design improves the model’s ability to highlight fabricated texture patterns, an important factor in spoof detection.

On the other hand, Grosz et al. [41] proposed a unified Vision Transformer (ViT)-based architecture that simultaneously performs fingerprint presentation attack detection and identity recognition. Unlike conventional CNN-based approaches that rely heavily on patch-level analysis or separate models for PAD and recognition, their method leverages the global and local attention mechanisms intrinsic to ViT. The model captures global contextual representations for fingerprint recognition and local patch-based attention features for spoof detection. This dual use of self-attention allows the architecture to efficiently extract discriminative features across spatial hierarchies. While promising, ViT demands significant computational resources and requires large-scale datasets for training, posing challenges for practical implementation in real-time fingerprint authentication systems that prioritize efficiency and speed [42].

In contrast, our architecture incorporates a spatial attention module to enhance discriminative feature learning. Rather than relying on fully connected layers, we employ global average pooling to aggregate the final feature map into a single scalar output, allowing each spatial location to contribute equally to the classification. The spatial attention module further guides the model to focus on the most informative regions, capturing pixel-level significance.

2.4. Lightweight Methods

As fingerprint liveness detection plays a critical role in on-device authentication, where both accuracy and speed are essential for a seamless user experience [19,23], several recent studies have proposed lightweight architectures aimed at reducing model complexity while maintaining competitive accuracy in spoof detection. One lightweight approach is Slim-ResCNN, proposed by Zhang et al. [17], which introduces a residual convolutional neural network architecture specifically optimized for fingerprint liveness detection. Unlike traditional deep models adapted from natural image processing, Slim-ResCNN is tailored to the unique properties of fingerprint images, incorporating nine improved residual blocks with design simplifications to reduce parameter count and training complexity. They use dropout-enhanced residual connections to mitigate overfitting. Additionally, the model adopts a patch-based training strategy that leverages local fingerprint regions extracted via center-of-gravity-based segmentation, thereby reducing execution time and enabling better adaptability to varying fingerprint sample sizes. Slim-ResCNN with 2.5 M parameters achieves competitive performance, securing the top position in the LivDet 2017 competition.

To further provide a lightweight network, Zhang et al. [13] proposed FLDNet that comprises only 0.48 million parameters, significantly reducing the computational and storage burden compared to conventional CNNs, while maintaining classification performance across various spoofing scenarios. Additionally, FLDNet incorporates a building block design, combining dense connectivity with residual paths. This dual-path structure enables both feature reuse and the continuous discovery of new discriminative features, effectively balancing performance and efficiency. The architecture also benefits from ROI extraction and multi-patch sampling during training, which enhances robustness without incurring significant overhead. FLDNet offers a fully end-to-end solution that processes fingerprint patches with minimal architectural complexity. Its efficiency, accuracy, and task-specific design highlight FLDNet as a strong benchmark in the lightweight fingerprint spoof detection literature.

On the other hand, LFLDNet by Kang et al. [19] introduces an architecture for fingerprint liveness detection by integrating a modified ResNet backbone with a Transformer-based attention mechanism. To reduce complexity, the authors design a shallow residual network with fewer convolutional kernels and employ dropout layers after each convolution to prevent overfitting on small grayscale fingerprint datasets. A notable contribution is the inclusion of multi-head self-attention in place of the final convolution block, allowing the model to capture long-range dependencies and global ridge structures. To further enhance generalization, especially under unknown material attacks, the authors incorporate synthetic spoof fingerprints using CycleGAN-based style transfer augmentation. Despite these additions, the model remains efficient, with only 0.83 million parameters.

Although these models are lightweight, with fewer parameters compared to conventional CNN-based methods, there is still a need for more efficient architectures, maximizing the classification accuracy, that are suitable for resource-constrained devices. Our proposed model consists of only 0.14 million parameters, which is significantly lower than those of the aforementioned lightweight models, while achieving higher classification accuracy. The model operates on full-fingerprint images and produces a single scalar value through global average pooling. This strategy increases the number of effective labels and helps reduce overfitting. We deliberately avoided using any fully connected layers. To further improve generalization, we integrated a spatial attention module, which added only a negligible number of additional parameters.

According to our observation, GANs for style transfer, as employed in [10,13], have become a common approach to enhance generalization. However, many of these methods rely on CNN models pre-trained on natural images rather than architectures specifically designed for fingerprint data. This oversight hampers accuracy improvement. In contrast, our proposed model is designed to reduce computational overhead by minimizing the parameter count while improving both accuracy and generalization. The experimental results demonstrate that our architecture achieves superior performance with high average accuracy, while also delivering state-of-the-art cross-material ACE, confirming the proposed method’s suitability as a practical solution.

3. Method

The proposed methodology, shown in Figure 1, leverages a moderately deep CNN architecture comprising six convolutional layers. These layers are optimized to ensure comprehensive feature extraction while mitigating the risk of overfitting. The layer-wise details of the proposed CNN architecture are presented in Table 1, and the detailed methodology is outlined below.

We denote the feature extraction module as

f_{e} (\cdot)

, consisting of four convolutional layers, and the final convolutional module as

f_{c} (\cdot)

, which includes two convolutional layers. The input image X is processed by the feature extraction module, which is expressed as

F = f_{e} (X),

(1)

where F represents the extracted feature map. In this module, the first convolutional layer employs a large

7 \times 7

kernel with stride

2 \times 2

, capturing broader contextual information from the input image X. This ensures that each pixel in the final feature map corresponds to a large receptive field. The initial feature map is then reduced in dimensions using a

2 \times 2

max-pooling layer. Subsequent layers (Conv2d-2 to Conv2d-4) utilize smaller

3 \times 3

kernels to extract hierarchical features produced by the initial layer. Each layer is followed by a max-pooling operation to progressively downsample the feature maps while retaining essential spatial information.

The extracted feature map is then processed through the spatial attention module [26], introduced before the final convolutional module to enhance the model’s ability to distinguish spoofed and live fingerprints. This spatial attention module computes a spatial attention map, which highlights critical regions for classification and improves feature representation. In this module, the attention map is first generated by applying average pooling and max pooling along the channel axis, resulting in two spatial descriptors: an average-pooled descriptor

F_{a v g} \in R^{1 \times H \times W}

and a max-pooled descriptor

F_{m a x} \in R^{1 \times H \times W}

. These descriptors are concatenated and passed through a

7 \times 7

convolution, followed by a sigmoid activation, yielding the spatial attention map. Once the attention map is obtained, it is element-wise-multiplied with the original feature map F within this module, producing the refined feature map:

F_{refined} = σ (f^{7 \times 7} ([AvgPool (F); MaxPool (F)])) ⊙ F

(2)

The refined feature map

F_{refined}

is next directed to the final convolutional module

f_{c} (\cdot)

, where the final convolutional layer in this module uses a

3 \times 3

kernel, producing a single-channel feature map

F^{'}

. A hyperbolic tangent activation is subsequently applied to this feature map, bounding the output feature values between

[- 1, 1]

, normalizing the response, and reducing the dominance of any single feature.

F_{tanh}^{'} = tanh (f_{c} (F_{refined})),

(3)

where

F_{tanh}^{'}

is the feature map obtained by applying the hyperbolic tangent activation to the convolutional features. This normalized feature map

F_{tanh}^{'}

is later averaged across all pixels to generate the final output

y^{'}

, which is expressed as

y^{'} = \frac{1}{H \times W} \sum_{i = 1}^{H} \sum_{j = 1}^{W} F_{tanh}^{'} (i, j),

(4)

where the output

y^{'}

represents the liveness score for classification. The model is trained using the binary cross-entropy loss:

L = - \frac{1}{N} \sum_{j = 1}^{N} [y_{j} log (y_{j}^{'}) + (1 - y_{j}) log (1 - y_{j}^{'})],

(5)

where y represents the ground truth label,

y^{'}

denotes the predicted probability, and N is the number of samples.

3.1. Datasets and Preprocessing

We evaluated our approach using the LivDet-2015 [24] and LivDet-2017 [25] fingerprint datasets. The training set for the LivDet-2015 dataset contained approximately 1000 to 1500 live and spoofed fingerprint images per sensor, including CrossMatch, DigitalPersona, GreenBit, and HiScan. For the LivDet-2017 dataset, the training set contained around 1000 to 1200 live and spoofed fingerprint images per sensor, including DigitalPersona, GreenBit, and Orcanthus. In the test data for LivDet-2015, each sensor contained between 1050 and 1500 spoof images and between 1000 and 1500 live images, along with 500 unknown material images. For LivDet-2017, the test data included approximately 2000 spoof images and 1700 live images per sensor, all of which were from unknown materials. To ensure consistency across different sensors and resolutions, all fingerprint images were first converted to grayscale and inverted so that fingerprint ridges appeared bright. Next, we performed connected component analysis to extract the main fingerprint region by detecting the largest foreground area after binarization and morphological dilation. The extracted region was centrally placed on a fixed-size black background of

512 \times 512

pixels, ensuring a uniform input size. Images smaller than

512 \times 512

were padded with zeros, and larger ones were centrally cropped. This preprocessing standardizes spatial scale and alignment across all inputs. In addition, we applied data augmentation during training. Specifically, we used random horizontal and vertical flipping as well as random rotations in the range of

- 30^{\circ}

to

+ 30^{\circ}

.

3.2. Training Protocol

The model was trained using the Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.01, no momentum, and no weight decay, using a batch size of 2. The training process was conducted over 1500 epochs. A large number of epochs was necessary because the number of training samples was small. However, due to the proposed architecture, overfitting was not an issue. The proposed approach helps prevent overfitting by increasing the number of effective labels. And our proposed model only consists of 140,353 parameters, maintaining computational efficiency while achieving improved accuracy. The details of the network architecture are presented in Table 1. The detailed results on the LivDet-2015 and LivDet-2017 datasets demonstrate the efficacy of this approach.

4. Results

In this study, we employed multiple evaluation metrics to comprehensively assess model performance, including classification accuracy, average classification error (ACE), precision, recall, and F1-score. These metrics are particularly critical in fingerprint liveness detection, where misclassifying spoof images as live can compromise system security.

True positive (TP): The case where a spoof fingerprint is correctly classified as a spoof.
False positive (FP): The case where a live fingerprint is incorrectly classified as a spoof.
True negative (TN): The case where a live fingerprint is correctly classified as live.
False negative (FN): The case where a spoof fingerprint is incorrectly classified as live.

Classification accuracy: Accuracy measures the overall effectiveness of the model by calculating the ratio of correctly classified samples to the total number of samples.

Accuracy = \frac{T P + T N}{T P + T N + F P + F N}

(6)

ACE: The ACE is commonly used in fingerprint liveness detection tasks and quantifies the overall classification error. The ACE is calculated as the mean of the error rates for live samples and spoof samples.

ACE = \frac{F_{errlive} + F_{errfake}}{2}

(7)

Here,

F_{errlive}

is the proportion of live fingerprints misclassified as spoofs, and

F_{errfake}

is the proportion of spoof fingerprints misclassified as live.

Precision: Precision indicates the proportion of predicted positive samples that are correctly classified. It reflects how accurate the model is when it predicts a sample as positive.

Precision = \frac{T P}{T P + F P}

(8)

Recall: Recall measures the proportion of actual positive samples that are correctly identified by the model. In the context of spoof detection, it shows the effectiveness of detecting actual spoofs.

Recall = \frac{T P}{T P + F N}

(9)

F1-score: F1-score is the harmonic mean of precision and recall, balancing both metrics, especially when the dataset is imbalanced. It provides a single measure of a test’s accuracy.

F 1 - Score = \frac{2 \times Precision \times Recall}{Precision + Recall}

(10)

Classification performance comparison: The evaluation was conducted on the test set for each sensor in the LivDet-2015 and LivDet-2017 datasets. Table 2 shows that on the LivDet-2015 dataset, our proposed model, despite its simplicity, achieved an impressive average accuracy of 98.30%, surpassing several existing methods. The model particularly excelled on sensors such as GreenBit, CrossMatch, and HiScan, with accuracies of 99.31%, 99.79%, and 98.04%, respectively, illustrating its capability in capturing and utilizing relevant fingerprint features. However, on the DigitalPersona sensor, the model’s accuracy was slightly lower at 96.07%. This small drop in performance can be attributed to the smaller image size, as the lower resolution of DigitalPersona images made it more challenging to distinguish fine fingerprint details. The results on the LivDet-2017 dataset are presented in Table 3, which shows that the model achieved comparable accuracies of 94.89%, 96.06%, and 95.77% across the three sensors, including DigitalPersona, GreenBit, and Orcanthus. In terms of average accuracy, the proposed model achieved 95.57% accuracy, exceeding existing methods.

Cross-material generalization performance: Table 4 and Table 5 provide a comprehensive comparison of the proposed model’s cross-material robustness against several state-of-the-art methods on the LivDet-2015 and LivDet-2017 datasets. These evaluations were conducted under the challenging condition where spoof materials used for testing were excluded from the training phase, thereby assessing the model’s ability to generalize to unseen attacks.

On the LivDet-2015 dataset, the proposed model achieved an average ACE of 0.81 percent, outperforming LivDet15-Winner [29] at 5.72 percent, FSB [28] at 1.48 percent, and SA-R-CNN [31] at 5.17 percent. Across individual sensors such as GreenBit and CrossMatch, our method achieved remarkably low ACE values of 0.20 percent and 0.16 percent, respectively, indicating strong generalization across different sensor types and spoof materials.

Likewise, on the LivDet-2017 dataset, our model continued to exhibit robust generalization performance, achieving an average ACE of 1.91 percent. This outperformed all compared approaches including RTK-PAD [47] at 2.28 percent, CFD-PAD [47] at 2.53 percent, FSB [28] at 4.56 percent, and SA-R-CNN [31] at 3.12 percent. Across all sensors, the proposed model consistently achieved the lowest or highly competitive ACE values, underscoring its strong capability in detecting spoof attacks fabricated from previously unseen materials.

Comprehensive metric evaluation: To comprehensively evaluate the classification performance of the proposed model, we analyze precision, recall, and F1-score across different sensors in the LivDet-2015 and LivDet-2017 datasets. As shown in Table 6, the model achieves consistently high performance on the LivDet-2015 dataset, with an average precision, recall, and F1-score of 0.979, 0.976, and 0.977, respectively. Notably, on sensors such as CrossMatch and HiScan, the model reaches F1-scores of 0.998 and 0.974, indicating its strong capability to balance between sensitivity and specificity. Even on the more challenging DigitalPersona sensor, the model maintains a robust F1-score of 0.948, demonstrating reliable spoof detection under lower-resolution input conditions. Similarly, on the LivDet-2017 dataset, as shown in Table 7 the proposed model continues to exhibit strong generalization, achieving an average precision of 0.954, recall of 0.949, and F1-score of 0.952 across all sensors. These high values reflect the model’s effectiveness in correctly identifying both live and spoof fingerprints while minimizing misclassification. The consistently high F1-scores across diverse sensors further substantiate the robustness and reliability of the proposed lightweight architecture for fingerprint liveness detection.

Parameter efficiency: The comparison shown in Table 8 highlights the efficiency of the proposed model on the LivDet-2015 dataset. With only 0.14 M parameters, it achieves the highest accuracy of 98.30%, significantly outperforming other methods such as Slim-ResCNN (2.15 M parameters), LFLDNet (0.83M parameters), and FLDNet (0.48 M parameters). This demonstrates that the proposed model produces better results with 70% fewer parameters compared to the FLDNet lightweight model.

4.1. Visual Analysis

To improve interpretability and provide insight into the model’s decision-making process, we visualize the liveness maps generated from the final feature map prior to global average pooling, as shown in Figure 2. These liveness maps are derived from the output of the final convolutional layer and are globally averaged for classification. The original resolution of each map is 24 × 24, which we resized to 512 × 512 for visualization purposes. In these maps, the background is represented by a value of 0. For live fingerprints, the foreground regions exhibit positive values, whereas for spoof fingerprints, the corresponding values tend to be negative. We applied normalization to scale all values between 0 and 1 to enhance visual contrast. As illustrated, live fingerprints generate liveness maps with high values, appearing brighter, while fake fingerprints yield lower values, resulting in darker regions. This clearly reflects how the model differentiates between live and spoof fingerprints and confirms that the liveness map carries meaningful and class-discriminative spatial patterns, validating our design choice.

4.2. Ablation Study

The ablation study in Table 9 evaluates the impact of key architectural components, the spatial attention module and hyperbolic tangent activation, on the performance of our proposed model using the DigitalPersona of LivDet-2015 and LivDet-2017 datasets. Importantly, the removal of the dense layer and the inclusion of global average pooling result in a significant accuracy improvement from 89.48% to 95.20% on the LivDet-2015 dataset and from 90.81% to 94.11% on the LivDet-2017 dataset, compared to the base CNN with a fully connected dense layer, despite having a lower parameter count. These improvements demonstrate that our lightweight architecture reduces overfitting while maintaining strong classification performance. In our study, we consider our proposed model without the spatial attention module and hyperbolic tangent activation as the base CNN. Introducing the spatial attention module before the last two convolutional layers of our architecture improves performance by enabling the model to focus on discriminative regions, enhancing feature extraction. The spatial attention module alone adds only a negligible increase of fewer than 100 parameters, yet guides the network to focus on informative spatial regions. While its standalone use achieves lower accuracy than the baseline model on LivDet-2015, its integration with global pooling and tanh activation leads to performance improvement. This shows that the spatial attention module is effective when combined with other components in the architecture. Furthermore, instead of using a fully connected layer for classification, we average the feature map to obtain a single value, allowing the model to evaluate receptive field-wise features equally. The hyperbolic tangent activation enhances robustness by preventing any single feature from dominating. The full architecture, incorporating both enhancements, achieves the highest accuracy, with 96.08 percent on LivDet-2015 and 94.89 percent on LivDet-2017, respectively, demonstrating the efficiency of our lightweight architecture in improving spoof fingerprint detection while maintaining a low parameter count.

Furthermore, to examine the impact of spatial attention placement within the network, we conducted an ablation study that is summarized in Table 10. Three configurations were tested: placing the spatial attention module after the final convolution layer; before the final convolution layer; and in the middle of the architecture, as proposed. The results demonstrate that integrating the attention module in the middle yields the highest classification accuracy of 96.07%, outperforming other placements. This suggests that introducing spatial attention earlier allows the network to better refine intermediate feature representations, enhancing its discriminative capacity.

Moreover, to evaluate the effectiveness of our attention strategy, we conducted a comparative study by integrating different attention modules into the same backbone architecture and testing them on the DigitalPersona sensor from the LivDet-2015 dataset. As shown in Table 11, our proposed method with spatial attention module achieves the highest classification accuracy of 96.07%, outperforming both the Convolutional Block Attention Module (CBAM) at 94.28% and the Squeeze-and-Excitation (SE) block at 92.40%. Even without any attention mechanism, the model achieved 95.20% accuracy. These results highlight the superiority of our proposed architecture with this attention design in enhancing discriminative feature learning specifically tailored for fingerprint spoof detection.

5. Model Complexity Overview

The proposed model demonstrates strong deployment efficiency, achieving an average inference time of 3.157 milliseconds per image when evaluated on a system equipped with an NVIDIA GeForce RTX 3070 GPU, a 24-core CPU, and 64 GB of RAM. This measurement was conducted using a batch size of 1 to reflect real-time operational conditions. In addition to its fast inference speed, the model exhibits low computational complexity, with 1.6855 GFLOPs, consists of only 0.14 million parameters, and occupies a compact size of 0.54 MB. These characteristics highlight the model’s suitability for deployment in resource-constrained environments and on-device fingerprint authentication systems. The detailed computational complexity and size of the model are summarized in Table 12.

6. Conclusions

This paper introduces a lightweight CNN-based architecture for fingerprint spoof detection, addressing the challenges posed by small fingerprint datasets. In this work, the use of local receptive fields with a spatial attention module in conjunction with the hyperbolic tangent activation, followed by averaging into a single value for loss calculation and classification, helps prevent overfitting and enhances feature representation, ultimately improving classification performance. Unlike traditional methods reliant on transfer learning or complex preprocessing, this architecture achieves simplicity and computational efficiency, with only 0.14 M parameters, achieving better accuracy compared to previous works including recent lightweight models and transfer-learning-based models. The experimental results demonstrate that the model delivers superior accuracy and outperforms a state-of-the-art technique in cross-material performance, achieving significant reductions in ACE. These results validate the effectiveness of the proposed architecture in fingerprint spoof detection.

Author Contributions

Conceptualization, H.Y.J.; Methodology, H.Y.J.; Software, M.A.A. and N.R.; Validation, N.R.; Formal analysis, M.A.A.; Investigation, N.R.; Writing—original draft, M.A.A.; Writing—review & editing, H.Y.J.; Supervision, N.R. and H.Y.J.; Project administration, H.Y.J.; Funding acquisition, H.Y.J. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by research fund from Chosun University, 2023.

Data Availability Statement

The datasets used in this study are publicly available and were employed for the analyses reported in this article. These datasets can be accessed at https://www.livdet.org/registration.php accessed on 17 June 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Avanzato, R.; Beritelli, F.; Serrano, S. Robust Biometric Verification Using Phonocardiogram Fingerprinting and a Multilayer-Perceptron-Based Classifier. Electronics 2024, 13, 4377. [Google Scholar] [CrossRef]
Wu, B.; Zhang, S.; Gao, W.; Bi, Y.; Hu, X. A Method for Fingerprint Edge Enhancement Based on Radial Hilbert Transform. Electronics 2024, 13, 3886. [Google Scholar] [CrossRef]
Jung, H.; Heo, Y. Fingerprint liveness map construction using convolutional neural network. Electron. Lett. 2018, 54, 564–566. [Google Scholar] [CrossRef]
Lee, Y.K.; Jeong, J.; Kang, D. An effective orchestration for fingerprint presentation attack detection. Electronics 2022, 11, 2515. [Google Scholar] [CrossRef]
Gupta, A.; Verma, A. Fingerprint presentation attack detection approaches in open-set and closed-set scenario. J. Physics Conf. Ser. 2021, 1964, 042050. [Google Scholar] [CrossRef]
Habib, A.; Selwal, A. Robust anti-spoofing techniques for fingerprint liveness detection: A Survey. Iop Conf. Ser. Mater. Sci. Eng. 2021, 1033, 012026. [Google Scholar] [CrossRef]
Ming, Z.; Visani, M.; Luqman, M.M.; Burie, J.C. A survey on anti-spoofing methods for facial recognition with rgb cameras of generic consumer devices. J. Imaging 2020, 6, 139. [Google Scholar] [CrossRef]
Xi, Z.; Zhang, G.; Zhang, B.; Zhang, T. Device identity recognition based on an adaptive environment for intrinsic security fingerprints. Electronics 2024, 13, 656. [Google Scholar] [CrossRef]
LivDet—Fingerprint Liveness Detection Competitions. Available online: https://www.livdet.org/competitions.php (accessed on 19 September 2024).
Rai, A.; Anshul, A.; Jha, A.; Jain, P.; Sharma, R.P.; Dey, S. An open patch generator based fingerprint presentation attack detection using generative adversarial network. Multimed. Tools Appl. 2024, 83, 27723–27746. [Google Scholar] [CrossRef]
Rai, A.; Dey, S.; Patidar, P.; Rai, P. Mosfpad: An end-to-end ensemble of mobilenet and support vector classifier for fingerprint presentation attack detection. Comput. Secur. 2024, 148, 104069. [Google Scholar] [CrossRef]
Chugh, T.; Cao, K.; Jain, A.K. Fingerprint spoof buster: Use of minutiae-centered patches. IEEE Trans. Inf. Forensics Secur. 2018, 13, 2190–2202. [Google Scholar] [CrossRef]
Zhang, Y.; Pan, S.; Zhan, X.; Li, Z.; Gao, M.; Gao, C. Fldnet: Light dense cnn for fingerprint liveness detection. IEEE Access 2020, 8, 84141–84152. [Google Scholar] [CrossRef]
Alimovski, E.; Rasheed, J. Automated Biometrical Fingerprint Recognition Scheme using Synthesized Images. In Proceedings of the 1st International Conference on Computing and Machine Intelligence, Online, 19–20 February 2021; İstanbul Sabahattin Zaim Üniversitesi: Istanbul, Turkey, 2021. [Google Scholar]
Heo, Y. Generative Adversarial Network Based Fingerprint Anti-Spoofing Limitations. Int. J. Comput. Inf. Eng. 2021, 15, 349–353. [Google Scholar]
Yook, H.J.; Hong, P.M.; Kang, S.H.; San Jhun, G.; Seo, J.E.; Lee, Y.K. Attention Map Is All We Need for Lightweight Fingerprint Liveness Detection. IEEE Access 2024, 12, 130031–130041. [Google Scholar] [CrossRef]
Zhang, Y.; Shi, D.; Zhan, X.; Cao, D.; Zhu, K.; Li, Z. Slim-ResCNN: A deep residual convolutional neural network for fingerprint liveness detection. IEEE Access 2019, 7, 91476–91487. [Google Scholar] [CrossRef]
Li, K.; Wang, Y.; Zhang, J.; Gao, P.; Song, G.; Liu, Y.; Li, H.; Qiao, Y. Uniformer: Unifying convolution and self-attention for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 12581–12600. [Google Scholar] [CrossRef]
Zhang, K.; Huang, S.; Liu, E.; Zhao, H. LFLDNet: Lightweight fingerprint liveness detection based on ResNet and transformer. Sensors 2023, 23, 6854. [Google Scholar] [CrossRef]
Park, E.; Kim, W.; Li, Q.; Kim, J.; Kim, H. Fingerprint liveness detection using CNN features of random sample patches. In Proceedings of the 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), Darmstadt, Germany, 21–23 September 2016; IEEE: New York, NY, USA, 2016; pp. 1–4. [Google Scholar]
Purnapatra, S.; Rezaie, H.; Jawade, B.; Liu, Y.; Pan, Y.; Brosell, L.; Sumi, M.R.; Igene, L.; Dimarco, A.; Setlur, S.; et al. Liveness Detection Competition-Noncontact-based Fingerprint Algorithms and Systems (LivDet-2023 Noncontact Fingerprint). In Proceedings of the 2023 IEEE International Joint Conference on Biometrics (IJCB), Ljubljana, Slovenia, 25–28 September 2023; IEEE: New York, NY, USA, 2023; pp. 1–10. [Google Scholar]
Yuan, C.; Cui, B.; Fu, Z.; Zhou, Z.; Liu, Y.; Yang, Y.; Wu, Q.J. RTRGAN: Ridge Texture Rendering Based Generative Adversarial Fingerprint Attack for Portable Consumer Electronics Devices. IEEE Trans. Consum. Electron. 2024, 70, 5471–5482. [Google Scholar] [CrossRef]
Yang, W.; Wang, S.; Sahri, N.M.; Karie, N.M.; Ahmed, M.; Valli, C. Biometrics for internet-of-things security: A review. Sensors 2021, 21, 6163. [Google Scholar] [CrossRef]
Ghiani, L.; Yambay, D.A.; Mura, V.; Marcialis, G.L.; Roli, F.; Schuckers, S.A. Review of the fingerprint liveness detection (LivDet) competition series: 2009 to 2015. Image Vis. Comput. 2017, 58, 110–128. [Google Scholar] [CrossRef]
Mura, V.; Orrù, G.; Casula, R.; Sibiriu, A.; Loi, G.; Tuveri, P.; Ghiani, L.; Marcialis, G.L. LivDet 2017 fingerprint liveness detection competition 2017. In Proceedings of the 2018 International Conference on Biometrics (ICB), Redondo Beach, CA, USA, 22–25 October 2018; IEEE: New York, NY, USA, 2018; pp. 297–302. [Google Scholar]
Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
Yuan, C.; Xia, Z.; Sun, X.; Wu, Q.J. Deep residual network with adaptive learning framework for fingerprint liveness detection. IEEE Trans. Cogn. Dev. Syst. 2019, 12, 461–473. [Google Scholar] [CrossRef]
Chugh, T.; Cao, K.; Jain, A.K. Fingerprint Spoof Buster. arXiv 2017, arXiv:1712.04489. [Google Scholar]
Mura, V.; Ghiani, L.; Marcialis, G.L.; Roli, F.; Yambay, D.A.; Schuckers, S.A. LivDet 2015 fingerprint liveness detection competition 2015. In Proceedings of the 2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS), Arlington, VA, USA, 8–11 September 2015; pp. 1–6. [Google Scholar] [CrossRef]
Liu, H.; Zhang, W.; Liu, F.; Wu, H.; Shen, L. Fingerprint presentation attack detector using global-local model. IEEE Trans. Cybern. 2021, 52, 12315–12328. [Google Scholar] [CrossRef] [PubMed]
Yuan, C.; Xu, Z.; Li, X.; Zhou, Z.; Huang, J.; Guo, P. An Interpretable Siamese Attention Res-CNN for Fingerprint Spoofing Detection. IET Biom. 2024, 2024, 6630173. [Google Scholar] [CrossRef]
Sharma, R.P.; Dey, S. Fingerprint liveness detection using local quality features. Vis. Comput. 2019, 35, 1393–1410. [Google Scholar] [CrossRef]
Ghiani, L.; Marcialis, G.L.; Roli, F. Fingerprint liveness detection by local phase quantization. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), Tsukuba, Japan, 11–15 November 2012; IEEE: New York, NY, USA, 2012; pp. 537–540. [Google Scholar]
Gragnaniello, D.; Poggi, G.; Sansone, C.; Verdoliva, L. Fingerprint liveness detection based on weber local image descriptor. In Proceedings of the 2013 IEEE Workshop on Biometric Measurements and Systems for Security and Medical Applications, Napoli, Italy, 9 September 2013; IEEE: New York, NY, USA, 2013; pp. 46–50. [Google Scholar]
Gragnaniello, D.; Poggi, G.; Sansone, C.; Verdoliva, L. Local contrast phase descriptor for fingerprint liveness detection. Pattern Recognit. 2015, 48, 1050–1058. [Google Scholar] [CrossRef]
Xia, Z.; Yuan, C.; Lv, R.; Sun, X.; Xiong, N.N.; Shi, Y.Q. A novel weber local binary descriptor for fingerprint liveness detection. IEEE Trans. Syst. Man, Cybern. Syst. 2018, 50, 1526–1536. [Google Scholar] [CrossRef]
Nogueira, R.F.; de Alencar Lotufo, R.; Machado, R.C. Fingerprint liveness detection using convolutional neural networks. IEEE Trans. Inf. Forensics Secur. 2016, 11, 1206–1213. [Google Scholar] [CrossRef]
Sharma, D.; Selwal, A. HyFiPAD: A hybrid approach for fingerprint presentation attack detection using local and adaptive image features. The Visual Computer 2022, 38, 2999–3025. [Google Scholar] [CrossRef]
Kothadiya, D.; Bhatt, C.; Soni, D.; Gadhe, K.; Patel, S.; Bruno, A.; Mazzeo, P.L. Enhancing fingerprint liveness detection accuracy using deep learning: A comprehensive study and novel approach. J. Imaging 2023, 9, 158. [Google Scholar] [CrossRef]
Almusawi, M.; Aluvala, S.; Trisheela, S.; Soni, M.; R, R. Convolutional Neural Network with Convolutional Block Attention Mechanism for Fingerprint Forgery Detection. In Proceedings of the 2024 International Conference on Data Science and Network Security (ICDSNS), Tiptur, India, 26–27 July 2024; IEEE: New York, NY, USA, 2024; pp. 1–4. [Google Scholar]
Grosz, S.A.; Wijewardena, K.P.; Jain, A.K. ViT unified: Joint fingerprint recognition and presentation attack detection. In Proceedings of the 2023 IEEE International Joint Conference on Biometrics (IJCB), Ljubljana, Slovenia, 25–28 September 2023; IEEE: New York, NY, USA, 2023; pp. 1–9. [Google Scholar]
Li, Y.; Lu, H.; Wang, Y.; Gao, R.; Zhao, C. ViT-Cap: A novel vision transformer-based capsule network model for finger vein recognition. Appl. Sci. 2022, 12, 10364. [Google Scholar] [CrossRef]
M. Jomaa, R.; Mathkour, H.; Bazi, Y.; Islam, M.S. End-to-end deep learning fusion of fingerprint and electrocardiogram signals for presentation attack detection. Sensors 2020, 20, 2085. [Google Scholar] [CrossRef] [PubMed]
Uliyan, D.M.; Sadeghi, S.; Jalab, H.A. Anti-spoofing method for fingerprint recognition using patch based deep learning machine. Eng. Sci. Technol. Int. J. 2020, 23, 264–273. [Google Scholar] [CrossRef]
Sittirit, N.; Mongkolwat, P.; Thaipisutikul, T.; Supratak, A.; Chen, Z.S.; Wang, J.C. Fingerprint Liveness Detection with Voting Ensemble Classifier. In Proceedings of the 2022 6th International Conference on Information Technology (InCIT), Nonthaburi, Thailand, 10–11 November 2022; IEEE: New York, NY, USA, 2022; pp. 105–110. [Google Scholar]
Reza, N.; Jung, H.Y. Enhancing Ensemble Learning Using Explainable CNN for Spoof Fingerprints. Sensors 2023, 24, 187. [Google Scholar] [CrossRef]
Liu, F.; Kong, Z.; Liu, H.; Zhang, W.; Shen, L. Fingerprint presentation attack detection by channel-wise feature denoising. IEEE Trans. Inf. Forensics Secur. 2022, 17, 2963–2976. [Google Scholar] [CrossRef]

Figure 1. Proposed lightweight CNN architecture for spoof fingerprint detection, employing convolutional layers for feature extraction and a spatial attention module to focus on crucial regions. The design incorporates hyperbolic tangent activation with global mean pooling to enhance performance and minimize overfitting on small fingerprint datasets.

Figure 2. Examples of liveness maps generated by the proposed model for live and fake fingerprints, shown alongside their corresponding input images.

Table 1. Layer-wise configuration and parameters of the proposed model.

Layer	Stride	Output Shape	Kernel	Parameters
Input	-	$1 \times 512 \times 512$	-	0
Conv2d-1	$2 \times 2$	$128 \times 253 \times 253$	$7 \times 7$	6400
MaxPool2d-1	$2 \times 2$	$128 \times 126 \times 126$	$2 \times 2$	0
ReLU-1	-	$128 \times 126 \times 126$	-	0
Conv2d-2	$1 \times 1$	$64 \times 124 \times 124$	$3 \times 3$	73,792
MaxPool2d-2	$2 \times 2$	$64 \times 62 \times 62$	$2 \times 2$	0
ReLU-2	-	$64 \times 62 \times 62$	-	0
Conv2d-3	$1 \times 1$	$64 \times 60 \times 60$	$3 \times 3$	36,928
MaxPool2d-3	$2 \times 2$	$64 \times 30 \times 30$	$2 \times 2$	0
ReLU-3	-	$64 \times 30 \times 30$	-	0
Conv2d-4	$1 \times 1$	$32 \times 28 \times 28$	$3 \times 3$	18,464
Conv-1 in Spatial Attention Module	$1 \times 1$	$1 \times 28 \times 28$	$7 \times 7$	98
Conv2d-5	$1 \times 1$	$16 \times 26 \times 26$	$3 \times 3$	4624
ReLU-4	-	$16 \times 26 \times 26$	-	0
Conv2d-6	$1 \times 1$	$1 \times 24 \times 24$	$3 \times 3$	145
TanH	-	$1 \times 24 \times 24$	-	0
Output (mean)	-	1	-	0
Total number of parameters:				140,353

Table 2. Comparison of the proposed model with previous methods on the LivDet-2015 dataset in terms of classification accuracy (%). Bold values indicate the best performance in each column.

Model	Year	DigitalPersona	GreenBit	CrossMatch	HiScan	Average Accuracy
CNN-VGG [37]	2016	93.72	95.40	98.10	94.36	95.39
FSB [28]	2017	97.04	98.64	98.72	97.76	98.04
ALDRN [27]	2019	93.20	95.23	96.54	93.76	94.68
Jomma et al. [43]	2020	91.96	94.68	97.29	95.12	94.87
Uliyan et al. [44]	2020	-	-	95.00	-	95.00
HyFiPAD [38]	2022	97.2	96.7	95.00	95.5	96.10
VootingEnsemble [45]	2022	96.6	99.2	96.44	98.8	97.76
LFLDNet [19]	2023	93.52	98.56	98.18	96.00	96.56
SDCNN [46]	2023	94.24	98.88	99.19	97.92	97.56
OPG-CNN [10]	2024	93.50	97.63	97.51	96.18	96.20
MoSFPAD [11]	2024	95.35	98.18	98.39	97.19	97.23
CNN-CBAM [16]	2024	96.44	98.40	99.33	-	98.05
Proposed model		96.07	99.31	99.79	98.04	98.30

Table 3. Comparison of the proposed model with previous methods on the LivDet-2017 dataset in terms of classification accuracy (%). Bold values indicate the best performance in each column.

Model	Year	DigitalPersona	GreenBit	Orcanthus	Average Accuracy
SlimResCNN [17]	2019	95.59	96.44	93.71	95.25
SDCNN [46]	2023	94.7	96.17	94.19	95.02
OPG-CNN [10]	2024	93.64	94.74	96.53	94.97
MoSFPAD [11]	2024	95.40	93.79	95.94	95.20
Proposed model		94.89	96.06	95.77	95.57

Table 4. Cross-material robustness comparison in terms of ACE for the proposed method on the LivDet-2015 dataset.

Dataset	Sensor	Training Materials	Testing Materials	LivDet15-Winner [29]	FSB [28]	SA-R-CNN [31]	Proposed Model
LivDet-2015	DigitalPersona	Ecoflex, Gelatin, Latex, Wood Glue	Liquid Ecoflex, RTV	6.00%	1.80%	5.46%	1.40%
	GreenBit	Ecoflex, Gelatin, Latex, Wood Glue	Liquid Ecoflex, RTV	7.40%	1.80%	4.84%	0.20%
	CrossMatch	Body Double, Ecoflex, Play-Doh	Gelatin, OOMOO	4.02%	0.00%	4.07%	0.16%
	HiScan	Ecoflex, Gelatin, Latex, Wood Glue	Liquid Ecoflex, RTV	5.80%	2.60%	6.03%	1.50%
	Average			5.72%	1.48%	5.17%	0.81%

Table 5. Cross-material robustness comparison in terms of ACE for the proposed method on the LivDet-2017 dataset.

Dataset	Sensor	Training Materials	Testing Materials	RTK-PAD [30]	CFD-PAD [47]	FSB [28]	SA-R-CNN [31]	Proposed Model
LivDet-2017	DigitalPersona	Wood Glue, Ecoflex, Body Double	Gelatin, Latex, Liquid Ecoflex	3.25%	3.33%	4.88%	4.12%	1.40%
	GreenBit	Wood Glue, Ecoflex, Body Double	Gelatin, Latex, Liquid Ecoflex	1.92%	2.59%	3.32%	2.83%	1.49%
	Orcanthus	Wood Glue, Ecoflex, Body Double	Gelatin, Latex, Liquid Ecoflex	1.67%	1.68%	5.49%	2.43%	2.53%
	Average			2.28%	2.53%	4.56%	3.12%	1.91%

Table 6. Performance metrics on the LivDet-2015 dataset.

Sensor	Accuracy (%)	Precision	Recall	F1-Score
DigitalPersona	96.07	0.957	0.940	0.948
GreenBit	99.31	0.990	0.991	0.990
HiScan	98.04	0.974	0.974	0.974
CrossMatch	99.79	0.996	0.999	0.998
Average	98.30	0.979	0.976	0.977

Table 7. Performance metrics on the LivDet-2017 dataset.

Sensor	Accuracy (%)	Precision	Recall	F1-Score
DigitalPersona	94.89	0.946	0.942	0.944
GreenBit	96.06	0.963	0.951	0.957
Orcanthus	95.77	0.955	0.954	0.955
Average	95.57	0.954	0.949	0.952

Table 8. Comparison of parameters and accuracy on the LivDet-2015 dataset.

	Slim-ResCNN [17]	LFLDNet [19]	FLDNet [13]	Proposed Model
Parameters (M)	2.15	0.83	0.48	0.14
Accuracy (%)	96.82	96.56	96.48	98.30

Table 9. Impact of components and parameter count on performance in terms of accuracy (%).

Components				Parameter	DigitalPersona, 2015 (%)	DigitalPersona, 2017 (%)
Base CNN	Spatial Attention	Tanh Activation	Dense Layer	Count	Accuracy	Accuracy
✓	×	×	✓	141,605	89.48	90.81
✓	×	×	×	140,353	93.19	91.85
✓	✓	×	×	140,451	91.92	89.86
✓	×	✓	×	140,353	95.20	94.11
✓	✓	✓	×	140,451	96.07	94.89

Table 10. Impact of spatial attention module placement on classification accuracy. The highlighted row indicates the proposed method, which achieved the highest accuracy.

Placement of Spatial Attention Module	Classification Accuracy (%)
After final convolution layer	94.84
Before final convolution layer	95.52
Middle of architecture (proposed)	96.07

Table 11. Comparison of different attention modules integrated into the same backbone architecture in terms of classification accuracy on the DigitalPersona sensor in LivDet-2015 dataset.

Attention Module	Classification Accuracy (%)
No attention module	95.20
SE Block	92.40
CBAM	94.28
Spatial attention module	96.07

Table 12. Computational complexity and size of the proposed model.

Metric	Value
FLOPs	1.6855 GFLOPs
Number of parameters	0.14 million
Model size	0.54 MB

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Al Amin, M.; Reza, N.; Jung, H.Y. Lightweight Network for Spoof Fingerprint Detection by Attention-Aggregated Receptive Field-Wise Feature. Electronics 2025, 14, 1823. https://doi.org/10.3390/electronics14091823

AMA Style

Al Amin M, Reza N, Jung HY. Lightweight Network for Spoof Fingerprint Detection by Attention-Aggregated Receptive Field-Wise Feature. Electronics. 2025; 14(9):1823. https://doi.org/10.3390/electronics14091823

Chicago/Turabian Style

Al Amin, Md, Naim Reza, and Ho Yub Jung. 2025. "Lightweight Network for Spoof Fingerprint Detection by Attention-Aggregated Receptive Field-Wise Feature" Electronics 14, no. 9: 1823. https://doi.org/10.3390/electronics14091823

APA Style

Al Amin, M., Reza, N., & Jung, H. Y. (2025). Lightweight Network for Spoof Fingerprint Detection by Attention-Aggregated Receptive Field-Wise Feature. Electronics, 14(9), 1823. https://doi.org/10.3390/electronics14091823

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Lightweight Network for Spoof Fingerprint Detection by Attention-Aggregated Receptive Field-Wise Feature

Abstract

1. Introduction

2. Related Work

2.1. Static Feature-Based Approaches

2.2. CNN-Based Approaches

2.3. Attention-Based Approaches

2.4. Lightweight Methods

3. Method

3.1. Datasets and Preprocessing

3.2. Training Protocol

4. Results

4.1. Visual Analysis

4.2. Ablation Study

5. Model Complexity Overview

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI