1. Introduction
Biometric recognition, the automated identification of individuals based on physiological or behavioral traits, has become a pervasive front end for access control in consumer, industrial, and governmental systems, including mobile devices, smart locks, surveillance cameras, and border-control infrastructure. At the same time, biometric sensing and decision making are increasingly pushed toward the edge, where strict constraints on latency, energy, and connectivity coexist with strong privacy requirements. These constraints make it difficult to rely solely on CPU-centric embedded platforms or cloud offloading for always-on, real-time authentication.
Field-programmable gate arrays (FPGAs) offer an attractive substrate for edge biometrics because they combine reconfigurability with spatial parallelism and deterministic, low-latency execution. FPGA fabrics can be tailored into application-specific streaming pipelines that tightly control arithmetic precision and data movement, enabling efficient implementations of both classical signal/image-processing stages and quantized deep neural networks. Recent advances in FPGA-SoC devices and toolchains have further reduced the barrier to deploying end-to-end biometric pipelines on reconfigurable hardware, with competitive accuracy under tight power budgets [
1,
2,
3,
4]. By executing inference locally, FPGA-based systems can also reduce biometric data exposure and dependency on network connectivity [
1,
5].
Despite rapid progress, FPGA-based biometric implementations are reported across disparate modalities, algorithms, and platform-specific design choices, making it difficult to extract reusable engineering insights or to compare design trade-offs across papers. This review consolidates recent work with an implementation-centric perspective that spans algorithms, architectures, and security. Specifically, the paper: (i) surveys FPGA and FPGA-SoC designs from 2021–2025 for face, fingerprint, iris, speaker (voiceprint), and finger vein recognition; (ii) contrasts classical pipelines with deep learning deployments and summarizes the evaluation metrics most frequently disclosed (accuracy/EER, latency/throughput, resource utilization, and power); (iii) distills recurring architectural patterns and toolflows (e.g., streaming/dataflow design, parallel processing elements, quantization, buffering/tiling, and the use of HLS or vendor IP); and (iv) discusses security and resilience aspects, including template protection and presentation attack detection (PAD) [
6].
Figure 1 provides a high-level reference pipeline used throughout this review, from sensor acquisition and preprocessing to feature extraction, matching, and decision output, emphasizing secure handling of biometric templates.
The remainder of this paper is organized as follows.
Section 2 reviews related work and existing surveys in FPGA-based biometrics.
Section 3 discusses the role and advantages of FPGAs in biometric systems.
Section 4 provides a modality-wise review: face recognition (
Section 4.1), fingerprint (
Section 4.2), iris (
Section 4.3), speaker/voice (
Section 4.4), and finger vein (
Section 4.5), summarizing recent implementations and results.
Section 5 discusses common architectures and design patterns across these works.
Section 6 addresses security, anti-spoofing, and privacy considerations.
Section 7 outlines challenges and limitations.
Section 8 looks ahead to future research directions.
Section 9 concludes the paper.
2. State-of-the-Art
2.1. Literature Selection
This manuscript is a narrative review; it does not adhere to the PRISMA standard for systematic reviews. We adopt a narrative format because our objective is to synthesize architectural patterns and deployment trade-offs across a heterogeneous body of FPGA biometric implementations, an objective for which qualitative cross-cutting interpretation is more useful than the strict reproducibility a systematic review would offer. Throughout this section, we make the search strategy and selection logic explicit so readers can evaluate the scope and the residual selection bias of the review.
Research questions. The review is organized around four research questions: (RQ1) What FPGA and FPGA-SoC implementations have been published for face, fingerprint, iris, speaker/voiceprint, and finger vein recognition in 2021–2025? (RQ2) Which architectural patterns recur across modalities and what trade-offs do they imply for accuracy, latency, resource utilization, and power? (RQ3) What security and presentation-attack-detection (PAD) capabilities have been hardware-realized on FPGA, and where does an open gap remain? (RQ4) When is an FPGA the right platform versus a competing edge-AI SoC or NPU?
Databases and search. We searched IEEE Xplore, Scopus, ACM Digital Library, MDPI, SpringerLink, ScienceDirect, and Google Scholar between June 2025 and April 2026. Search keywords combined a hardware term (“FPGA,” “FPGA-SoC,” “Zynq,” “Versal,” “PolarFire,” “Cyclone,” “Agilex”) with a biometric term (“biometric recognition,” “face recognition,” “fingerprint,” “iris recognition,” “speaker recognition,” “voiceprint,” “finger vein,” “presentation attack detection,” “liveness”) joined by Boolean AND, with optional refinements for “deep learning,” “CNN,” “SNN,” “transformer,” “quantization,” and “HLS.”
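As an illustration of the keyword combination described above, the following minimal sketch enumerates the Boolean query strings. The function name `build_queries` and the quoting syntax are assumptions made for this example; each database has its own query grammar, and the strings actually submitted may have differed.

```python
from itertools import product

# Term lists copied from the search description above.
HARDWARE_TERMS = ["FPGA", "FPGA-SoC", "Zynq", "Versal", "PolarFire", "Cyclone", "Agilex"]
BIOMETRIC_TERMS = [
    "biometric recognition", "face recognition", "fingerprint",
    "iris recognition", "speaker recognition", "voiceprint",
    "finger vein", "presentation attack detection", "liveness",
]
REFINEMENTS = ["deep learning", "CNN", "SNN", "transformer", "quantization", "HLS"]


def build_queries(refinement=None):
    """Pair every hardware term with every biometric term using AND.

    `refinement` is an optional third term appended with another AND.
    """
    queries = []
    for hw, bio in product(HARDWARE_TERMS, BIOMETRIC_TERMS):
        query = f'"{hw}" AND "{bio}"'
        if refinement is not None:
            query += f' AND "{refinement}"'
        queries.append(query)
    return queries


if __name__ == "__main__":
    base = build_queries()
    print(len(base), "base queries, e.g.", base[0])
```

Seven hardware terms and nine biometric terms give 63 base queries per database before any refinement is applied.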
Time range. The 2021–2025 window was chosen because Xilinx Zynq UltraScale+ and Microchip PolarFire SoC reached commercial maturity in 2020–2021, making 2021 the first year in which heterogeneous CPU–FPGA biometric pipelines became broadly comparable across vendors. We treated 2025 as the upper bound to ensure all included works are peer-reviewed and indexed at the time of writing.
Inclusion and exclusion criteria. Inclusion required (i) a peer-reviewed venue; (ii) a documented FPGA target device or FPGA-SoC platform with reported resource utilization, latency/throughput, accuracy, or power; and (iii) a primary contribution that is biometric-recognition relevant. We excluded purely algorithmic studies without hardware mapping, except where they served as foundational architectural references explicitly cited as historical context (see
Section 4.1). Pre-2021 works are retained only when they are the canonical source for an architectural pattern still in use; the bibliography flags such inclusions in the surrounding text.
Screening and deduplication. Records returned by each database were deduplicated by DOI; where two databases returned the same item under different metadata (e.g., a preprint and a proceedings version), we retained the version with the most complete reported metrics. Title and abstract screening removed items that mentioned FPGA only in passing or that mentioned biometrics only as motivating examples without a hardware contribution. Full-text review then applied the inclusion criteria above. Disagreements between the two screening authors were resolved by consensus; we did not record formal inter-rater agreement statistics because the review is narrative.
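The deduplication rule can be sketched as follows. This is a simplified illustration, not the tooling used for the review: the record layout, the four metric field names, and the example DOIs are hypothetical, and matching a preprint to its proceedings version (which usually carry different DOIs) would require an additional title-based step that is not shown.

```python
METRIC_FIELDS = ("accuracy", "latency", "resources", "power")  # assumed field names


def completeness(record):
    """Count how many metric fields the record actually reports."""
    return sum(record.get(field) is not None for field in METRIC_FIELDS)


def deduplicate(records):
    """Keep one record per DOI, preferring the most complete metrics."""
    best = {}
    for rec in records:
        key = rec["doi"].strip().lower()  # DOIs are compared case-insensitively
        if key not in best or completeness(rec) > completeness(best[key]):
            best[key] = rec
    return list(best.values())


if __name__ == "__main__":
    hits = [
        {"doi": "10.0000/EXAMPLE.1", "accuracy": 0.95},
        {"doi": "10.0000/example.1", "accuracy": 0.95, "latency": 12.0, "power": 3.0},
        {"doi": "10.0000/example.2", "power": 1.5},
    ]
    for rec in deduplicate(hits):
        print(rec["doi"], completeness(rec))
```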
Final corpus. After screening, 90 references entered the manuscript bibliography. These cover the FPGA/platform-oriented and biometric-oriented categories described in
Section 3, the per-modality reviews of
Section 4, the cross-modality architectural-pattern synthesis of
Section 5, the security and PAD discussion of
Section 6, and the foundational pre-2021 references retained as architectural or historical context.
Selection principles for representative papers. Within
Section 4 and the modality-wise summary table presented in
Section 4.5, “representative” is not a synonym for “best”; it identifies works selected to span the design space we wish to explain. We applied four principles, in order: (i) coverage of all five modalities, with at least two FPGA-implemented entries per modality where the literature supports it, and an explicit gap statement where it does not (see
Section 4.2,
Section 4.3 and
Section 4.5); (ii) coverage of the architectural-pattern axes (classical pipeline, dense CNN, separable-convolution CNN, spiking/event-driven model, transformer-based model, multimodal fusion), so that each recurring pattern is concretely instantiated by at least one cited work; (iii) coverage of the deployment regimes the review discusses (sub-watt edge, single-watt embedded, mid-watt SoC-FPGA, GPU-comparison-class), keyed to the platform column of the per-modality summary table presented in
Section 4.5; and (iv) inclusion of seminal pre-2021 architectural references and 2025 industry demonstrations only when they are necessary to anchor a claim about pattern continuity (e.g., FINN [
7] for streaming binarized inference) or about the current commercial state of practice (e.g., Microchip PolarFire SoC face recognition without a heatsink [
5]). Where multiple works satisfy a principle, preference was given to the one with the most complete reported metrics.
2.2. Categorization of Prior Work
Prior work related to FPGA-based biometric recognition generally addresses narrow aspects of the field. These studies can be broadly categorized into two main directions: biometric-oriented research and FPGA/platform-oriented research.
Most biometric-focused studies investigate novel recognition methods or algorithmic techniques for specific modalities and analyze their performance characteristics, often without considering hardware constraints or practical FPGA implementation issues. Nevertheless, these works provide valuable algorithmic foundations that can later be adapted for FPGA realization. For example, the studies [
8,
9] investigate fusion techniques in biometric systems, while [
10] explores both classical approaches and neural-network-based methods. Several studies, including [
11,
12,
13,
14,
15,
16], apply neural network techniques to biometric recognition, and [
17] enhances DL-based feature extraction, whereas [
18] focuses on evaluation tools and benchmarking approaches for biometric systems. Two further studies, although algorithm-driven in framing, present concrete FPGA-SoC implementations and are therefore discussed in the FPGA/platform-oriented category below: ref. [
19] proposes a GMM-based speaker-verification SoC with hardware MFCC, and ref. [
20] presents a dynamically reconfigurable LDPC decoder embedded in an iris-recognition SoC pipeline.
In addition, a number of review papers concentrate on specific biometric modalities or thematic aspects rather than FPGA implementation. For instance, study [
21] discusses privacy issues in biometric systems; ref. [
22] reviews methods for speaker recognition; studies [
23,
24] survey deep learning models for biometric recognition; and ref. [
25] provides a comprehensive survey on finger vein recognition. Privacy preservation in IoT-oriented biometric applications is reviewed in [
26], while speech recognition systems are surveyed in [
27,
28]. Furthermore, study [
29] presents a comprehensive review of iris recognition techniques.
On the FPGA/platform-oriented side, biometric-related studies can generally be grouped into several categories.
First, several FPGA implementation studies focus on a single recognition method for a specific biometric modality, such as CNN-, SNN-, or attention-based models, or classical feature-extraction approaches, deployed on FPGA platforms. Examples cover face, fingerprint, finger vein, iris, and speech recognition systems and include [
30,
31,
32,
33,
34,
35,
36,
37,
38,
39,
40,
41,
42]; we additionally retain [
43] (2010), the earliest FPGA-based speech-recognition system in this category, as historical context, since it predates and motivates the 2021–2025 voiceprint implementations reviewed in
Section 4.4. These studies primarily demonstrate feasibility and performance gains of FPGA realization for a particular modality but rarely provide a broader comparative perspective across modalities or techniques.
Second, several studies emphasize hardware acceleration and FPGA architectural optimization. These works investigate pipeline design, parallel processing, memory optimization, and specialized accelerators for neural network or image-processing workloads. Representative examples include neural network accelerators [
44,
45], surveys on real-time face recognition accelerators [
46], acceleration of deep-learning-based biometric recognition [
1], binary CNN implementations [
47], general FPGA-based NN acceleration concepts [
48], image-processing acceleration [
34], cloud FPGA acceleration platforms [
49], hardware architectures for face detection [
45,
50], and real-time feature extraction and template matching designs [
40].
Finally, several works focus on security- or implementation-specific aspects of FPGA-based biometric systems, including template protection, encryption, anti-spoofing techniques, and specialized model implementations. For example, study [
51] reviews cryptographic and post-quantum security techniques, ref. [
52] surveys neural network implementation on FPGA and embedded platforms, refs. [
53,
54] address SNN implementations, ref. [
55] presents a low-latency on-device acoustic CNN model on FPGA, ref. [
56] proposes AES-based encryption for secure biometric video streams, and ref. [
57] introduces an FPGA-based face anti-spoofing approach.
Although these studies provide valuable insights, they generally address algorithmic, hardware, or security aspects in isolation. In contrast, our review offers a comprehensive overview of recent FPGA-based biometric recognition systems, considering both their advantages and limitations relative to other computing platforms. Specifically, it covers: (1) recent recognition methods across multiple biometric modalities, highlighting developments aligned with FPGA implementation trends; (2) hardware acceleration techniques, including FPGA architectures, resource utilization, and design optimization strategies for biometric applications; and (3) security considerations such as template protection, anti-spoofing, privacy preservation, and secure biometric deployment on FPGA platforms.
The closest contemporary survey to ours is [
32], which also reviews FPGA-based biometric recognition. Our work differs in three concrete ways. The first is scope: ref. [
32] groups results by hardware-architecture family (e.g., classical pipelines, CNN accelerators, SNN accelerators), whereas this review is organized by biometric modality and explicitly includes finger vein implementations and speaker/voiceprint systems, which [
32] treats only briefly. The second is evaluation: the quantitative comparison tables in
Section 4 report latency, accuracy, and power for each work, while flagging the heterogeneity of the reported metrics (% accuracy, EER, AP, WER) as a finding rather than abstracting it away. The third is security: we treat presentation attack detection (PAD) and template protection as first-class deployment concerns alongside accuracy, including the explicit gap that few FPGA-realized PAD systems existed in 2021–2025. The catalogued works [
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29] are individually narrow in scope, each restricted to one modality, one algorithm family, or one platform-level concern, and are therefore complementary to but not substitutes for the cross-modality FPGA-implementation perspective taken here.
3. Role of FPGA in Biometric Systems
FPGAs offer a compelling hardware platform for biometric recognition due to several key attributes: massive parallelism, reconfigurability, and low-power operation for custom logic. Unlike fixed-function ASICs or power-hungry GPUs, FPGAs can be reprogrammed to instantiate application-specific architectures that exploit the parallel nature of biometric algorithms. Many biometric tasks involve concurrent operations, for example, filtering an image, extracting features across multiple regions, or evaluating neurons in a neural network layer, which FPGAs handle well by utilizing numerous logic cells and DSP units in parallel. Seng et al. (2021) emphasize that FPGAs, with their rich fabric of logic and DSP blocks, can perform arithmetic-intensive operations with high throughput, making them suitable for embedded deep learning inference [
58]. As a result, in recent years, FPGAs have seen substantial growth in deployment for deep learning applications, including face and speech recognition, where they can meet real-time requirements at lower power budgets [
2].
Another advantage of FPGAs is their energy efficiency and lower latency for tailored tasks. By customizing the data paths and removing overhead, an FPGA design can execute a biometric inference pipeline with minimal wasted work, often leading to reduced latency compared to running the equivalent algorithm on a CPU/GPU. For example, AlBdairi et al. report a 13-layer face-recognition CNN running at 10.7 W on a DE1-SoC FPGA versus 650 W on the GPU baseline used for the same network, a roughly 60-fold reduction in instantaneous power for comparable accuracy [
2]; the Microchip PolarFire SoC FPGA executes an 8-bit quantized face-recognition model with no heatsink or fan, on the order of a few watts [
5]. These are like-for-like FPGA-versus-GPU/CPU comparisons reported by the cited authors, not isolated FPGA numbers. Moreover, FPGAs allow use of fixed-point arithmetic and optimized memory access patterns (e.g., streaming dataflow architectures), which further cut down power consumption without sacrificing performance [
2]. This is crucial for battery-powered devices like mobile biometric scanners or smart cameras. Many authors report that FPGA-based systems can process biometric data in real time (at video frame rates for vision, or as a live stream for audio) with only a few watts of power, underscoring their suitability for always-on edge AI. Microchip’s 2025 PolarFire SoC demonstration illustrates this regime: 8-bit quantized face-recognition models run on an FPGA fabric with embedded RISC-V cores without a heatsink or fan [
5].
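To make the fixed-point argument concrete, the sketch below emulates a 16-bit fixed-point multiply-accumulate in software. The Q8.8 split (8 integer bits, 8 fractional bits) is an assumption chosen for illustration; the cited designs do not necessarily use this format, and a real datapath would be written in HDL or HLS rather than Python.

```python
FRAC_BITS = 8                      # assumed Q8.8 split of a 16-bit word
SCALE = 1 << FRAC_BITS             # 256 quantization steps per unit
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def to_fixed(x):
    """Quantize a float to a saturated 16-bit Q8.8 integer."""
    return max(INT16_MIN, min(INT16_MAX, round(x * SCALE)))


def fixed_dot(weights, activations):
    """Integer-only dot product, rescaled once at the end.

    The accumulator is kept wide (a plain Python int here), mirroring the
    wide accumulators used behind hardware multipliers.
    """
    acc = 0
    for w, a in zip(weights, activations):
        acc += to_fixed(w) * to_fixed(a)   # product carries 2 * FRAC_BITS fraction bits
    return acc / (SCALE * SCALE)


if __name__ == "__main__":
    w = [0.5, -0.25, 0.1]
    a = [2.0, 4.0, 0.3]
    exact = sum(x * y for x, y in zip(w, a))
    print("float:", exact, "fixed:", fixed_dot(w, a))
```

The integer products need no floating-point unit, which is why such datapaths map onto DSP blocks and LUTs at low power; the price is a small quantization error that has to be checked against the accuracy target.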
Customizability is another pivotal feature of FPGAs. Biometric systems vary widely in their algorithms, from classical signal processing pipelines to deep CNNs, and FPGAs can morph to accelerate whichever algorithm is needed. Developers have leveraged high-level synthesis and vendor-provided deep learning accelerators (like Xilinx/AMD’s Deep Learning Processing Unit, DPU) to rapidly deploy complex networks on FPGAs [
59]. Unlike fixed silicon, FPGAs also enable updates: if a new network architecture or anti-spoofing method emerges, the device can be reconfigured with the updated design, extending the system’s life. This flexibility is important in biometrics, where security threats and AI models evolve quickly. Additionally, processing biometric data on-device (on an FPGA in a local system) inherently improves privacy and security. Sensitive data (fingerprints, face images, voice samples) need not be sent to a cloud server if the FPGA can handle the inference locally. This mitigates risks of data interception and also reduces latency from network communication [
1]. Many recent works explicitly highlight standalone FPGA systems operating without internet connectivity for applications like access control, ensuring that raw biometric data remains on the device [
1].
Several comparative analyses of hardware platforms for machine learning and signal processing applications have been reported in the literature [
2,
45,
51] (
Table 1).
4. Modality-Wise Review of FPGA-Based Biometrics
FPGA implementations have been explored for virtually every major biometric modality. Here, we review five modalities—face, fingerprint, iris, voice, and finger vein—highlighting representative approaches from 2021–2025. For each modality, we contrast classical pipeline implementations and newer deep learning approaches on FPGAs, and we note performance metrics when available (such as accuracy, throughput, and resource/power usage).
4.1. Face Recognition on FPGA
Face recognition is a mature yet evolving field where deep learning now dominates. Deploying face recognition on FPGAs has attracted intense research interest, as face recognition is often needed in embedded contexts (e.g., surveillance cameras, door entry systems) that demand real-time, local processing. Early FPGA implementations of face recognition used classical algorithms (e.g., PCA or LDA for face feature extraction), but recent efforts focus on CNN-based face embedding networks and classification models.
Table 2 summarizes the typical FPGA resource utilization and performance implications of the major face recognition method categories; the differences stem from each category’s computational characteristics and dataflow requirements. Holistic statistical methods, such as PCA, ICA, and Eigenfaces, mainly involve linear projections and relatively simple probabilistic computations, resulting in modest arithmetic complexity and limited on-chip memory requirements. Consequently, these approaches typically exhibit low LUT, FF, DSP, and BRAM utilization, although their recognition accuracy is generally moderate.
Local feature-based approaches introduce additional computational overhead. Key-point descriptor techniques such as SIFT or SURF require convolution-like operations, gradient computations, and descriptor matching, which increase DSP utilization and intermediate buffering demands in BRAM. Appearance-based local descriptors such as LBP, HOG, and LPQ involve simpler pixel-level operations and histogram accumulation, making them comparatively efficient for FPGA implementation while maintaining reasonable robustness to illumination variations.
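The claim that LBP reduces to pixel comparisons and histogram accumulation can be seen in a minimal software sketch of the basic 8-neighbour operator. The neighbour ordering and the `>=` comparison are one common convention and are assumptions here; variants (uniform patterns, larger radii) differ in detail.

```python
# Neighbours visited clockwise, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit LBP code of pixel (r, c): one comparison per neighbour."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


if __name__ == "__main__":
    patch = [[9, 1, 1],
             [1, 5, 1],
             [1, 1, 1]]
    print(lbp_code(patch, 1, 1))  # only the top-left neighbour exceeds the centre
```

Each code needs eight comparators and no multipliers, and the histogram is a set of counters, which is consistent with the low DSP demand noted above.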
Hybrid machine learning approaches combine handcrafted features with classifiers such as SVM or shallow neural networks. These methods require additional multiply–accumulate operations and storage of classifier parameters, leading to moderate FPGA resource usage while typically improving recognition accuracy relative to purely statistical methods.
Deep-learning-based approaches, particularly convolutional neural networks, exhibit the highest computational and memory demands due to extensive multiply–accumulate operations, large parameter sets, and intermediate feature-map storage. This explains their significant consumption of DSP slices, flip-flops, LUTs, and BRAM/URAM resources, as well as comparatively higher power requirements. Nevertheless, FPGA-based parallelism, pipelined dataflow, and quantization techniques often allow latency to remain acceptable for real-time biometric applications.
To provide an indicative view of FPGA resource utilization and its dependence on architectural design, two recent face-recognition implementations can be compared directly. AlBdairi et al. [
2] deploy a 13-layer face CNN on a DE1-SoC board using a 16-bit fixed-point datapath with 64 parallel multiplier arrays, sized so that the CNN fits within the SoC’s logic and DSP budget while still matching the GPU accuracy baseline. Tsai et al. [
61] target a Cyclone V 5CSEBA6 device for an end-to-end 720p face access-control pipeline using depthwise/pointwise separable convolutions; the heavier separable-convolution accelerator drives substantially higher LUT, FF, and DSP utilization than the AlBdairi 16-bit MAC array, while reducing per-inference latency. The two designs illustrate how face-recognition FPGA resource utilization strongly depends on the choice of convolution primitive, arithmetic precision, and degree of unrolling, with corresponding trade-offs against latency and on-chip memory pressure.
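To show why the choice of convolution primitive matters, the sketch below counts multiply-accumulate (MAC) operations for a standard and a depthwise/pointwise separable convolution layer. The layer dimensions are hypothetical and are not taken from either cited design. The count covers arithmetic work per layer only; it says nothing about the LUT, FF, or DSP cost of a particular accelerator, which also depends on precision and unrolling.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a k x k standard convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k * k


def separable_conv_macs(h, w, c_in, c_out, k):
    """MACs for a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = h * w * c_in * k * k     # one k x k filter per input channel
    pointwise = h * w * c_in * c_out     # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 56 x 56 output, 64 -> 128 channels, 3 x 3 kernel.
    shape = (56, 56, 64, 128, 3)
    std = standard_conv_macs(*shape)
    sep = separable_conv_macs(*shape)
    print(f"standard: {std:,}  separable: {sep:,}  ratio: {sep / std:.3f}")
```

For any layer the ratio equals 1/c_out + 1/k², so a 3 × 3 separable layer with many output channels needs roughly one ninth of the MACs of its standard counterpart.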
Despite their higher hardware cost, modern face recognition systems increasingly favor deep learning approaches because of their superior accuracy; robustness to illumination, pose, occlusion, and aging variations; and their ability to learn discriminative representations directly from large datasets. Continuous advances in FPGA architectures, including increased DSP availability, larger on-chip memory capacities, and improved support for low-precision arithmetic, have further facilitated efficient deployment of deep neural networks. Consequently, while classical approaches remain attractive for low-power or resource-constrained scenarios, deep-learning-based methods have become the dominant choice in contemporary FPGA-based face recognition systems where recognition performance and adaptability are prioritized.
Several works have demonstrated that FPGAs can run heavy face CNNs with competitive accuracy and improved efficiency. AlBdairi et al. (2022) introduced a deep CNN model for face ethnicity identification and deployed it on an Intel/Altera SoC-FPGA (DE1-SoC board) [
2]. Their network contained 13 convolutional layers in hardware and was compared against a GPU implementation. The FPGA-based model achieved 96.9% accuracy and a 94.6% F1-score on a 3141-image face dataset, essentially matching the GPU accuracy but with lower energy usage [
2]. By leveraging parallel fixed-point operations on the FPGA’s DSP arrays, they sped up inference while consuming less power, making it suitable for portable embedded systems [
2]. The authors noted that customizing the hardware for the CNN (e.g., using a 16-bit fixed-point representation and 64 parallel multiplier arrays) was key to meeting throughput needs within the FPGA resource limits [
2]. This work underscores how an FPGA can handle a deep face recognition model with strong energy efficiency relative to a GPU baseline. We caution, however, that AlBdairi et al. report a 2.76 s end-to-end inference time on the DE1-SoC, which corresponds to roughly 0.36 inferences/s and is therefore not “real time” in the frame-rate sense (typically ≥30 fps for vision); the design is fast only relative to its 5.1 s GPU baseline. Throughout this review, we use “real-time” only when the cited work meets a frame-rate or sub-100-millisecond-per-inference threshold (e.g., ref. [
62] at 55.44 fps, ref. [
60] at 31 ms, ref. [
61] at 30.3 ms/ID), and avoid the term for designs whose latency exceeds that.
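The threshold can be stated as a short check. The sketch below applies it to the four figures quoted in this paragraph; the function names are ours, and the conversion from latency to throughput assumes one input is processed at a time with no pipelining across inputs.

```python
REAL_TIME_MIN_FPS = 30.0        # frame-rate threshold named in the text
REAL_TIME_MAX_LATENCY_S = 0.1   # sub-100-millisecond threshold named in the text


def throughput_fps(latency_s):
    """Inferences per second when inputs are processed strictly one at a time."""
    return 1.0 / latency_s


def is_real_time(latency_s=None, fps=None):
    """True if either reported figure meets its threshold."""
    if fps is not None and fps >= REAL_TIME_MIN_FPS:
        return True
    return latency_s is not None and latency_s < REAL_TIME_MAX_LATENCY_S


if __name__ == "__main__":
    reported = {
        "ref. [2], 2.76 s": {"latency_s": 2.76},
        "ref. [62], 55.44 fps": {"fps": 55.44},
        "ref. [60], 31 ms": {"latency_s": 0.031},
        "ref. [61], 30.3 ms/ID": {"latency_s": 0.0303},
    }
    for label, figures in reported.items():
        print(label, "->", is_real_time(**figures))
    print(round(throughput_fps(2.76), 2), "inferences/s for ref. [2]")
```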
Another study by Zayed et al. (2025) explored multiple CNN architectures for face recognition on Xilinx Zynq FPGAs [
1]. They implemented popular networks (AlexNet, VGG-16, ResNet, GoogLeNet) and evaluated accuracy vs. resource trade-offs. GoogLeNet emerged as the “best fit” on FPGA, offering a balance of accuracy and lower computational cost [
1]. In their system, a Pynq-Z2 board (Zynq SoC) was used in a hardware/software co-design: the FPGA fabric accelerated the CNN inference, while the embedded ARM CPU handled control logic. This standalone FPGA-based system attained ∼85–87% face recognition accuracy on a real-world dataset without any cloud connection [
1]. For comparison, a pure software baseline on a Raspberry Pi 3B only reached ∼70–75% and required cloud assistance, highlighting that the FPGA not only improved speed but also enabled use of a more robust model locally [
1]. The design achieved real-time face authentication and was aimed at door access control, stressing security (no data leaves the device) as a benefit of the FPGA solution [
1]. These results demonstrate that even fairly deep networks (GoogLeNet has 22 layers) can be executed on mid-range FPGAs today, given careful optimization and quantization.
Recent industry demonstrations echo these academic findings. In mid-2025, Microchip Technology demonstrated low-power face identification on its PolarFire SoC FPGA [
5]. Pre-trained face recognition models were quantized to 8 bits and executed on the FPGA’s RISC-V cores augmented by the VectorBlox accelerator library. The result was high-throughput inference with no heatsink or fan, which indicates a low power draw. Microchip also describes the system as offering “best-in-class security” [
5], since the FPGA’s inherent bitstream security and the absence of an OS reduce the attack surface.
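For readers unfamiliar with 8-bit quantization, the following sketch shows the simplest variant, symmetric per-tensor quantization. It is a generic illustration and an assumption on our part: the quantization scheme used in the cited demonstration is not described here, and production flows typically add calibration data and per-channel scales.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: returns (integer codes, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [-1.27, 0.0, 0.64, 1.27]
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    print(codes, scale)
    print(max(abs(a - b) for a, b in zip(weights, restored)))  # worst-case error
```

Each weight is stored in one byte instead of four, and the rounding error per value is bounded by half the scale, which is the trade-off behind the reduced memory traffic and power.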
It is worth noting that face recognition pipelines often include a face detection stage prior to identification. Some FPGA works specifically target accelerating face detection (e.g., using optimized MTCNN or YOLO on FPGA for finding faces in images) [
62], which complements the recognition stage. However, many integrated systems use FPGA for the entire flow: capturing camera data, detecting and cropping faces, extracting face feature embeddings via CNN, and matching or classifying the identity. For instance, Tsai et al. (2024) describe a hardware design for a complete face access control system using a DNN on FPGA for detection and recognition, achieving real-time performance for 720p video input [
61]. In such systems, the FPGA can be programmed with a multistage pipeline (detector followed by recognizer) and handle multiple faces per frame in parallel.
Figure 2 illustrates a typical FPGA-based face recognition system leveraging a CNN-based biometric recognition approach. The system acquires images from a camera sensor, which are then preprocessed through resizing, normalization, and quantization to optimize them for hardware execution. The preprocessed data is streamed into the FPGA, where an input buffer implemented in BRAM temporarily stores feature maps to ensure continuous dataflow. The core of the FPGA, denoted as CNN-ACC, executes multiple convolutional layers, activation functions, batch normalization, and fully connected layers sequentially, producing a compact feature embedding for each face. Weights for the CNN are loaded from external DDR memory and cached in on-chip BRAM or URAM to reduce latency and maximize throughput. Finally, the feature embeddings are compared with stored templates in the matching stage to identify the input face. This architecture exploits both intra-layer parallelism (multiple MAC units per convolution) and inter-layer pipelining, enabling high throughput while minimizing latency and power consumption. FPGA CNN accelerators typically combine three parallelism forms: spatial parallelism (multiple PEs computing different output pixels), channel parallelism (concurrent processing of feature-map channels), and pipeline parallelism (convolution, activation, and pooling chained as a streaming pipeline). Several recent studies have proposed FPGA CNN accelerators with application-specific architectures, optimization approaches, and target use cases; representative examples are provided in [
63,
64,
65,
66]. We include [
63] (2020) and [
64] (2020) as foundational architectural references that established the streaming CNN-on-Zynq pattern still used by 2021–2025 face recognition work; the remaining entries fall within the survey window. Foundational pre-2021 references are retained only when they are the canonical source for an architectural pattern or result we cite, not as primary evidence for current performance claims.
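The matching stage at the end of this pipeline can be sketched in a few lines. Cosine similarity and the 0.6 acceptance threshold are assumptions chosen for illustration; the reviewed systems may use other distance measures or a classifier head, and thresholds are tuned per dataset.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match(embedding, templates, threshold=0.6):
    """Return (identity or None, best score) for a 1:N template search."""
    best_id, best_score = None, -1.0
    for identity, template in templates.items():
        score = cosine_similarity(embedding, template)
        if score > best_score:
            best_id, best_score = identity, score
    accepted = best_id if best_score >= threshold else None
    return accepted, best_score


if __name__ == "__main__":
    # Hypothetical 3-dimensional templates; real embeddings are much longer.
    enrolled = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
    print(match([0.9, 0.1, 0.0], enrolled))
    print(match([0.0, 0.0, 1.0], enrolled))
```

Because every template comparison is an independent dot product, this loop is the part of the pipeline that parallelizes most directly across processing elements.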
Overall, the face recognition modality showcases how FPGAs can meet the dual demands of accuracy and efficiency. With network compression techniques (quantization, pruning) and hardware-friendly model architectures (e.g., using depthwise separable convolutions [
2] or binarized networks), FPGA implementations have achieved accuracies in the mid-90% range on challenging face datasets [
2], while operating in real time within a few watts of power.
4.2. Fingerprint Recognition on FPGA
Fingerprint recognition systems entail tasks such as image enhancement, feature extraction (minutiae or texture features), and matching against a database. FPGAs have long been used in fingerprint biometrics, initially to accelerate filtering and minutiae detection operations. In the 2021–2025 period, we see both continued use of FPGAs for classic fingerprint algorithms and new approaches using deep learning to classify or match fingerprints.
One example of a classical approach is the work by Harikrishnan et al. [
67], who implemented a fast and secure fingerprint authentication system on an FPGA, using a true random number generator to enhance security. Their design introduces a true random and timestamp generator (TRSG) into the authentication pipeline to harden it against replay attacks and to support template protection. The FPGA prototype achieved notable resource efficiency: the authors reported that LUT utilization was reduced to only 0.17% of the FPGA capacity and the system complexity to 14%, compared to a prior baseline, thanks to their streamlined design [
67]. This suggests the FPGA was vastly underutilized, implying that even low-cost FPGAs have sufficient capacity for basic fingerprint matching logic. The focus of that work was on security (cancellability) and speed rather than deep learning; it likely implemented fingerprint feature extraction and matching in custom logic, demonstrating an extremely lightweight implementation suitable for smartcards or portable devices.
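To make the replay-protection idea concrete, the following sketch shows one generic way a random nonce and a timestamp can be combined to reject replayed messages. It is not the TRSG design of the cited work: the message layout, the HMAC-SHA256 tag, the freshness window `MAX_AGE_S`, and all function names are assumptions for illustration, and `secrets.token_bytes` merely stands in for a hardware true random number generator.

```python
import hashlib
import hmac
import secrets
import time

MAX_AGE_S = 5.0  # hypothetical freshness window, in seconds

def _payload(nonce, timestamp, template_bytes):
    """Serialize the fields that the authentication tag covers."""
    return nonce + repr(timestamp).encode() + template_bytes

def make_auth_message(key, template_bytes, now=None):
    """Build a message carrying a fresh nonce, a timestamp, and an HMAC tag."""
    nonce = secrets.token_bytes(16)  # software stand-in for a hardware TRNG
    timestamp = time.time() if now is None else now
    tag = hmac.new(key, _payload(nonce, timestamp, template_bytes),
                   hashlib.sha256).digest()
    return {"nonce": nonce, "timestamp": timestamp,
            "template": template_bytes, "tag": tag}

def verify_auth_message(key, msg, seen_nonces, now=None):
    """Accept only authentic, fresh, never-before-seen messages."""
    now = time.time() if now is None else now
    expected = hmac.new(
        key, _payload(msg["nonce"], msg["timestamp"], msg["template"]),
        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, msg["tag"]):
        return False  # tampered or wrong key
    if abs(now - msg["timestamp"]) > MAX_AGE_S:
        return False  # stale message
    if msg["nonce"] in seen_nonces:
        return False  # replay of an already accepted message
    seen_nonces.add(msg["nonce"])
    return True
```

A captured message fails the nonce check if it is resent within the freshness window and fails the timestamp check afterwards, so the verifier only needs to remember nonces for `MAX_AGE_S` seconds.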
Deep learning has also entered the fingerprint domain for classification (by pattern type or identity) and liveness detection (real vs. fake fingerprints). Shafaghi et al. [
3] proposed a fast and light CNN-based fingerprint matching model amenable to FPGA deployment. Their CNN architecture achieved over 94% accuracy on multiple fingerprint databases, while using 75% fewer parameters and memory than prior deep networks, making it hardware-friendly [
3]. By minimizing the number of layers and optimizing the kernel sizes, they reduced both memory footprint and computational cost. The network was tailored so that it could be implemented on an FPGA with at least a 10% speed improvement over state-of-the-art software methods [
3]. Although the reference focuses on the algorithm, the implication is that an FPGA can exploit the reduced model size to perform fingerprint identification quickly, even for large-scale databases. For historical context (the work predates the 2021–2025 survey window), an earlier large-scale experiment showed an FPGA could match up to 2.75 million fingerprints per second by parallelizing the comparison operations [
68]; we cite this only as a benchmark for the parallel-matching capability of FPGAs in identification systems and not as a current-period result.
A typical contemporary system uses the FPGA for feature extraction (classical thinning and minutiae detection, or CNN features) and sometimes for matching, with a CPU handling database search. Sensor-integrated designs place a small FPGA on the scanner module to compute the template on-chip, keeping the raw image off the bus and improving privacy. The privacy aspect is further strengthened by techniques like template encryption and cancellable biometrics on FPGAs [
6], which we discuss later.
Fingerprint PAD is a natural candidate for FPGA deployment, although the work we reviewed remains algorithmic. Liveness detection commonly uses texture analysis or CNNs; Pallakonda et al. (2023) introduced an EfficientNet+SE attention model [
17]. The model is small enough to run on an FPGA in parallel with matching, and fingerprint images are typically only 256 × 256, so a sensor-rate liveness check is plausible as a first line of defense.
In summary, FPGAs in fingerprint recognition have demonstrated: (i) very efficient implementations of traditional algorithms for fast authentication, (ii) viability of deep learning models for fingerprint classification/matching on-chip, and (iii) potential for integrated liveness detection. Fingerprint processing tends to involve fixed-size image data and bit-level operations (e.g., orientation maps, ridge thinning), which map well to digital logic. Combined with the parallel matching capability for large databases, FPGA-based accelerators are well positioned for future fingerprint systems, from secure portable devices to high-throughput AFIS (automated fingerprint identification system) servers.
We note an apparent literature gap in this modality: within the 2021–2025 window, the substantively reviewed FPGA-fingerprint works are [
67] (2021) and [
3] (2023), with [
17] (2025) addressing fingerprint PAD on the algorithmic side. We did not find peer-reviewed FPGA-realized fingerprint matching or PAD systems published in 2022 or 2024 that pass our inclusion criteria; this absence is itself a finding and may reflect the maturity of fingerprint-on-FPGA as a commercial-only design space rather than an open research one.
4.3. Iris Recognition on FPGA
Iris recognition is known for its high accuracy among biometrics, but it typically requires substantial image preprocessing (segmentation of the iris, normalization) and the extraction of fine-textured features (e.g., Gabor filter responses as in Daugman’s classic method). Implementing these steps on FPGA has the benefit of accelerating the computations and enabling real-time recognition even for high-resolution iris images. Additionally, iris systems often operate in constrained environments (like iris scanners or automated border control gates) where an FPGA can serve as a self-contained processing unit.
In recent literature, FPGAs have been utilized for both classical iris pipelines and CNN-based iris processing. A standout example is the work by Ruiz-Beltrán et al. (2023), who embedded an eye detection and focus assessment CNN on a Xilinx Zynq UltraScale+ MPSoC for a remote iris recognition system [
59]. In their setup, a high-resolution camera (16 megapixel) captures images from a distance, and the FPGA hardware must detect eyes in these images and ensure they are in focus before iris matching. They implemented a custom CNN using Xilinx’s Deep Learning Processing Unit (DPU) IP core to accelerate inference on the FPGA [
59]. The CNN was trained to output only well-focused eye crops; remarkably, the FPGA module could discard up to 95% of poor-quality eye regions, outputting clean 640 × 480 iris images for recognition [
59]. This is a strong demonstration of real-time, high-volume processing: handling 16 MP frames and performing focus detection at high speed, something feasible on the FPGA thanks to the optimized DPU accelerator and parallel pipelines. The system shows that even sophisticated pre-processing (like focus detection via CNN) can be offloaded to FPGA, reducing the data passed to subsequent stages.
For the core iris recognition (feature extraction and matching), classical approaches like 2D Gabor wavelet filtering and IrisCode generation have been mapped to FPGAs in the past. Those involve a lot of parallel convolution operations and bit-wise comparisons (for code matching), which are amenable to FPGA parallelism. More recently, researchers have explored deep learning for iris recognition. An ICFPT 2019 study by Ma and Sham designed a compact end-to-end iris recognition flow with a fully convolutional segmentation network and deployed it on a low-power SoC-FPGA, demonstrating efficient operation even with error-correction coding integrated [
69]. This suggests that tasks like iris segmentation can be accelerated by FPGAs to meet timing requirements.
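The bit-wise code comparison mentioned above is commonly expressed as a masked fractional Hamming distance. The sketch below assumes iris codes and occlusion masks are held as Python integers used as bit vectors; the function name and this representation are illustrative choices, not details of the cited designs.

```python
def fractional_hamming_distance(code_a, code_b, mask_a, mask_b):
    """Fraction of disagreeing bits among positions valid in both masks.

    Codes and masks are ints used as bit vectors; a 1 in a mask marks a
    usable bit (for example, one not occluded by an eyelid).
    """
    valid = mask_a & mask_b
    n_valid = bin(valid).count("1")
    if n_valid == 0:
        return 1.0  # nothing comparable: report the maximal distance
    disagree = (code_a ^ code_b) & valid
    return bin(disagree).count("1") / n_valid

# 4-bit toy codes; practical iris codes are on the order of thousands of bits.
print(fractional_hamming_distance(0b1100, 0b1010, 0b1111, 0b1111))  # 0.5
```

The computation reduces to XOR, AND, and a population count, which is why it maps naturally onto wide combinational logic.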
Privacy and template security are central for iris on FPGA. The fabric can integrate cryptographic modules (AES, chaotic encryption) alongside the recognition pipeline. A 2021 cancellable iris scheme combined chaotic encryption with on-FPGA feature extraction so that stored templates cannot be inverted to the original iris [
6]; in high-security iris deployments (government ID, border control), this hardware-rooted trust means that unencrypted iris data need never leave the chip.
In terms of performance, iris recognition FPGAs have achieved real-time throughput (often camera-limited, e.g., 5–30 frames per second depending on resolution). The Ruiz-Beltrán 2023 system effectively performs real-time eye detection on a video feed of walking subjects [
59], which is quite challenging given motion and focus issues—their success indicates the capacity of modern FPGAs (the Zynq UltraScale+ in this case, which has substantial resources and an ARM CPU for co-processing). With simpler setups (e.g., fixed eye position, lower resolution), an even smaller FPGA could suffice.
In summary, FPGA implementations for iris recognition focus on accelerating the early stages (segmentation, quality assessment) and on ensuring the reliability of the captured iris data, with feature encoding as a further candidate for acceleration. The results from 2021–2025 indicate that FPGAs can handle high-resolution, computationally heavy iris tasks in real time, while on-chip processing improves robustness and helps maintain security.
Despite the long-standing prominence of iris as a biometric modality, the 2021–2025 FPGA implementation literature is genuinely thin: this section discusses [
6,
38,
59,
69,
70,
71], with the remainder of the iris recognition literature being algorithmic rather than FPGA-realized. We attribute this scarcity to two factors: (i) deployed iris systems (border control, national ID) are dominated by closed commercial pipelines that do not publish hardware details, and (ii) iris matching has comparatively low compute cost once the IrisCode is generated, which reduces the incentive for academic FPGA acceleration. The brevity of this section therefore reflects the sparse literature rather than an oversight.
A layered FPGA-based iris recognition implementation highlighting these considerations is shown in
Figure 3. The architecture illustrates how high-resolution iris images are captured and preprocessed, while the FPGA acceleration layer performs real-time segmentation, on-the-fly quality assessment, and feature encoding. The processing system (PS/CPU) layer manages feature matching, decision making, and interfacing with embedded or mobile applications.
4.4. Speaker/Voice Recognition on FPGA
Speaker recognition (voiceprint recognition) involves processing audio signals to either verify a speaker’s identity or identify who is speaking from their voice characteristics. This modality has historically been dominated by algorithms like Gaussian mixture models or i-vectors, but deep learning (e.g., x-vectors, spectrogram CNNs, RNNs) now sets the state of the art in accuracy. Deploying these on FPGAs is challenging because speech models (especially neural-network-based ones) can be large and often require sequential processing. Nonetheless, there have been advances in making compact, real-time speaker recognition on FPGA feasible, targeting applications like smart home voice authentication or secure voice login systems.
One line of work focuses on implementing deep speaker embedding models (such as the x-vector network, which uses a time-delay neural network) on FPGA. Jiao et al. presented a neural network SoC design for speaker verification based on an x-vector extractor [
72]. They built a custom RISC-V-based SoC with an FPGA neural accelerator, optimized by techniques like model size reduction, pruning, and weight compression to handle the heavy model within edge constraints [
72]. Impressively, their optimized system achieved over 95% accuracy on the VoxCeleb dataset (which includes 1251 speakers), comparable to high-end software systems [
72]. To reach real time, they introduced a novel sparse matrix storage (BPCSR) to speed up the matrix computations on the FPGA fabric, and split tasks such that the FPGA handles the neural network math while the CPU does preprocessing [
72]. Jiao et al. ultimately synthesized their design into an ASIC consuming under 100 mW [
72], which indicates that the architecture itself is power-efficient, although an FPGA realization of the same design would typically draw more power than the ASIC; this is a critical factor since many voice-based systems (smart speakers, wearables) need to listen continuously on battery.
The Jiao et al. system above is from 2019 and therefore predates the 2021–2025 survey window; we retain it as the foundational architectural reference for x-vector speaker verification on FPGA-SoC, since the design idioms it introduced (RISC-V + neural accelerator, sparse-matrix BPCSR storage, ASIC-targeted power envelope) are still the baseline against which later work is measured.
Within the 2021–2025 window, several developments have advanced speaker recognition FPGA implementations beyond the Jiao et al. baseline. First, on accuracy and model efficiency, Hong et al. (2024) [
4] quantized a state-of-the-art ECAPA-TDNN model to half size with only 0.07% degradation, addressing the over-allocation problem the 2019 design exhibited. Second, on hardware–software co-design, Tsai and Wang (2024) [
19] report a hardware-MFCC + GMM/HMM pipeline on Xilinx ZCU104 with 53.6 ms latency at 4.26 W, and Chen et al. (2024) [
73] report a 1D-CNN speaker embedding on Zynq XCZU2CG at 63.7 GOPS and 2.13 W, both showing that speaker recognition pipelines now fit within sub-5 W envelopes. Third, on adjacent voice tasks, recent FPGA work covers keyword spotting at 0.209 W [
74], speech recognition with time-depth-separable convolutions [
55], and binary-NN keyword spotting [
75], broadening the FPGA voice-biometric ecosystem beyond pure speaker verification. We organize the rest of this section around these three threads.
Earlier attempts to implement speaker recognition on FPGA without model compression faced difficulties. The 2019 Jiao et al. study used a high-end Xilinx VCU118 FPGA but consumed a substantial portion of its resources and was deemed inefficient without optimization [
72]. This highlights that naive mapping of large speech models to FPGA is not practical; instead, the model must be trimmed or quantized. Recent research by Hong et al. (2024) targets exactly that: they quantized a state-of-the-art ECAPA-TDNN speaker verification model to half its size with virtually no loss in accuracy (only 0.07% increase in error rate) [
4]. Such quantization results (to 8-bit or even lower precision) are promising for FPGA deployment, as they reduce memory and DSP requirements significantly.
On the simpler end, there have been FPGA implementations of small-footprint speaker ID systems. Additionally, FPGAs have been used for keyword spotting (a related voice task) with binary neural networks to achieve high energy efficiency [
75]. These indicate a trend: for always-on voice interfaces, an FPGA (or FPGA-like configurable logic in an SoC) can perform detection of keywords or speaker identity at the microphone, waking up larger systems only when needed—saving overall power.
In summary, speaker recognition on FPGA has made strides by adopting model compression and hardware–software co-design. Accuracy levels above 95% are attainable with modern techniques, and inference can be done in real time (processing audio frames on the fly, with sub-second latency). The FPGA’s edge in this modality is the ability to operate offline, securely, and efficiently.
To further illustrate the performance and implementation suitability of FPGA and FPGA-SoC platforms for this modality,
Table 3 summarizes and compares the most commonly used speaker recognition methods in terms of accuracy, computational complexity, and hardware deployment efficiency.
As shown in
Table 3, classical statistical approaches remain attractive for low-resource FPGA implementations, while CNN-based and embedding-based methods offer improved recognition accuracy with acceptable hardware overhead. FPGA-SoC architectures generally provide greater flexibility by enabling efficient task partitioning between programmable logic and embedded processors, making them particularly suitable for modern speaker recognition pipelines.
4.5. Finger Vein Recognition on FPGA
Finger vein recognition is an emerging biometric modality that uses near-infrared imaging to capture the sub-dermal vein patterns in a finger. It is considered highly secure (veins are internal and hard to counterfeit) and is used in applications like ATM authentication and access control. The computational tasks include image preprocessing, vein pattern enhancement/segmentation, and matching (often using pattern or minutiae-based approaches, or increasingly, CNN feature extraction). FPGAs have been applied to finger vein systems to achieve real-time processing and to embed the entire system in a single hardware unit.
A state-of-the-art example is the work by Janaki et al. (2024), who developed an FPGA-enhanced SoC for finger vein recognition using a novel deep learning model [
77]. They proposed a fusion CNN-ViT model (convolutional neural network combined with vision transformer) to improve finger vein recognition accuracy. Implementing this on an FPGA-based system yielded high recognition rates that outperformed previous methods, providing a “secure and efficient biometric authentication solution” [
77]. The use of a Conv-ViT hybrid is noteworthy—it suggests leveraging transformers for global feature capture and CNN for local feature extraction, which can boost performance on the fine-grained patterns of vein images. The result was a system that not only improved accuracy but also maintained or improved efficiency over prior CNN-only approaches.
Previously, other researchers have realized finger vein feature extraction on FPGAs using more traditional means. Algorithms that extract vein patterns via repeated line tracking or morphological operations have been ported to FPGA to speed up what would be slow software steps. These classical implementations benefit from FPGAs’ bit-level parallelism in image processing. However, the trend now is clearly towards deep learning. The 2024 Conv-ViT FPGA system [
77] is a direct continuation of this trend, showing that even advanced DL techniques can migrate to reconfigurable hardware for vein biometrics.
Another aspect where FPGAs shine is multimodal fusion involving finger vein. There are systems combining finger vein with face or fingerprint to improve reliability. An example from 2021 fused face and finger vein recognition using deep learning [
78]. In a deployment scenario, an FPGA could handle both modalities: acquiring a face image and a finger vein image, running both recognition algorithms in parallel hardware pipelines, and fusing the results. The parallel nature and ample I/O of FPGAs support this well.
In summary, finger vein recognition has benefited from FPGA acceleration by enabling novel DL models to run at the edge and by providing a compact, secure processing unit for this modality. Two important caveats accompany this finding. First, the section is short by necessity: in 2021–2025 the substantively FPGA-realized finger vein works we identified are [
77] (Janaki et al., 2024, Conv-ViT) and [
79] (Chang et al., 2023, CNN feature extraction), with [
78] addressing multimodal fusion algorithmically; finger vein FPGA implementation remains an underrepresented area in the open literature. Second, on the throughput claim: the Conv-ViT system in [
77] reports 387 ms per inference, which corresponds to roughly 2.6 inferences/s and is therefore well below the typical ≥30 fps threshold we use for “real time” elsewhere in this review. We therefore characterize [
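77] as “high-throughput with sub-second latency” rather than real time. The CNN pipeline of [
79] is only marginally faster (365 ms, roughly 2.7 inferences/s), so applications that need closer-to-real-time finger vein authentication will likely require the broader pattern of CNN + pruning + quantization rather than either design as published.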
Table 4 summarizes representative FPGA-based biometric recognition systems across multiple modalities. The comparison covers both deep-learning-based and classical approaches, highlighting the diversity of algorithmic designs and hardware platforms used in recent literature. For each work, key characteristics such as the target modality, implementation platform, performance metrics (latency, throughput, and accuracy), core technical methods, and reported power consumption are presented.
Within each modality of
Table 4, deep-learning-based implementations dominate the 2021–2025 entries, particularly for face and speech recognition, mainly due to their strong reported accuracy and improved robustness to environmental variability. These approaches typically rely on convolutional, recurrent, or spiking neural networks, often combined with hardware-aware optimizations such as quantization, pruning, and parallel execution to meet real-time FPGA constraints. Classical FPGA-based solutions, while sometimes reporting slightly lower accuracy, continue to offer advantages in architectural simplicity, deterministic latency, and lower power consumption, making them suitable for resource-constrained or safety-critical applications. We frame these as
intra-modality observations: “dominant,” “best fit,” and “balanced compromise” claims elsewhere in this review (
Section 3,
Section 4.1 and
Section 4.4) refer to the relative standing of approaches within a single modality and a single deployment regime, not to a cross-modality ranking. Cross-modality numerical comparison is not licensed by
Table 4 because the underlying tasks (closed-set face identification, open-set speaker verification, transcription, etc.) and the underlying datasets (VoxCeleb, LFW, ORL, finger vein databases) are not equivalent in difficulty or in evaluation protocol; the same numerical figure can mean very different things across rows.
It should be noted that performance metrics reported in the literature are not always directly comparable. Accuracy is expressed using different measures depending on the modality and task, such as recognition accuracy, precision, equal error rate (EER), average precision (AP), or word error rate (WER). Similarly, latency and throughput are reported in various forms, including milliseconds per inference, frames per second, or operations per second. Power consumption is not consistently disclosed across all studies, and in several cases, only qualitative comparisons with CPU or GPU implementations are provided.
To help the reader read across the heterogeneous metrics in
Table 4, three calibration points are useful. First, classification accuracy and EER are not interconvertible: a 99.45% recognition rate ([
80], ocular YOLOv8) and EER < 2.15% ([
30], voiceprint) describe different decision tasks (closed-set classification vs. open-set verification) and operating regimes; we recommend comparing across rows of the same modality only. Second, AP ([
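81], AP = 91.6%) is the area under the precision-recall curve and therefore summarizes performance over all decision thresholds, whereas accuracy is measured at a single operating point; the two need not agree, so AP should be read as a threshold-independent summary rather than converted into an accuracy figure. Third, WER ([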
55], WER ≈ 9.1%) is a transcription metric and is included because the work is FPGA-implemented voice processing; it is not a speaker-identity figure and should not be compared to speaker-recognition accuracy. We mark these distinctions because cross-row comparison without them is misleading even when the cells are otherwise similar.
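The distinction between a verification metric and a classification accuracy can be made concrete with a small sketch of how an EER is obtained from genuine and impostor score lists. The scores below are invented for illustration, the function names are hypothetical, and the threshold scan is a simple approximation rather than the interpolation procedure a benchmark protocol might prescribe.

```python
def error_rates(genuine, impostor, threshold):
    """FAR and FRR when scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Approximate EER by scanning observed scores as candidate thresholds.

    Returns (eer, threshold), where eer is the mean of FAR and FRR at the
    threshold that minimizes |FAR - FRR|.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]

# Invented similarity scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine_scores, impostor_scores))  # (0.25, 0.6)
```

The EER depends only on the overlap between the two score distributions at one threshold and says nothing about how often the top-ranked identity is correct in a closed-set gallery, which is why the two figures should not be compared directly.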
Power is reported as “N/R” in 11 of the 25 entries in
Table 4 (44%); two further entries report energy (mJ) or energy efficiency (TOPS/W) rather than power. This incompleteness reflects two literature patterns: (i) algorithm-focused FPGA papers often omit measured power because the silicon is treated as a fixed validation platform rather than the contribution; and (ii) academic FPGA boards (PYNQ-Z2, ZYBO Z7, MZ7030FA) have well-characterized device-class power envelopes that authors implicitly rely on the reader to know. To support cross-work interpretation, we have organized
Table 4 chronologically rather than by power; readers can use the platform column as a coarse device-class proxy (Cyclone-10/Artix-7: tens to hundreds of mW; Zynq-7000/Cyclone V: low single-watt; Zynq UltraScale+/ZCU104/ZCU102: single- to mid-watt; high-end VC707/UltraScale+: mid- to higher-watt) when the source paper does not disclose a measured number.
Reading
Table 4 as more than a catalogue requires explaining why some architectures outperform others on the criteria designers actually face. Three patterns dominate the higher-performing rows. First, on latency, the most efficient face recognition entries are those that pair architectural compression with hardware-aware design rather than relying on either alone: the depthwise/pointwise separable-convolution pipeline of Tsai et al. (2024) on Cyclone V [
61] reaches 30.3 ms/ID at 564 mW because the convolution primitive is itself cheaper, and the SqueezeNet pipeline of Walter et al. (2021) on Zynq Ultra96-V2 [
60] reaches 31 ms at 4.8 W by combining pruning with quantization-aware training. Second, on power, the lowest envelopes are not from the largest devices but from event-driven and binarized models: TENet keyword spotting at 0.209 W on Zynq 7z020 [
74] and the spiking LSTM at 0.84–1.1 W on Artix-7/Zynq-7000 [
76] both exploit sparsity that dense CNNs cannot. Third, on accuracy under tight power budgets, entries that report both high accuracy and sub-1-W operation tend to use a single-purpose feature pipeline rather than a general DL stack: the iris CNN of Lin et al. (2021) on Cyclone-10 LP [
70] attains 96.88% at 84.6 mW because the network is matched to the IrisCode size rather than scaled up. The trade-offs that emerge are concrete: separable convolutions trade kernel expressiveness for DSP density; pruning + quantization trades parameter count for additional training effort; sparsity-driven models trade dense throughput for far lower power on event streams; and modality-matched single-purpose pipelines trade reusability for accuracy at the operating point of interest.
Table 4.
Summary of recent FPGA-based biometric recognition systems (2021–present), including modality, employed methods, and reported performance metrics.
| Ref. | Modality | Appr. | Platform | Latency/Throughput | Accuracy | Core Technique | Power |
|---|---|---|---|---|---|---|---|
| 2025 |
| [80] | Ocular recognition | DL | Xilinx ZCU104 | 20.3 ms | 99.45% | YOLOv8 | N/R |
| [1] | Face recognition | DL | ZYBO Z7, PYNQ-Z2 | N/R | ∼85–87% | CNN (GoogLeNet), HW/SW co-design | N/R |
| [62] | Face detection | DL | ZCU102 | 55.44 fps | 91.73% | DNN face detector | 46 mJ |
| [75] | Speech keyword spotting | DL | Xilinx VC707 | 2.2 GOP/s | 97.29% | Binary NN (PSE-BNN) | N/R |
| [82] | Face recognition | CL | FPGA DE1-SoC | 60–100 ms/frame | 80% (ORL), 70% (LFW) | Haar Cascade, LBPH, Euclidean | 5–10 W |
| [83] | Multimodal (face + iris) | CL | FPGA-friendly b | N/R | EER < ; AROC > 99% | Nonlinear fusion, chaotic key gen. | N/R |
| [30] | Voiceprint | DL | Xilinx UltraScale+ | 172 ms | EER < 2.15% | MBPC features, ASNorm scoring | 3.2 TOPS/W |
| [84] | Voice disorder detection | DL | Xilinx ZCU102 | N/R | 99.50% | Pruned CNN classifier | N/R |
| 2024 |
| [76] | Speech recognition | DL | Artix-7/Zynq-7000 | N/R | 73% | Spiking LSTM, quantization | 1.1/0.84 W |
| [77] | Finger vein | DL | Zynq XCZU4EV MPSoC | 387 ms; 245.8 GOPS | 98% | CNN–ViT hybrid | 3.67 W |
| [73] | Speaker verification | DL | Zynq XCZU2CG | 63.7 GOPS @ 200 MHz | N/R | 1D-CNN speaker embedding | 2.13 W |
| [85] | Speech recognition | DL | MZ7030FA | 57.8 GOPS | N/R | Spatio-temporal CNN | 2.21 W |
| [61] | Face recognition | DL | Cyclone V 5CSEBA6 | 274 M GOPs | 99.2% | Depthwise/pointwise conv. | 564 mW |
| 2023 |
| [19] | Speaker verification | CL | Xilinx ZCU104 | 53.6 ms | 93.30% | HW-accelerated MFCC + GMM/HMM | 4.26 W |
| [79] | Finger vein | DL | Xilinx XC7Z020 | 365 ms | 95.80% | CNN feature extraction | N/R |
| [34] | Face recognition | DL | Xilinx ZCU104 | 30.3 ms/ID; 8 streams in 342.3 ms | 99.34% | Optimized DNN pipeline | N/R |
| [86] | Face recognition | CL | Altera DE1-SoC | N/R | 96.7/86.7/83.3% a | LBPH, Chi-square matching | 1.6 W |
| [59] | Eye detection | DL | Zynq XCZU4EV MPSoC | N/R | 95–100% | Tiny-YOLOv3 | N/R |
| 2022 |
| [60] | Face recognition | DL | Zynq Ultra96-V2 | 31 ms | 84% | SqueezeNet, pruning, quantization | 4.8 W |
| [2] | Face ethnicity | DL | DE1-SoC/GPU | 2.76 s/5.1 s | 96.90% | DCNN ethnicity classification | 10.7/650 W |
| [55] | Speech recognition | DL | FPGA + smartphone | 125.5 ms | WER ≈ 9.1% | Time-depth separable CNN | N/R |
| [74] | Keyword spotting | DL | Xilinx Zynq 7z020 | 7266 cycles | 95.36% | TENet, simplified MFCC | 0.209 W |
| 2021 |
| [70] | Iris recognition | DL | Cyclone-10 LP | 7.6 ms | 96.88% | CNN iris feature learning | 84.6 mW |
| [71] | Iris recognition | CL | Zynq UltraScale+ XCZU7EV | N/R | EER = 0.20%; ZeroFAR = 0.50% | QC-LDPC, biometric key gen. | N/R |
| [81] | Face recognition | DL | Xilinx PYNQ-Z2 | Real-time | AP = 91.6%; Acc. = 92% | Custom parallel HW | N/R |
To make the trade-offs comparable across rows,
Table 5 normalizes a representative subset of
Table 4 by performance per watt. Where the source paper reports either inferences per second per watt (inf/s/W) or operations per second per watt (TOPS/W or GOPS/W), we use the directly reported figure; where only latency and power are reported, we compute inferences/s by inverting the latency and divide by the reported power, and where the source paper provides energy per inference instead of power, we recover an effective power at the reported throughput. Entries with N/R power are excluded from this normalization. The resulting figures should be read as order-of-magnitude indicators rather than as benchmark-grade comparisons, because the underlying tasks differ in complexity (e.g., 1251-speaker x-vector verification vs. 8-keyword spotting) and the underlying datasets differ in difficulty, but they do permit a coarse ranking of how efficiently each architecture converts watts into useful biometric inference.
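The normalization rules stated above can be written down as a short sketch. The function names are hypothetical, the latency inversion assumes one inference in flight at a time (no batching or pipelining across inferences), and the 46 mJ figure for the ZCU102 face detector is assumed here to be energy per frame; values computed this way are illustrative and may differ from the entries of Table 5.

```python
def inferences_per_second_per_watt(latency_ms=None, power_w=None, energy_mj=None):
    """Normalize one table row to inferences per second per watt.

    - With energy per inference (mJ): inf/s/W equals inferences per joule.
    - With latency (ms) and power (W): invert the latency, divide by power.
    - Returns None when the row discloses neither form.
    """
    if energy_mj is not None:
        return 1000.0 / energy_mj
    if latency_ms is None or power_w is None:
        return None
    return (1000.0 / latency_ms) / power_w

def effective_power_w(energy_mj, throughput_fps):
    """Recover an effective power from energy per inference and throughput."""
    return (energy_mj / 1000.0) * throughput_fps

# Rows taken from Table 4 (latency in ms, power in W).
rows = {
    "iris CNN, Cyclone-10 LP": dict(latency_ms=7.6, power_w=0.0846),
    "MFCC + GMM/HMM, ZCU104": dict(latency_ms=53.6, power_w=4.26),
    "CNN-ViT finger vein": dict(latency_ms=387.0, power_w=3.67),
    "face detector, ZCU102": dict(energy_mj=46.0),
    "finger vein CNN, XC7Z020": dict(latency_ms=365.0),  # power N/R
}
for name, row in rows.items():
    print(name, inferences_per_second_per_watt(**row))
```

Under these assumptions the 84.6 mW iris CNN comes out near 1.6 × 10^3 inf/s/W while the 3.67 W CNN-ViT system is below 1 inf/s/W, a spread of more than three orders of magnitude that illustrates why the figures are best read as order-of-magnitude indicators.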
To support intra-modality interpretation of
Table 4,
Table 6 provides per-row metadata covering the dataset on which the cited result was reported, the task type (closed-set classification, open-set verification, detection, transcription, or fusion), the metric definition (i.e., what the headline accuracy figure actually measures), and the reporting condition for the power figure (measured on the FPGA versus reported relative to a software baseline versus not disclosed). This metadata is split off from
Table 4 to keep that table readable; the two tables share the
Ref. column and can be read side by side.
5. Common Architectures and Hardware Design Patterns
Across the different biometric modalities implemented on FPGAs, we observe several recurring hardware architecture patterns and design strategies. These patterns enable efficient utilization of FPGA resources and facilitate meeting real-time constraints. Below, we discuss these common approaches.
Streaming Pipeline Architecture: Many FPGA biometric systems are designed as streaming pipelines, where data flows continuously through a series of hardware stages (e.g., preprocessing → feature extraction → matching) with minimal buffering. This exploits the FPGA’s ability to connect custom logic stages with FIFOs. For example, in face or iris recognition, an image can be streamed in pixel-by-pixel, filtered and processed on the fly by CNN layers implemented as pipelines. The FINN framework by Umuroglu et al. [
7] pioneered such stream architectures for binarized neural networks on FPGA, and similar ideas are seen in recent works [
2]. A side benefit is latency predictability, which is important for biometric systems needing deterministic response times.
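As a software analogy for the streaming pattern, the sketch below chains three stages with Python generators so that each sample flows through the whole pipeline without any stage materializing a full frame. The stage names and the choice of a three-sample mean filter are illustrative assumptions, not a reconstruction of FINN or of any cited design.

```python
def preprocess(pixels):
    """Stage 1: scale 8-bit pixels to [0, 1], one sample at a time."""
    for p in pixels:
        yield p / 255.0

def sliding_mean(samples, window=3):
    """Stage 2: streaming 1-D filter that keeps only `window` samples,
    analogous to a small line buffer between hardware stages."""
    buf = []
    for s in samples:
        buf.append(s)
        if len(buf) > window:
            buf.pop(0)
        if len(buf) == window:
            yield sum(buf) / window

def threshold(samples, level=0.5):
    """Stage 3: binarize each filtered sample as it arrives."""
    for s in samples:
        yield 1 if s >= level else 0

# Stages are connected like FIFO-linked hardware blocks; nothing runs
# until the consumer pulls data through the chain.
stream = threshold(sliding_mean(preprocess([0, 255, 255, 255, 0, 0])))
print(list(stream))  # [1, 1, 1, 0]
```

In hardware all stages operate concurrently on different samples, whereas the generators interleave on one thread; the sketch captures the bounded buffering and fixed per-sample work that make latency predictable, not the concurrency itself.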
Parallel Processing and Data-Level Parallelism: FPGAs allow massive parallelism, and designers leverage this by replicating computation units. In CNN accelerators, it is common to instantiate an array of processing elements (PEs) that perform multiple MAC operations in parallel. For instance, AlBdairi et al. used a 64-multiply array for their face CNN on FPGA [
2], and many others use parallel convolution engines to process several image rows or channels simultaneously. In fingerprint matching, parallel comparators can check one probe print against many stored templates at once (hardware matching engines). Designs often settle on layer-wise parallelism for CNNs (computing multiple outputs in parallel) and vectorized arithmetic for signal processing (e.g., SIMD-like operations using FPGA DSP blocks).
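The effect of replicating MAC units can be sketched as a dot product evaluated a fixed number of lanes at a time, with a cycle counter standing in for hardware time. The function name, the one-group-per-cycle timing model, and the absence of pipeline fill latency are simplifying assumptions.

```python
def pe_array_dot(weights, activations, num_pes=4):
    """Dot product computed `num_pes` multiply-accumulates per modeled cycle.

    Returns (result, cycles). In hardware the products within one group are
    computed concurrently, one per PE, and reduced by an adder tree; here
    they are evaluated in a loop and only the cycle count models the overlap.
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have equal length")
    acc = 0
    cycles = 0
    for base in range(0, len(weights), num_pes):
        w_group = weights[base:base + num_pes]
        a_group = activations[base:base + num_pes]
        acc += sum(w * a for w, a in zip(w_group, a_group))
        cycles += 1
    return acc, cycles

weights = list(range(1, 9))      # 1..8
activations = [1] * 8
print(pe_array_dot(weights, activations, num_pes=4))  # (36, 2)
print(pe_array_dot(weights, activations, num_pes=1))  # (36, 8)
```

Under this model a 3 × 3 convolution over 64 input channels (576 products per output) would take 9 cycles on a 64-wide array instead of 576 on a single MAC, provided memory can supply 64 operand pairs per cycle.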
Quantization and Low-Precision Arithmetic: Almost all recent FPGA deep learning implementations use reduced precision for weights and activations, commonly 8-bit or 16-bit and, in extreme cases, 1–4 bits. FPGA DSP blocks can often perform multiple low-precision operations in one block, and lower bit-widths reduce memory usage and increase throughput. The Microchip face recognition design used INT8 quantization on the PolarFire SoC [
5]. Similarly, speaker recognition models were quantized (e.g., 8-bit or fixed-point) to fit the FPGA with negligible accuracy loss [
4]. The challenge is ensuring the quantized model still meets accuracy requirements, which often involves retraining or fine-tuning the network with quantization-aware training.
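To make the accuracy concern concrete, the following minimal sketch shows symmetric per-tensor INT8 quantization and the round-trip error it introduces. It assumes a single scale per tensor and a [-127, 127] code range; the cited designs may use different schemes (per-channel scales, asymmetric zero points, or quantization-aware training), and the function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization; returns (integer codes, scale)."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def max_abs_error(values):
    """Worst-case round-trip error for one tensor."""
    codes, scale = quantize_int8(values)
    restored = dequantize(codes, scale)
    return max(abs(v - r) for v, r in zip(values, restored))


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0]
    codes, scale = quantize_int8(weights)
    print(codes, scale, max_abs_error(weights))
```

The round-trip error is bounded by half a quantization step (scale / 2), so tensors with a few large outliers lose resolution on the small values, which is one reason fine-tuning after quantization is often needed.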
Use of Vendor IP Cores and HLS: To accelerate development, designers frequently use vendor-provided IP and high-level synthesis (HLS) tools. Xilinx’s DPU (Deep Learning Processing Unit) is a prime example, used in the iris detection work [
59] and other studies to deploy CNNs without hand-coding RTL. Similarly, Intel’s OpenVINO and oneAPI toolflows can target FPGAs for accelerating networks. For custom logic (e.g., a novel algorithm or encryption), high-level synthesis is often used to write C/C++ and synthesize it to HDL instead of writing VHDL/Verilog manually. HLS has matured to the point where real-time biometric algorithms (such as filtering and FFT) can be synthesized with acceptable quality of results.
Memory Optimization and Tiling: FPGA block RAM is limited, so designs employ tiling of data and double-buffering to handle large images or networks. When a CNN’s intermediate feature maps do not fit on-chip, a common strategy is to tile the input (process in strips or patches) or to stream from off-chip DRAM in smaller chunks. Double buffering allows an FPGA to overlap computation and data movement (one tile is processed while the next tile is fetched). By smartly choosing tile sizes, designers maximize on-chip reuse and avoid saturating memory bandwidth.
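The tiling and double-buffering ideas above can be sketched with two small helpers. The first computes which input rows each output strip of a "valid" K×K convolution needs (adjacent strips overlap by K−1 halo rows); the second is a simple cycle model comparing sequential and double-buffered execution. Both functions and the cycle counts used below are illustrative assumptions, not measurements from any cited design.

```python
def strip_tiles(height, tile_rows, kernel=3):
    """Input row ranges [start, stop) needed for each output strip.

    Assumes a 'valid' KxK convolution, so each strip of output rows needs
    kernel - 1 extra input rows (the halo) beyond its own height.
    """
    halo = kernel - 1
    out_rows = height - halo
    tiles = []
    for out_start in range(0, out_rows, tile_rows):
        out_stop = min(out_start + tile_rows, out_rows)
        tiles.append((out_start, out_stop + halo))
    return tiles


def pipeline_cycles(n_tiles, load, compute, double_buffered=True):
    """Total cycles to process n_tiles with fixed per-tile load/compute costs.

    With double buffering, the load of tile i+1 overlaps the compute of
    tile i, so the steady-state cost per tile is max(load, compute).
    """
    if n_tiles == 0:
        return 0
    if not double_buffered:
        return n_tiles * (load + compute)
    return load + (n_tiles - 1) * max(load, compute) + compute


if __name__ == "__main__":
    print(strip_tiles(height=10, tile_rows=4))
    print(pipeline_cycles(4, load=100, compute=150, double_buffered=False))
    print(pipeline_cycles(4, load=100, compute=150, double_buffered=True))
```

Under this model, double buffering hides the transfer time entirely only when compute per tile is at least as long as the load; otherwise the design remains bound by memory traffic and a larger tile (more reuse per byte) is the better lever.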
Custom Hardware Accelerators for Specific Functions: In many biometric pipelines, certain functions become bottlenecks, and designers create custom accelerators for them. For example, iris code generation might involve heavy bit-wise operations; a custom bit-level pipeline can generate 2048-bit iris codes with very low latency. Likewise, distance computation (e.g., Hamming distance or Euclidean distance between feature vectors) can be turned into a fully unrolled combinational circuit for speed. As a further example, a chaotic encryption module was integrated into a biometric system on FPGA to secure templates [
6], which amounts to a custom accelerator for cryptography.
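The distance computation mentioned above is simple enough to state exactly. The sketch below computes the masked fractional Hamming distance commonly used for iris codes, with Python integers standing in for 2048-bit registers; in hardware the XOR, AND, and popcount would be a combinational tree. The function names and the optional-mask interface are assumptions made for illustration.

```python
CODE_BITS = 2048  # iris code length used in the text


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def fractional_hamming(code_a, code_b, mask_a=None, mask_b=None):
    """Fraction of disagreeing bits among bits valid in both codes.

    Masks mark usable bits (1 = valid); a missing mask means all bits are valid.
    """
    full = (1 << CODE_BITS) - 1
    valid = (full if mask_a is None else mask_a) & (full if mask_b is None else mask_b)
    n_valid = popcount(valid)
    if n_valid == 0:
        raise ValueError("no bits are valid in both codes")
    return popcount((code_a ^ code_b) & valid) / n_valid


if __name__ == "__main__":
    a = 0
    b = (1 << 512) - 1  # differs from `a` in the lowest 512 bits
    print(fractional_hamming(a, b))
    print(fractional_hamming(a, b, mask_a=(1 << 1024) - 1))
```

Because every bit position is independent until the final popcount, the operation maps naturally onto wide parallel logic, which is why it is a frequent target for full unrolling.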
Heterogeneous SoC Integration: Modern FPGAs often come as part of an SoC with hard CPU cores (e.g., ARM Cortex-A in Xilinx Zynq, or RISC-V in Microchip PolarFire SoC). Many biometric designs exploit this heterogeneity, using the CPU for tasks like system coordination, high-level decision logic, or portions of the algorithm that are sequential, while the FPGA fabric tackles the parallelizable heavy lifting (neural network layers, filtering, etc.). The speaker verification SoC used a RISC-V core for feature extraction while the accelerator handled the DNN arithmetic [
72]. The challenge is managing the data transfer between CPU and FPGA; solutions like AXI-interconnects and high-throughput DMA are commonly used.
In an FPGA-SoC biometric system (
Figure 4), the processing system (PS) typically manages sensor interfacing, data preprocessing, feature matching, and final recognition decisions. The programmable logic (PL) is mainly used to accelerate computationally intensive tasks such as CNN-based feature extraction, DSP operations, and optional cryptographic processing. Communication between PS and PL is achieved through a high-speed on-chip interface (e.g., AXI in AMD/Xilinx devices, or the AXI4 and APB fabric interface controllers in Microchip PolarFire SoC).
Reusable Design Templates: Over the years, the community has built reusable templates—for instance, parameterizable CNN accelerators where one can specify number of PEs, memory depth, etc., to target a specific FPGA and network. By adjusting a few parameters, a designer can map a new biometric model to the existing accelerator structure. This reuse greatly speeds development and ensures that common patterns (like convolution computation, pooling, activation) are done in a resource-efficient way [
58].
In essence, these design patterns highlight that successful FPGA implementations in biometrics carefully co-optimize algorithm and hardware. They simplify or approximate computations to fit FPGA resources (quantization, tiling), heavily parallelize work (pipelining, replication of compute units), and use available toolflows and IPs to shorten development time.
The eight patterns above are not equally impactful. Based on how often they appear in the cited 2021–2025 implementations and how directly they enable real-time operation under the power envelopes typical of edge biometric devices, we group them into three priority tiers. (i) Universal, highest impact (every modality, every cited deep learning implementation): streaming pipeline architecture, parallel processing/data-level parallelism, and quantization/low-precision arithmetic. Together, these three account for essentially all real-time FPGA biometric inference reported in this review. (ii) Cross-modality enablers: vendor IP cores/HLS, memory optimization/tiling, and heterogeneous SoC integration; these reduce design effort and allow larger models per device but are not by themselves sufficient for real-time operation. (iii) Modality-specific or narrowly applicable: custom hardware accelerators (most often used for IrisCode generation, fingerprint matching, and cryptographic primitives) and reusable design templates (highest payoff in face-recognition CNN families where the cited implementations share architectural primitives). Designers facing a fixed device budget should prioritize the first tier; the second tier applies when development time, not silicon, is the binding constraint; and the third tier is justified only when a specific bottleneck has been profiled.
Table 7 summarizes how different FPGA-based design patterns affect biometric recognition systems.
6. Security and Anti-Spoofing in FPGA-Based Biometrics
Security is a two-fold concern in biometric systems: system security (protecting biometric data and operations from malicious attacks) and presentation attack detection (PAD) (ensuring the input biometric is genuine and not a spoof). FPGAs contribute to both aspects in unique ways.
On-Device Data Privacy and Template Security: One inherent advantage of FPGA-based processing is that all biometric data can be confined to a dedicated hardware device, reducing exposure. An FPGA can be programmed such that raw biometric images or features never leave the chip unencrypted. For instance, in a fingerprint or iris system, the FPGA can perform feature extraction and immediately apply encryption or a one-way transform to produce a secure template. The 2022 cancellable biometric cryptosystem implemented a 3D chaotic encryption scheme on FPGA to protect templates of multiple biometrics [
6]. The authors reported that storing encrypted biometric templates (instead of raw templates) does not degrade recognition performance while improving security. The FPGA was central to this because it allowed the computationally heavy chaos-based encryption to run in real time alongside the recognition algorithm, with negligible overhead owing to parallel execution. Moreover, FPGAs themselves offer bitstream encryption and physical security features (some devices include anti-tamper mechanisms, physically unclonable functions, etc.), so the design bitstream and any keys can be protected on the device.
Presentation Attack Detection (Liveness/Anti-Spoofing): FPGAs can be employed to run PAD algorithms in parallel with recognition. PAD is crucial: for face recognition, one must detect if the face is a photo or mask; for fingerprint, if it is a fake silicone finger; for speaker, if it is a playback recording or deepfake audio; for iris, a printed contact lens; and for finger vein, a prosthetic finger with a copied pattern. Many PAD algorithms are based on pattern analysis and increasingly on deep learning classification of “real vs. fake” samples. Real-time PAD is needed so that a fake can be rejected before or alongside the recognition decision. For fingerprint liveness detection, researchers have developed lightweight CNNs that examine texture differences between live skin and spoof materials [
17]. An FPGA-based fingerprint scanner could thus output a liveness score with minimal added latency, since the liveness logic can be placed directly after image capture in the pipeline. We note an explicit gap in the surveyed 2021–2025 literature: only [
57] is a hardware-realized FPGA PAD implementation (face anti-spoofing on a face recognition pipeline), and the other PAD references in this review (e.g., [
17] for fingerprint, related work for face) are algorithmic studies that have not been ported to FPGA in the cited works. This is a significant open area: algorithm-side PAD has matured rapidly, but FPGA-realized PAD has not kept pace, leaving a deployment gap between research-grade PAD models and the on-device anti-spoofing that production biometric systems would benefit from.
System Resilience and Attack Surface Reduction: Biometric systems on FPGAs can be more resistant to software attacks like malware or tampering, because the FPGA configuration is hardware and not easily altered by external software. There is no open OS on the FPGA fabric that an attacker can log into—only the defined logic circuits. This security by design limits the attack surface. For instance, lightweight FPGA intrusion-detection frameworks based on hardware fingerprinting have been demonstrated for SoC platforms [
87]. The cited work targets generic SoC network security rather than biometric data, but the underlying mechanism, on-chip detection of behavioral deviations using fixed-function hardware monitors, is directly applicable to biometric pipelines: the same primitive can flag anomalous access to template memory or unexpected control-flow at the matcher, which raises the bar for software-rooted attacks against an FPGA-resident biometric system.
Case Study—Cancellable Biometrics and FPGA: Cancellable biometrics refer to intentional distortion of biometric data via a key or transform such that if compromised, the transform can be changed without having to get a new biometric from the user. In the cited hybrid system [
6], the biometric features are combined with a chaotic encryption keyed by a user-specific secret. The FPGA performs this in real time, outputting an encrypted feature vector to the matcher. If the database or key is compromised, a new key can be issued and the transform changed (the FPGA can be reprogrammed with a new encryption mapping), “cancelling” the old biometric template. The FPGA provides both the speed and a measure of physical security (since the key can be burned into FPGA eFuses or exist only in hardware logic).
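The revocation idea can be illustrated with a deliberately simplified stand-in: a one-dimensional logistic map generates a key-dependent bit stream that is XORed with a binary feature vector. This is not the 3D chaotic scheme of the cited work, and plain XOR salting is not a strong protection on its own; the map parameter, burn-in length, and function names are all assumptions chosen for illustration.

```python
def logistic_keystream(key, n_bits, r=3.99, burn_in=100):
    """Key-dependent bit stream from the logistic map x <- r * x * (1 - x).

    `key` is the initial state and must lie strictly between 0 and 1.
    """
    if not 0.0 < key < 1.0:
        raise ValueError("key must be in the open interval (0, 1)")
    x = key
    for _ in range(burn_in):  # discard the transient
        x = r * x * (1.0 - x)
    bits = []
    for _ in range(n_bits):
        x = r * x * (1.0 - x)
        bits.append(1 if x > 0.5 else 0)
    return bits


def protect(feature_bits, key):
    """XOR the binary feature vector with the keyed stream."""
    stream = logistic_keystream(key, len(feature_bits))
    return [f ^ s for f, s in zip(feature_bits, stream)]


def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    enrolled = [1, 0, 1, 1, 0, 0, 1, 0] * 4
    probe = list(enrolled)
    probe[3] ^= 1
    probe[17] ^= 1
    key = 0.3
    print(hamming(enrolled, probe), hamming(protect(enrolled, key), protect(probe, key)))
```

Because both templates are XORed with the same stream, Hamming distances are unchanged, so matching can proceed in the protected domain; issuing a different key produces a different stream and therefore a different stored template, which is the "cancel and reissue" property the text describes.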
Multimodal Fusion for PAD: Interestingly, FPGAs can fuse information from multiple sensors to improve spoof detection. For example, a face recognition system might use both an RGB camera and a thermal camera—a live face has a certain heat pattern, while a mask might not. An FPGA can process both streams (visible and thermal) and combine them at frame rate to decide liveness. This sensor fusion at the hardware level is a strength of FPGAs with abundant I/O and parallelism.
Figure 5 presents the main security considerations in FPGA-based biometric systems, broadly classified into system security and presentation attack detection (PAD).
7. Challenges and Limitations
Despite their advantages, FPGA-based biometric systems face a number of challenges and limitations that researchers and engineers must navigate.
Resource Constraints and Scalability: FPGAs have finite logic resources (LUTs, flip-flops), finite block memory (BRAM/URAM), and a limited number of DSP multiplier blocks. Sophisticated biometric algorithms, especially deep neural networks with millions of parameters, can easily exceed the capacity of a single FPGA if not optimized. While one can off-load some layers to an external processor or break the design into time-multiplexed operations, doing so may sacrifice performance. There is a gap when it comes to running the absolute state-of-the-art large models on a single low-power FPGA.
Design Complexity and Development Time: Developing an optimized FPGA solution requires hardware design skills and a deep understanding of both the algorithm and the FPGA architecture. The learning curve is steep for teams coming from a software-only background. Although HLS and IP libraries have improved productivity, achieving timing closure at high clock rates or optimizing resource usage still often involves manual tuning at the RTL level. The iterative debug cycle on an FPGA (synthesis, place-and-route, test) is also slower than software iteration.
Fixed-Point and Numeric Precision Issues: When using quantized arithmetic (e.g., 8-bit, 16-bit fixed-point), there is always a risk of numerical issues like overflow, underflow, or loss of precision affecting recognition accuracy. Biometric algorithms can be sensitive; for example, small differences in a fingerprint feature vector might change a match outcome from true to false. Ensuring that reduced-precision arithmetic does not significantly degrade accuracy is challenging. The designer must validate that the chosen precision yields an acceptable accuracy on target datasets, which requires co-simulation of the FPGA arithmetic with the algorithm.
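The overflow risk described above can be reproduced in a few lines. The sketch below models a signed fixed-point multiply-accumulate in a Q8.8 format (16 bits total, 8 fractional) and contrasts a saturating accumulator with two's-complement wraparound. The format and the helper names are assumptions for illustration; real designs often use wider accumulators than operands precisely to avoid this effect.

```python
def to_fixed(value, frac_bits=8, total_bits=16):
    """Convert a real value to signed fixed-point, saturating at the range limits."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    raw = round(value * (1 << frac_bits))
    return max(lo, min(hi, raw))


def wrap(raw, total_bits=16):
    """Two's-complement wraparound to total_bits."""
    raw &= (1 << total_bits) - 1
    if raw >= (1 << (total_bits - 1)):
        raw -= 1 << total_bits
    return raw


def fixed_mac(xs, ws, frac_bits=8, total_bits=16, saturate=True):
    """Dot product with the accumulator held at the same width as the operands."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    acc = 0
    for x, w in zip(xs, ws):
        prod = (to_fixed(x, frac_bits, total_bits) * to_fixed(w, frac_bits, total_bits)) >> frac_bits
        acc += prod
        acc = max(lo, min(hi, acc)) if saturate else wrap(acc, total_bits)
    return acc / (1 << frac_bits)


if __name__ == "__main__":
    xs, ws = [100.0, 100.0], [1.0, 1.0]  # exact result is 200, outside the Q8.8 range
    print(fixed_mac(xs, ws, saturate=True))
    print(fixed_mac(xs, ws, saturate=False))
```

Saturation yields a wrong but nearby value, whereas wraparound flips the sign, which for a similarity score could turn a clear match into a clear non-match. This is why bit-accurate co-simulation against the target datasets is part of validation.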
Memory Bandwidth and I/O Limitations: Many biometric tasks are data-intensive. High-resolution images, streaming audio, or large enrollment databases all require moving a lot of data. An FPGA on an embedded board might have limited external memory bandwidth. Some designs overcome this with clever caching or on-chip storage of the most-used data, but the entire working set cannot always fit in BRAM. In some cases, designers resort to parallel memory (widening the bus) or multiple memory controllers to feed the pipeline. Roofline analyses of FPGA accelerators often show that designs are memory-bound rather than compute-bound [
58].
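The roofline argument reduces to one expression: attainable throughput is the smaller of the compute peak and the product of memory bandwidth and arithmetic intensity (operations per byte moved). The sketch below evaluates it for hypothetical numbers; the peak, bandwidth, and intensity values are placeholders and do not describe any device or design in this review.

```python
def attainable_gops(peak_gops, bandwidth_gbytes_per_s, ops_per_byte):
    """Roofline bound: min(compute peak, bandwidth * arithmetic intensity)."""
    return min(peak_gops, bandwidth_gbytes_per_s * ops_per_byte)


def ridge_point(peak_gops, bandwidth_gbytes_per_s):
    """Arithmetic intensity (ops/byte) above which a design becomes compute-bound."""
    return peak_gops / bandwidth_gbytes_per_s


def is_memory_bound(peak_gops, bandwidth_gbytes_per_s, ops_per_byte):
    """True when the bandwidth term, not the compute peak, limits throughput."""
    return bandwidth_gbytes_per_s * ops_per_byte < peak_gops


if __name__ == "__main__":
    peak, bw = 500.0, 4.0  # hypothetical: 500 GOPS fabric, 4 GB/s external memory
    for intensity in (50.0, 200.0):
        print(intensity, attainable_gops(peak, bw, intensity), is_memory_bound(peak, bw, intensity))
    print(ridge_point(peak, bw))
```

With these placeholder values the ridge point is 125 ops/byte, so a layer that performs only 50 operations per byte fetched is limited to 200 GOPS regardless of how many processing elements are instantiated. Tiling and on-chip reuse raise the intensity and move the design toward the compute roof.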
Power and Thermal Constraints in Practice: While FPGAs are generally more power-efficient than CPUs for the same task, they are not as efficient as custom ASICs for very large deployments. In battery-powered devices, every milliwatt counts. High-end FPGAs can consume significant power (tens of watts) if fully utilized at high clocks, which might necessitate heatsinks or fans. There is also the consideration of dynamic power vs. static power: FPGAs have a notable static power draw just to power the fabric. That said, the success of sub-1 W speaker verification chips [
72] and no-fan face recognizers [
5] shows this is manageable with the right design and device choice.
Algorithmic Adaptation and Accuracy Trade-offs: Not all biometric algorithms map neatly to an FPGA. Some involve complex control flows or iterative procedures that are hard to implement efficiently in hardware. Sometimes the solution is to modify the algorithm—but that might reduce accuracy. There is often a back-and-forth needed between algorithm designers and hardware engineers to find a sweet spot.
Cost and Flexibility Considerations: High-performance FPGAs can be expensive relative to commodity GPUs or mobile SoCs. For consumer devices, adding an FPGA might raise the BOM cost unless a smaller, cheaper FPGA can do the job. Additionally, updating a deployed FPGA in the field (bitstream update) is possible but not trivial, whereas software updates on a CPU are easier.
When FPGAs Are Not the Optimal Choice: The arguments above for FPGA acceleration assume a deployment regime where deterministic latency, on-device privacy, and watt-level energy budgets are jointly binding. In several practical regimes they are not, and FPGAs are then the wrong tool. (i) For batch or non-real-time enrollment back ends (e.g., one-time large-scale fingerprint indexing, periodic template re-encoding), GPU clusters typically deliver better throughput per dollar and per engineering hour, and the FPGA’s deterministic latency advantage is irrelevant. (ii) For consumer devices where the recognition algorithm is expected to evolve every few months (e.g., consumer face unlock receiving new spoofing-defense models in OTA updates), software updates on a mobile NPU/CPU are far cheaper than re-synthesizing and re-validating a bitstream; partial reconfiguration narrows but does not close this gap. (iii) For very large transformer-class models that exceed on-chip BRAM/URAM, current FPGAs are forced into bandwidth-bound external-memory operation, where mid- and high-end edge GPUs with HBM or LPDDR5X retain a clear advantage. (iv) In cost-sensitive consumer products at high volumes (> 1M units/year), an ASIC or a fixed NPU SoC will eventually beat an FPGA on unit cost and power even if it is initially worse on time-to-market. The FPGA case is strongest in the intersection: low-volume or specialized devices, regulated or privacy-critical workloads, deterministic latency requirements, and an algorithm stack that is mature enough to amortize bitstream development.
Comparison with Edge AI SoCs and NPUs: Modern edge AI SoCs and standalone neural processing units (NPUs) directly compete with FPGAs in the watts-per-inference regime that this review emphasizes. Representative platforms include the NVIDIA Jetson Orin Nano (40 TOPS at 7–15 W, software stack via JetPack and TensorRT), Google Coral Edge TPU (4 TOPS at roughly 2 W, i.e., about 0.5 W per TOPS, via TFLite), the Hailo-8 NPU (26 TOPS at 2.5 W typical) and the Hailo-15 vision processors (up to 20 TOPS at single-digit W), Qualcomm Hexagon NPU integrated in Snapdragon SoCs, Apple Neural Engine on A- and M-series chips, and Intel/AMD CPU-integrated AI accelerators. Relative to FPGAs, these SoCs typically offer (i) higher headline TOPS at comparable power because the silicon is fixed-function and routing overhead is absent, (ii) shorter time-to-deployment because the toolchain is software-only with no synthesis or place-and-route, and (iii) larger first-class model libraries through TFLite, ONNX, or vendor SDKs. FPGAs retain four advantages relative to fixed NPUs: deterministic latency at the cycle level, custom-precision arithmetic below INT8, integration of non-NN logic in the same fabric (e.g., chaotic encryption [
6], custom MFCC [
19], hardware fingerprinting for IDS [
87]), and reconfigurability for emerging modalities or threat models that the NPU’s fixed datapath cannot accommodate. For a face-recognition deployment that does not need any of these four, an Edge TPU or Hailo-8 will typically be the better choice. The decision is therefore not “FPGA versus everything else” but “which of the four FPGA-only properties is binding for this product,” and our review is most useful for the regime where at least one is.
Table 8 lists the key FPGA challenges, their impact, and prospective strategies for mitigation.
8. Future Trends and Research Directions
Looking ahead, the intersection of FPGAs and biometric recognition is poised for exciting developments. We identify several emerging trends and potential research directions from 2026 onward, and we annotate each direction with a rough time horizon to help readers prioritize: near-term (1–2 years, infrastructure already exists in published work or shipping silicon), medium-term (3–5 years, requires non-trivial new toolflow or device capability), and long-term/speculative (5+ years, depends on co-evolution of sensors, algorithms, and devices).
Heterogeneous FPGA Architectures (SoC and ACAP) (near-term): The next generation of FPGAs, often termed adaptive compute acceleration platforms (ACAPs) (such as Xilinx Versal), incorporate not just programmable logic and a CPU, but also AI-specific compute engines (e.g., vector processors, DSP arrays) and network-on-chip infrastructure. These architectures will provide out-of-the-box acceleration for AI workloads, making it easier to deploy complex biometric models. Research will likely explore how to optimally partition biometric tasks across CPU, FPGA fabric, and AI engines.
Table 9 summarizes recent FPGA families from major vendors, highlighting their heterogeneous compute elements, AI/ML support, high-level design toolchains, and key features beneficial for real-time biometric processing.
High-Level Design Automation and ML Framework Integration (near-term): We expect continued improvement in design tools that allow biometric engineers to target FPGAs without deep hardware expertise. Already, deep-learning-to-FPGA compilers and ML-framework plugins are under development. Open-source toolchains (e.g., based on LLVM and MLIR for hardware) might play a role, reducing reliance on vendor-specific flows.
Neuromorphic and Spiking Neural Networks on FPGAs (near-term): A forward-looking area is the use of neuromorphic computing principles for biometric recognition. Some biometric sensors (like event-based vision sensors or silicon cochleas) output spiking data. FPGAs are well suited to implement spiking neural networks (SNNs) or other brain-inspired models due to their parallelism and timing precision. This direction is already moving from speculation to practice: Yin et al. (2024) [
76], reviewed in
Section 4.4, demonstrate a spiking LSTM accelerator for automatic speech recognition at 0.84–1.1 W on Artix-7, directly validating the energy-efficiency premise of the SNN approach for voice biometrics. Research could extend this to SNN-based face or gait recognition implemented on FPGA, potentially achieving ultra-low power consumption by processing sparse spike events rather than dense frames.
Multimodal Biometric Fusion Systems (medium-term): Future biometric systems are expected to increasingly combine multiple traits for higher security (e.g., face + voice, or fingerprint + vein). FPGAs can serve as central hubs that process and fuse these modalities. We anticipate architectures where a single FPGA processes data from multiple sensors concurrently, and research on fusion algorithms (possibly using multi-stream neural networks or decision-level fusion) that the FPGA can handle in real time.
Enhanced Security Features (medium-term): Security research in FPGA-based biometric systems is expected to further explore resistance to side-channel attacks and the exploitation of FPGA-specific security features. Recent hardware security approaches applicable to embedded platforms include physically unclonable functions (PUFs), side-channel defense architectures (SDA), post-quantum cryptography (PQC), and secure boot mechanisms [
88]. PUFs implemented in an FPGA could be used not just for key storage but even to derive device-specific variants of algorithms.
FPGA in Wearable and IoT Biometrics (medium-term): As biometrics expand into wearable devices (ECG-based authentication, gait recognition with wearables, face recognition on AR glasses), there will be a push to use extremely small, low-power FPGAs or programmable logic in those form factors. Companies like Lattice Semiconductor already produce tiny FPGAs for always-on AI in wearables. We foresee research on compressing biometric algorithms to fit these tiny chips.
Collaboration of FPGAs with Cloud (Edge–Cloud Synergy) (long-term): Another trend could be systems where the edge FPGA handles first-line processing (including PAD and basic feature extraction), but then communicates with cloud services for heavy-duty tasks like searching a large database of identities. Research might explore secure communication protocols and hardware-accelerated homomorphic encryption on an FPGA to send encrypted features to the cloud, so the cloud never sees raw data, yet can still match.
Emerging Biometric Modalities and Custom Hardware (long-term/speculative): Finally, new biometric modalities like EEG-based authentication, odor signatures, or DNA-based biosensors might arise. FPGAs, being reconfigurable, are ideal platforms to prototype hardware for these unconventional biometrics. The research direction here is how to quickly adapt FPGA frameworks to completely new signal types and algorithms.
9. Conclusions
In this review, we surveyed recent FPGA-based implementations of biometric recognition across face, fingerprint, iris, speaker, and finger vein modalities, highlighting how hardware–algorithm co-design enables real-time, energy-efficient, and privacy-preserving inference at the edge. Across modalities, successful systems consistently leverage streaming architectures, parallel processing, quantization, and heterogeneous SoC-FPGA integration to achieve competitive accuracy while meeting strict latency and power constraints. Beyond performance, FPGA platforms provide inherent advantages for secure biometric deployment, including on-device processing, template protection, and support for presentation attack detection. Although challenges remain in resource scalability, memory bandwidth, and development complexity, ongoing advances in FPGA architectures, AI-optimized toolflows, and model compression techniques continue to lower these barriers. Overall, the evidence indicates that FPGAs represent a compelling and adaptable platform for next-generation biometric systems requiring deterministic performance, strong security, and efficient edge intelligence.