Review

FPGA-Accelerated ECG Analysis: Narrative Review of Signal Processing, ML/DL Models, and Design Optimizations

Bases of Electronics Department, Faculty of Electronics, Telecommunications and Information Technology, Technical University of Cluj-Napoca, 400114 Cluj-Napoca, Romania
* Author to whom correspondence should be addressed.
Electronics 2026, 15(2), 301; https://doi.org/10.3390/electronics15020301
Submission received: 29 November 2025 / Revised: 3 January 2026 / Accepted: 7 January 2026 / Published: 9 January 2026
(This article belongs to the Special Issue Emerging Biomedical Electronics)

Abstract

Recent advances in deep learning have had a significant impact on biomedical applications, driving precise actions in automated diagnostic processes. However, integrating neural networks into medical devices requires meeting strict requirements regarding computing power, energy efficiency, reconfigurability, and latency, all of which are essential for real-time inference. Field-Programmable Gate Array (FPGA) architectures provide a high level of flexibility, performance, and parallel execution, making them a suitable option for the real-world implementation of machine learning (ML) and deep learning (DL) models in systems dedicated to the analysis of physiological signals. This paper presents a review of intelligent algorithms for electrocardiogram (ECG) signal classification, including Support Vector Machines (SVMs), Artificial Neural Networks (ANNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory Networks (LSTMs), and Convolutional Neural Networks (CNNs), which have been implemented on FPGA platforms. A comparative evaluation of the performance of these hardware-accelerated solutions is provided, focusing on their classification accuracy. At the same time, the FPGA families used are analyzed, along with the reported performance in terms of operating frequency, power consumption, and latency, as well as the optimization strategies applied in the design of deep learning hardware accelerators. The conclusions emphasize the popularity and efficiency of CNN architectures in the context of ECG signal classification. The study aims to offer a current overview and to support specialists in the field of FPGA design and biomedical engineering in the development of accelerators dedicated to physiological signal analysis.

1. Introduction

In the last decade, major advancements in hardware architecture and the increase in computational capabilities have supported the expansion of artificial intelligence (AI) and machine learning (ML) techniques across numerous scientific and engineering fields [1]. Modern systems increasingly integrate intelligent algorithms to overcome the limitations of conventional methods, which is evident in applications such as security systems, robotic industrial automation, and clinical diagnostic assistance, where advanced data modeling and parallel learning provide superior performance [2].
In the field of signal processing, AI-based methods provide a key advantage due to their ability to automatically identify relevant features from complex data, enabling objective analysis, improved classification accuracy, and enhanced diagnostic precision. A central example in biomedical applications is electrocardiogram (ECG) analysis, which is widely used in the assessment of cardiac function [3]. Deep learning (DL) models can directly process raw ECG signals, autonomously extracting meaningful information and supporting applications such as intelligent cardiac monitoring [4], the estimation of physiological parameters such as blood pressure [5], or the automatic detection of arrhythmias [6].
Progress in the biomedical field has been significantly accelerated in recent years due to access to large-scale datasets [7,8,9], increased hardware performance, and the continuous evolution of artificial intelligence techniques [10]. These developments have enabled advanced applications that were previously difficult to implement. However, AI algorithms remain limited by their high computational resource requirements, which demands efficient hardware solutions for practical use. Therefore, architectures dedicated to biomedical processing aim to optimize energy consumption and reduce implementation complexity, without compromising prediction accuracy [3]. In this context, hardware acceleration represents a key direction for meeting the computational requirements of modern applications [2].
Considering these challenges, FPGA devices have become a promising option for implementing machine learning in biomedical systems. Their parallel processing capabilities, the flexibility provided by reconfigurability, and their favorable performance-to-power ratio make them well-suited for AI applications with high real-time requirements [11]. Moreover, advanced high-level synthesis (HLS) [12] technologies enable accelerated hardware implementation by directly translating sequential C/C++ code into register transfer level (RTL) descriptions and synthesizable FPGA designs, thereby broadening the adoption of FPGA devices beyond engineers, developers, and researchers with specialized HDL expertise [13,14,15].
Many recent studies have focused on reviewing the literature on energy-saving strategies for portable devices using FPGAs [16], while other works have compared various hardware platforms and technologies for implementing neural networks, highlighting the specific requirements of portable devices [1]. Additionally, some articles have analyzed the current state of AI algorithms and hardware platforms used in the biomedical processing of ECG, EMG, and EEG signals, with a focus on classifiers and performance [3]. Regarding the development of hardware accelerators for optimizing neural networks on FPGAs, some recent studies have investigated the evolution of technologies, network models, and acceleration frameworks [17]. These research works have provided significant contributions concerning the use and development of FPGA technologies in the implementation of neural networks for medical diagnosis. Although these works address the latest AI network design and acceleration methods using FPGA technology, including in the biomedical field, comparative analyses focused specifically on new trends and implementations dedicated to real-time ECG-based diagnostics remain limited. In this context, this study presents a narrative review of recent systems for the assessment of cardiac pathologies that utilize various neural architectures implemented on different FPGA families, aiming to compare the impact of implementation, optimization, and acceleration techniques on performance.
The contributions of this paper are outlined as follows:
  • A review of recent ECG signal processing techniques on FPGA platforms, highlighting trends in filtering methods and feature extraction.
  • An overview of FPGA families, including a table summarizing relevant features for the development of AI algorithms in real-time medical diagnostic applications.
  • The analysis and comparison of the most recent machine learning applications implemented on FPGAs for ECG classification, highlighting the widespread use of the MIT-BIH database and the preference for SVM and ANN architectures.
  • The evaluation of the most recent deep learning networks implemented on FPGA platforms for cardiac anomaly detection, emphasizing the preference for using the Zynq-7000 family for hardware implementation.
  • The comparison of FPGA accelerators dedicated to deep neural networks used in ECG classification, considering hardware performance (power, latency, frequency, and accuracy). The analysis of optimization strategies highlights the frequent use of pipelining, quantization, dataflow optimization, pruning, loop unrolling, and PE arrays, with 1D-CNN accelerators representing the dominant solution.
This article is structured into seven sections and provides a comprehensive analysis of the new trends in implementing AI algorithms on FPGA platforms for ECG signal processing. Section 2 presents the review methodology, based on the databases used and the selection of articles from the literature. Section 3 highlights recent ECG signal processing methods on FPGAs, emphasizing filtering and feature extraction techniques. Section 4 describes FPGA technology, followed by a comprehensive analysis of current FPGA families, highlighting their advantages for implementing neural networks. Section 5 provides a comparative analysis of recent machine learning and deep learning models and techniques applied to ECG classification, along with their FPGA implementations. In addition, the paper investigates the ways networks are integrated into FPGA platforms, presenting a comparative evaluation based on classification performance and the hardware technology used. Section 6 includes a review of current FPGA accelerators dedicated to implementing deep learning networks for ECG classification, along with a critical analysis of optimization techniques and achieved performance. Finally, Section 7 presents a summary of the paper, along with the main findings and suggestions for future research. Figure 1 illustrates the structure of the article.

2. Review Methodology

The aim of this paper is to provide a comprehensive analysis of the current state of techniques used for implementing intelligent algorithms on FPGA programmable circuits, aimed at ECG signal classification. It also seeks to identify the most efficient techniques and algorithms, the hardware challenges, as well as the optimization opportunities. The proposed methodology includes a systematic process of searching, selecting, and analyzing the specialized literature. Thus, the included studies are recent and relevant to current research trends in the hardware acceleration of ECG signal classification. The search process involved using the main scientific databases: IEEE Xplore, Google Scholar, ScienceDirect, and SpringerLink. Multiple databases were considered to ensure that all relevant articles were selected. Keywords, as well as combinations of them, were used, such as: “ECG signal classification”, “FPGA implementation”, “intelligent algorithms”, “machine learning on FPGA”, “hardware acceleration for ECG” and “FPGA accelerator”. To ensure technological consistency, the term “FPGA” was explicitly included in the search string, given that similar applications are often implemented using alternative hardware platforms (microcontrollers, microprocessors, and ASICs). Additionally, “ECG” was applied as a mandatory filter to limit the analysis exclusively to ECG-related studies. To search for a specific algorithm, its name was used as a keyword, for example, “Convolutional Neural Network”. The search was limited to works published between 2017 and 2025 to cover recent research in the field. Articles from scientific journals, conference papers, as well as studies presenting hardware implementations were included. Duplicate articles were also removed.
Subsequently, the article selection stage involved analyzing the title and abstract to identify potentially relevant works.
The selected articles were fully evaluated using the following inclusion criteria: ECG signal classification must be performed with intelligent algorithms, and the described algorithms must be implemented on FPGA. The choice of ECG database used for training and evaluation was not an inclusion criterion for this review. Consequently, this study does not evaluate model generalizability, dataset bias, or fairness in comparison. Articles were considered irrelevant if they did not include ML/DL implementations on FPGA or if the algorithms were not intended for ECG classification. Following the selection process, after removing irrelevant articles, 39 articles were selected and included in this study.
For each included study, details were extracted regarding the type of intelligent algorithm, methodology, classifier performance, the FPGA platform used, and the performance of the FPGA accelerator. The steps of the search, selection, and inclusion process of the articles in this study are presented in Figure 2.
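The automated part of the screening described above (mandatory terms, publication window, duplicate removal) can be sketched as follows. This is only an illustration under stated assumptions: the `Record` type and the function names are hypothetical, and the full-text evaluation against the inclusion criteria remains a manual step that the sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical container for one search hit; not taken from the review itself.
    title: str
    abstract: str
    year: int


MANDATORY_TERMS = ("fpga", "ecg")  # both terms are required, as stated in the text
YEAR_RANGE = (2017, 2025)          # publication window stated in the text


def passes_screening(rec: Record) -> bool:
    """Title/abstract screening: publication window plus mandatory terms."""
    text = f"{rec.title} {rec.abstract}".lower()
    in_window = YEAR_RANGE[0] <= rec.year <= YEAR_RANGE[1]
    return in_window and all(term in text for term in MANDATORY_TERMS)


def deduplicate(records):
    """Drop records whose normalized title was already seen (assumed duplicate rule)."""
    seen = set()
    unique = []
    for rec in records:
        key = rec.title.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Matching on a normalized title is a simplification; a real workflow would more likely compare DOIs where they are available.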

3. Existing Implementations of ECG Signal Processing

Cardiovascular diseases (CVD) are the leading cause of mortality worldwide, accounting for approximately 20.5 million deaths in 2025. Based on the most recent studies, between 2025 and 2050, the prevalence of cardiovascular conditions is expected to increase by 90%, and crude mortality by 73.4%, reaching approximately 35.6 million deaths by 2050 [18]. The electrocardiogram (ECG) is the primary technique for analysis and diagnosis in the clinical field, as it is a non-invasive, accessible, and efficient tool for recording the heart’s electrical activity. In cardiology, however, ECG signal interpretation is traditionally performed by specialists through visual inspection of the signal morphology. Considering that a 24 h monitoring session can contain around 100,000 heartbeats, visual analysis becomes impractical and inefficient for long-duration signals. In this context, the development of efficient continuous monitoring and automatic diagnostic techniques is necessary, capable of supporting and assisting physicians in the interpretation process, optimizing diagnosis through early anomaly detection, and improving the accuracy of clinical decisions.
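The two figures quoted above can be checked with simple arithmetic. The sketch below assumes a mean heart rate of 70 beats per minute (an illustrative value, not taken from the text) and treats the 73.4% increase as applying to the annual number of deaths, which is an assumption about how the cited figure is defined.

```python
def beats_per_day(mean_heart_rate_bpm: float) -> int:
    """Number of heartbeats in a 24 h recording at a constant mean heart rate."""
    return round(mean_heart_rate_bpm * 60 * 24)


def projected_deaths(baseline_millions: float, increase_percent: float) -> float:
    """Scale a baseline death count (in millions) by a percentage increase."""
    return baseline_millions * (1 + increase_percent / 100)


# 70 bpm is an assumed resting heart rate used only for illustration.
print(beats_per_day(70))                       # about 100,000 beats in 24 h
print(round(projected_deaths(20.5, 73.4), 1))  # close to the 35.6 million quoted
```

At 70 beats per minute a 24 h session contains 100,800 beats, which matches the order of magnitude given in the text. The projection comes out slightly below 35.6 million, a difference that is plausibly due to rounding in the quoted percentages.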
Currently, ECG monitoring and cardiovascular diagnostic systems increasingly rely on the development of intelligent tools that use artificial intelligence algorithms, as well as on extending software approaches to hardware implementations, with the aim of reducing computational cost, energy consumption, and prototype size. Furthermore, current trends in the cardiovascular industry are converging toward on-device implementation of algorithms, as this allows ECG signal processing directly on the device, eliminates dependence on an internet connection, ensures a high level of data privacy, and reduces the risk of response delays [19]. Thus, on-device implementations represent more stable and reliable solutions for continuous patient monitoring. Additionally, ECG analysis solutions based on portable devices [20], edge-deployable systems [21], or offline analysis platforms using general-purpose processors or microcontrollers [22] exhibit major limitations in terms of energy efficiency, latency, resources, and processing power. Therefore, hardware implementation based on reconfigurable computing using field-programmable gate arrays (FPGAs) represents a promising solution. This technology has been one of the most extensively investigated over the past decade in the field of cardiovascular diagnostics and has proven to be one of the most efficient solutions in modern research dedicated to ECG signal monitoring and analysis.

FPGA-Based ECG Signal Processing for Real-Time Monitoring

As new trends in ECG signal analysis increasingly focus on preventive healthcare and real-time detection, along with the expansion of wearable devices and remote monitoring, continuous real-time ECG signal processing becomes an essential component. To manage the large data flow and energy constraints specific to portable systems, FPGAs provide an efficient solution due to their real-time hardware reconfigurability, parallel processing capabilities, low latency, and low energy consumption. Thus, fixed analog front-end stages, such as analog hardware filters or traditional preprocessing methods, can be replaced, facilitating system adaptation to spontaneous, variable, and transient conditions [23]. Furthermore, FPGA platforms allow the acceleration of ECG signal processing applications, providing advantages in terms of data throughput and propagation time. On the other hand, FPGAs represent a balanced solution between performance and energy consumption, as they offer greater flexibility than ASICs and higher energy efficiency than GPUs [24]. Additionally, through heterogeneous computing architectures, which integrate DSP blocks, ARM processors, and memory modules, a robust hardware-software processing mechanism is facilitated.
Continuous ECG signal processing includes several main stages, such as signal preprocessing, feature extraction and optimization, data standardization, and decision-making processing. For each of these stages, the specialized literature presents a range of advanced FPGA-based solutions aimed at efficient ECG signal filtering, feature extraction, relevant feature selection, or the implementation of decision-making algorithms. However, the main challenges in real-time ECG signal processing on FPGA platforms, such as managing energy consumption, latency, memory requirements, hardware complexity, and ensuring accuracy, still require optimized solutions to achieve precise and clinically validated results.
In the field of filtering, the most recent techniques focus on highly efficient methods that offer an optimal trade-off between hardware performance and signal quality in real-time processing. Current filtering methods range from optimized FIR filters, capable of achieving up to 1250× acceleration compared to software implementations [25] and reducing dynamic power for portable devices, to hybrid strategies that reach an SNR of approximately 30 dB [26]. Additionally, adaptive architectures play an important role, as it has been demonstrated that using systolic structures can reduce energy consumption by up to 40% [27]. Moreover, filters based on the discrete wavelet transform (DWT) and approximate pruning techniques have been notable for significantly lowering power consumption and improving arrhythmia detection, achieving up to 99.65% accuracy on FPGA platforms [28]. From the perspective of memory optimization for real-time ECG signal processing, article [29] presents a hybrid architecture combining the DWT and adaptive dual threshold filter (ADTF), capable of efficiently managing the data flow. This approach successfully reduced the usage of BRAM and registers while maintaining high performance in noise removal.
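To make the filtering discussion more concrete, the following minimal sketch models a direct-form FIR filter that uses only integer multiply-accumulate operations, as a fixed-point hardware datapath would, together with an SNR estimate in dB. It is not the architecture of any cited study; the function names, the Q15 coefficient format, and the moving-average taps in the usage note are assumptions chosen for illustration.

```python
import math


def quantize_coeffs(coeffs, frac_bits=15):
    """Convert real-valued taps to signed integers with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return [round(c * scale) for c in coeffs]


def fir_fixed_point(samples, q_coeffs, frac_bits=15):
    """Direct-form FIR over integer samples using integer MACs only."""
    delay = [0] * len(q_coeffs)       # delay line, newest sample first
    out = []
    for x in samples:
        delay = [x] + delay[:-1]      # shift the new sample into the delay line
        acc = sum(c * d for c, d in zip(q_coeffs, delay))
        out.append(acc >> frac_bits)  # arithmetic shift rescales to sample units
    return out


def snr_db(clean, processed):
    """SNR in dB between a reference signal and its processed version."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((s - p) ** 2 for s, p in zip(clean, processed))
    if noise_power == 0:
        return math.inf
    return 10 * math.log10(signal_power / noise_power)
```

With four equal taps, `fir_fixed_point([1000] * 8, quantize_coeffs([0.25] * 4))` ramps up over the first four samples and then settles at the input level. In hardware, each tap would typically map to one DSP slice or to shift-and-add logic.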
From the perspective of recent methods for feature extraction from ECG signals, the specialized literature highlights a variety of techniques, ranging from classical algorithms to modern methods based on mathematical transforms and complex functions. These are often optimized to achieve high performance in terms of detection accuracy and hardware implementation efficiency. Thus, classical methods, such as the Pan-Tompkins algorithm [30,31], focus on extracting temporal features by analyzing peaks, interval durations, and specific segments of ECG signals. Optimizing these methods through the use of fixed-point arithmetic, pipelining, and parallelization has led to high hardware performance with minimal FPGA resource utilization. Other studies have applied the DWT to extract features directly from the coefficients, and the use of techniques such as parallelism, pipelining, and processing core reuse has resulted in a significant reduction in power consumption and memory usage on FPGAs [32]. Furthermore, methods based on R-peak detection and heart rate calculation using DWT lifting strategies, considering only level-4 coefficients, have enabled a reduction in hardware requirements and improved detection accuracy, achieving up to 99.43% [33]. In a related study, the authors used DWT together with the fiducial windowing technique to efficiently extract clinically relevant features from the ECG signal, demonstrating that the hardware implementation of the method, combined with FPGA-based acceleration of the classification system, provides a high-performance solution for real-time processing and detection of cardiac arrhythmias [34]. Beyond the widespread use of the DWT, some studies have focused on implementing feature extraction strategies aimed at significantly reducing analysis complexity by compressing the ECG signal into coefficients that are easy to process in hardware. 
Thus, study [35] presents a method based on Hermite functions aimed at extracting a complete set of coefficients that describe the morphology of the ECG signal. The reduced number of parameters directly influenced the hardware implementation on FPGA, achieving a latency of under 1 s and a minimal power consumption of 28 mW. A different approach is presented in study [36], where the authors extracted advanced features from Hurst images derived from ECG signals, including texture features (Hu moments and Haralick texture descriptors), geometric and contour features, as well as global features such as entropy, skewness, and kurtosis. System optimization was achieved by task partitioning: global features are computed on the CPU, while geometric, contour, and texture features are accelerated on the FPGA. Thus, the CPU handles sequential and statistical operations, while the FPGA platform processes pixel-level operations in parallel. This computing strategy achieved a maximum accuracy of 97.3% with minimal latency and optimal processing speed.
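The lifting-based DWT mentioned earlier in this subsection can be illustrated with an integer Haar transform, which needs only additions, subtractions, and shifts. The Haar wavelet is an assumption made for brevity, since the text does not state which wavelet the cited studies use, and all function names are hypothetical.

```python
def haar_lifting_step(signal):
    """One level of the integer Haar lifting scheme (input length must be even)."""
    even, odd = signal[0::2], signal[1::2]
    detail = [o - e for e, o in zip(even, odd)]            # predict step
    approx = [e + (d >> 1) for e, d in zip(even, detail)]  # update step
    return approx, detail


def haar_lifting_inverse(approx, detail):
    """Exactly invert `haar_lifting_step` using the same integer operations."""
    even = [s - (d >> 1) for s, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    return out


def dwt_levels(signal, levels=4):
    """Decompose down to `levels`; length must be a multiple of 2**levels."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_lifting_step(approx)
        details.append(d)
    return approx, details  # details[-1] holds the deepest-level coefficients
```

Because each lifting step is exactly invertible in integer arithmetic, no precision is lost between levels. A detector that keeps only the level-4 coefficients would read `details[3]` and discard the remaining bands.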
Therefore, the methods described in this section highlight the central role of FPGA platforms in ECG signal processing for the development of efficient real-time clinical analysis systems. Furthermore, recent approaches demonstrate that the use of optimized techniques in the biomedical signal analysis stage can provide a good balance between detection performance and the hardware efficiency of FPGAs.
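As an illustration of the classical temporal approach discussed in this section, the sketch below follows the general stage order of Pan-Tompkins-style detectors (differentiation, squaring, moving-window integration, thresholding) using integer arithmetic only. It is deliberately simplified and is not the published algorithm: the band-pass stage is omitted, a first difference replaces the five-point derivative, and a fixed threshold replaces the adaptive ones. The 360 Hz default matches the sampling rate of the MIT-BIH Arrhythmia recordings; the window lengths and function names are assumptions.

```python
def detect_r_peaks(samples, fs=360, window_ms=150, refractory_ms=200):
    """Return sample indices of detected R peaks using integer arithmetic only."""
    if len(samples) < 2:
        return []
    # Stage 1: first difference emphasises the steep QRS slopes.
    diff = [0] + [samples[i] - samples[i - 1] for i in range(1, len(samples))]
    # Stage 2: squaring makes all values positive and amplifies large slopes.
    squared = [d * d for d in diff]
    # Stage 3: moving-window integration implemented as a running sum.
    win = max(1, fs * window_ms // 1000)
    integrated, acc = [], 0
    for i, value in enumerate(squared):
        acc += value
        if i >= win:
            acc -= squared[i - win]
        integrated.append(acc // win)
    # Stage 4: fixed threshold (simplification) plus a refractory period.
    threshold = max(integrated) // 2
    refractory = fs * refractory_ms // 1000
    peaks, last_peak = [], -refractory
    for i, energy in enumerate(integrated):
        if energy > threshold and i - last_peak >= refractory:
            lo, hi = max(0, i - win), min(len(samples), i + win)
            r = max(range(lo, hi), key=lambda k: samples[k])  # largest raw sample
            peaks.append(r)
            last_peak = r
    return peaks


def heart_rate_bpm(peaks, fs=360):
    """Mean heart rate from consecutive R-R intervals, or None if undefined."""
    if len(peaks) < 2:
        return None
    rr = [b - a for a, b in zip(peaks, peaks[1:])]
    return 60.0 * fs * len(rr) / sum(rr)
```

The running sum in stage 3 is the kind of structure that pipelines well in hardware, since each output needs one addition and one subtraction regardless of window length.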

4. Fundamentals of FPGA Technology

The hardware development of intelligent real-time ECG analysis systems requires efficient management of resources, power consumption, and latency. Due to their specific properties, FPGAs offer unique solutions for optimizing artificial intelligence architectures, enabling the implementation of precise and efficient diagnostic techniques. Nevertheless, choosing the appropriate FPGA architecture and family is essential to meet the requirements imposed by the application or analysis system. Therefore, this section presents the general FPGA architecture, followed by an analysis of modern FPGA families and their implementation technologies, intended to support the development of applications for automatic classification.

4.1. FPGA Architecture

Field-Programmable Gate Arrays (FPGAs) are reconfigurable semiconductor circuits designed to implement various digital architectures optimized through direct hardware-level programming. Unlike traditional processors based on a sequential architecture, FPGAs allow parallel execution of operations, which leads to reduced latency and high efficiency in time-critical applications. Due to their functional versatility, these devices are increasingly used in applications such as biomedical signal processing and real-time automatic diagnosis [11].
At the architectural level, modern FPGA SoCs are composed of a Programmable Logic (PL) block, which is the traditional FPGA fabric, a Processing System (PS), and a network-on-chip (NoC) IP that interconnects the PS and PL; a general illustration is shown in Figure 3. Within the FPGA fabric, several hardware resources are organized in a matrix structure. Configurable logic blocks (CLBs) are arranged in a two-dimensional grid and interconnected through a flexible routing system, allowing the connections to be adapted to the implemented application. Block RAMs (BRAMs) allow the implementation of on-chip memories and FIFOs, digital clock managers generate internal clock signals with configurable frequencies, and digital signal processing (DSP) cores accelerate arithmetic and floating-point operations. This organization provides FPGAs with a high degree of reconfigurability and versatility in the development of digital systems. At the edges of the structure are the input/output blocks, which use the same programmable routing resources to facilitate communication with the internal elements of the device.
By distributing data and computation between DSPs and LUTs, an optimal trade-off between latency and energy consumption is achieved, suitable for high-performance systems or AI applications on FPGAs. Additionally, the integrated BRAM reduces the need for external memory access and contributes to lowering overall latency. This structure enables the development of high-performance digital systems with low response times and efficient execution of parallel tasks [37]. Moreover, the concurrent processing capabilities and flexible reconfigurability make FPGAs an optimal platform for computationally intensive applications [38].
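Whether a model's weights can stay in on-chip BRAM, and so avoid external memory accesses, can be estimated with a back-of-the-envelope calculation. The sketch below is a rough first check only: the function names, the block size default, and the example figures are illustrative assumptions rather than data for any specific device, and the estimate ignores activation buffers and BRAM packing inefficiency.

```python
def weight_storage_kbits(n_params: int, bits_per_weight: int) -> float:
    """Raw storage needed for the weights, in Kb (1 Kb = 1024 bits)."""
    return n_params * bits_per_weight / 1024


def fits_on_chip(n_params, bits_per_weight, bram_blocks, kbits_per_block=36):
    """True if the raw weight storage does not exceed the total BRAM capacity.

    `kbits_per_block` is an assumed block size; check the target device data sheet.
    """
    capacity_kbits = bram_blocks * kbits_per_block
    return weight_storage_kbits(n_params, bits_per_weight) <= capacity_kbits


# Hypothetical example: a 50,000-parameter network on a device with 140 blocks.
print(fits_on_chip(50_000, 8, 140))      # 8-bit weights
print(fits_on_chip(2_000_000, 32, 140))  # a much larger model with 32-bit weights
```

The comparison also shows why reducing the weight bit-width matters: halving the bits per weight halves the storage requirement, which can decide whether external memory is needed at all.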

4.2. Overview of Modern FPGA Families

The hardware design of AI architectures for real-time medical diagnosis requires efficient management of performance parameters, such as diagnostic accuracy, power consumption, and data transfer latency.
The proper selection of an FPGA family is a critical step in the design of diagnostic systems, as implementation technology varies between platforms in terms of integrated resources and support for hardware acceleration. Therefore, this section explores the main FPGA families used in computationally intensive applications within clinical diagnostics. Each FPGA generation offers distinct features regarding the design of automated systems. Thus, both common platforms and modern solutions optimized for implementing machine learning and deep learning algorithms are analyzed, aiming to highlight how FPGA technology can efficiently support inference in the context of medical applications.
Modern SoC FPGA platforms, such as the Xilinx Zynq series, incorporate ARM cores, creating a hybrid environment that combines the advantages of hardware acceleration with the flexibility of software execution. This heterogeneous architecture facilitates optimal task allocation between the processor and programmable logic, contributing to reduced latency and increased energy efficiency in deep neural network (DNN) designs [39].
The Zynq-7000 family is the first extensible processing platform that appeared on the market in 2011 [40]. It integrates a processing system (PS) based on a dual-core ARM Cortex-A9, with advanced microcontroller bus architecture (AMBA) interconnects and other specific functionalities, being dedicated to high-level embedded applications. Therefore, due to the flexibility it offers by combining software processing with FPGA configurable programmability, this family enables the development of complex applications with high computational power, hardware acceleration, and real-time control [41].
The Zynq UltraScale+ MPSoC family [42] represents the first heterogeneous multiprocessor with advanced 16 nm technology, introduced in late 2015. This family integrates two ARM Cortex-R5 real-time processing units (intended for deterministic, low-latency, real-time tasks such as motor control, automation, and power electronics), a dual-core or quad-core 64-bit ARM Cortex-A53 application processing unit (well suited to running operating systems such as FreeRTOS or Linux distributions), and an ARM Mali-400 graphics accelerator [43]. It also includes a dedicated AI/ML unit optimized for CNNs, as well as an advanced memory hierarchy, facilitating the implementation of scalable and efficient embedded systems [44]. Furthermore, this platform offers system-level performance up to five times greater, maximum scalability, advanced security, full connectivity, and efficient power management.
The latest SoC FPGA platform developed by Xilinx is the Versal adaptive compute acceleration platform (ACAP), introduced in 2020 [45]. It incorporates a core based on a heterogeneous computing architecture with an AI engine (AIE), together with two structures: the processing system (PS) and the programmable logic (PL). Therefore, the platform manages to combine three major processing tasks into a single architecture by integrating high-performance scalar and vector processing elements (CPU, DSP, and GPU), which are tightly coupled through advanced programmable logic and interconnected using a high-bandwidth on-chip network (NoC). Thus, the Versal ACAP family provides an efficient balance between GPU performance and FPGA flexibility, allowing superior hardware customization compared to other platforms, and, thanks to the AIE core, enables a much more optimized design than that of ASICs [46].
The ACAP family includes a series of six Versal platforms: AI Core, AI Edge, AI RF, Prime, Premium, and HBM [45]. Among these series, AI Core and AI Edge are dedicated to applications in the field of medical diagnostics. The Versal AI Core [47] offers exceptional computing power, with high performance for network inference and acceleration, while also meeting the computational density requirements for CNNs. By integrating three scalar processors, the platform is dedicated to complex, real-time applications, such as the processing and analysis of biomedical images for faster and more accurate diagnosis, or the interpretation of data from wearable continuous monitoring devices. On the other hand, for advanced automatic diagnostic applications, the AI Edge series [48] provides AI inference with minimal latency, efficiently accelerating machine learning algorithms and complex image processing techniques for real-time monitoring.
Standalone platforms are FPGA families without an integrated processor, offered by vendors such as Altera (Intel) and Xilinx (AMD), each of which provides a series of dedicated families.
Xilinx offers the largest number of platforms in terms of manufacturing technology, with various technology nodes such as 45 nm, 28 nm, 20 nm, or 16 nm [49]. Some modern examples include the Artix UltraScale+, Virtex UltraScale+, and Kintex UltraScale+ series, manufactured using 16 nm technology.
The Artix UltraScale+ [50] offers high DSP performance and is the only family in the industry that uses an advanced technology called integrated fan-out (InFO), which ensures superior processing density, improved signal propagation, and excellent energy efficiency. Additionally, the Artix UltraScale+ is considered one of the most efficient solutions in the low-power, cost-effective segment.
The Virtex UltraScale+ [51] is positioned as the family with the most advanced signal processing performance, offering the highest I/O bandwidth and flexible clocking technology. This platform utilizes a 3D architecture with a monolithic design, making it ideal for computationally intensive applications in the field of ML. Moreover, the Virtex UltraScale+ is a scalable and highly powerful platform for accelerating complex tasks.
In contrast, the Kintex UltraScale+ [52] is considered an intermediate solution, positioned between the performance levels of the Artix and Virtex families. This family offers an optimal balance between cost, performance, and power consumption, making it ideal for high-demand applications, DSP-intensive operations, and even network acceleration.
Also in the mid-range category are the Artix-7 and Spartan-7 families. These two architectures use a 28 nm manufacturing process and offer a good balance between performance and energy efficiency. The Artix-7 [53] provides a significant reduction in power consumption, up to 50% compared to the previous generation. Furthermore, due to its compact design, high performance, and integration with the MicroBlaze processor, the Artix-7 family meets size, cost, and power requirements, making it ideal for portable medical systems. The Spartan-7 family [54], introduced in 2017, is designed to achieve an optimal balance between cost, performance, and power consumption, targeting applications with moderate performance requirements.
Also in the category of standalone platforms are some FPGA families developed by Altera (Intel), such as Cyclone, Arria, and Stratix.
Within the Intel Cyclone series, the most recent families are the Cyclone V and Cyclone 10 FPGAs. The Cyclone V [55], available since 2012, is oriented toward edge computing applications, as it efficiently integrates programmable logic and offers both architectures with integrated transceivers and SoC FPGA platforms that include an ARM processor. In contrast, the Cyclone 10 family [56], launched in 2017, is reported to deliver twice the performance of the V series and is supported by the Quartus design environment and various support tools. Additionally, this technology emphasizes energy efficiency, being dedicated to applications sensitive to both computational cost and power consumption, making it suitable for portable medical monitoring devices.
The Intel Arria 10 family [57] includes both traditional devices and SoC platforms that incorporate a dual-core ARM Cortex-A9 processor. The architecture is built using 20 nm technology and is positioned in the market as mid-range. This family offers a higher speed class than previous generations, with up to a 20% increase in maximum operating frequency and up to a 40% reduction in power consumption. In the medical field, the performance of Arria 10 platforms stands out in real-time diagnostic applications based on imaging and scanning.
A recent platform from Intel’s generations is the Intel Stratix 10 family, available in both FPGA and SoC versions [58], designed using 14 nm technology. It offers superior performance compared to previous generations by doubling the core clock frequency, thanks to the HyperFlex architecture that integrates advanced technologies. Therefore, compared to the Cyclone 10 and Arria 10, the Stratix 10 provides superior advantages in terms of performance, energy efficiency, and system integration.
The Intel Agilex SoC family, launched in 2019, consists of several recently developed FPGA series. This platform is designed using 10 nm technology in a heterogeneous 3D system-in-package (SiP) approach. The HyperFlex architecture and the quad-core ARM Cortex-A53 processor ensure a 45% increase in performance and a 40% reduction in power consumption compared to the Stratix 10 [49]. The Agilex-9 SoC [59] is the newest FPGA platform in the Intel Agilex family, available since 2024. It provides high computational capacity in a compact form factor while maintaining low energy consumption, making it suitable for edge computing applications, advanced signal processing, and neural network inference.
Table 1 provides a summary of the FPGA families analyzed, highlighting their characteristics in terms of integrating artificial intelligence algorithms into real-time medical diagnostic applications. For a comparative overview, the following parameters were considered: device type, manufacturing technology, type of ARM processor, dedicated AI engine, and specific features of each platform. Over the years, fabrication technology has advanced significantly, reaching sizes of 14 nm, 10 nm, and even 7 nm for FPGAs. Along with this evolution, FPGA devices have experienced substantial progress in supporting the implementation of AI algorithms. Modern FPGA families enable the deployment of ML and DL networks using programmable logic through HLS and RTL design approaches. Furthermore, high-performance heterogeneous platforms, such as the Versal AI Core and Versal AI Engine, integrate network on chip fast interconnects and computational resources specifically designed for AI inference tasks. As a result, these technological improvements, combined with the specific features of FPGA families, strongly support the development of automated techniques for ECG signal analysis and classification.
The choice of ECG classification model influences the selection of the hardware architecture on FPGA platforms, because it determines power consumption, memory usage, and latency. Classical classifiers with low internal complexity, such as support vector machines (SVMs) or artificial neural networks (ANNs), are suitable for hardware implementations on FPGAs with limited resources and strict constraints on latency and power efficiency (Artix-7, Spartan-7, Cyclone-V, Cyclone-10). In contrast, deep learning-based models, such as convolutional neural networks (CNNs), require massive parallelism to handle complex computational operations, as well as more computing resources, memory, and DSP units. They are therefore better suited to advanced FPGAs (Artix UltraScale+, Stratix-10, Versal AI Edge, Versal AI Core) or SoC platforms (Zynq-7000, Zynq UltraScale+, Agilex-9), where an optimal balance between model performance and energy efficiency can be achieved.

5. Integrating AI Techniques into FPGA-Based ECG Processing

Leveraging artificial intelligence techniques for ECG signal processing on FPGA platforms offers significant potential to improve diagnostic accuracy and real-time performance. This section presents studies on adapting and optimizing both traditional machine learning algorithms and deep neural network models to enable reliable and efficient cardiac monitoring in practical, real-world applications.

5.1. Application of Machine Learning Based on FPGA for ECG Classification

Recent advancements in AI algorithms have transformed the diagnosis of heart diseases, leading to the development of specialized decision-support systems for complex clinical evaluations based on ECG signals. Machine learning models can produce precise predictions and have been reported to outperform manual diagnosis by human specialists in terms of accuracy on biomedical data. However, deploying these algorithms in systems dedicated to real-time analysis introduces challenges related to processing latency, computational requirements, and energy efficiency. Consequently, there is an emerging need to accelerate ML computational units. Among the most representative ML approaches are artificial neural networks (ANNs) and support vector machine (SVM) classifiers. Accordingly, this section explores various optimized strategies for implementing these classifiers on FPGA platforms, focusing on minimizing resource usage while maintaining high accuracy.

5.1.1. Support Vector Machines (SVMs)

Support Vector Machine (SVM) [60] is one of the most important algorithms in the field of machine learning, as it is valued for its high performance in classification and prediction tasks. Its strong capability to handle both linear and nonlinear problems through kernel functions allows the classifier to be used in various applications, such as text and image recognition, bioinformatics, and even in the field of medical diagnostics [61]. On the other hand, this algorithm involves computationally intensive operations and requires optimization of the trade-off between classification performance and computational cost. In this context, in recent years, various SVM-based classification techniques have been proposed to improve accuracy and implementation efficiency on FPGAs. To highlight current trends, this section presents specialized publications that have addressed different methods for hardware implementation and optimization of SVMs for ECG signal analysis and classification applications.
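To make the computational cost of a kernel SVM concrete, the following is a minimal Python sketch of the decision function with an RBF kernel. It is illustrative only and is not taken from any of the reviewed implementations; the names `rbf_kernel` and `svm_predict`, the two-class labels, and all parameter values are assumptions. Each prediction evaluates one kernel per support vector, which is the part that hardware designs typically parallelize.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svm_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Two-class kernel SVM decision.

    dual_coefs[i] is assumed to hold alpha_i * y_i for support vector i.
    Returns +1 or -1 according to the sign of the decision value.
    """
    score = bias
    for sv, coef in zip(support_vectors, dual_coefs):
        score += coef * rbf_kernel(sv, x, gamma)  # one kernel evaluation per support vector
    return 1 if score >= 0 else -1


# Hypothetical toy model with two support vectors, one per class.
toy_svs = [[0.0, 0.0], [2.0, 2.0]]
toy_coefs = [1.0, -1.0]
print(svm_predict([0.1, 0.0], toy_svs, toy_coefs, bias=0.0, gamma=1.0))
```

The cost grows linearly with the number of support vectors and the feature dimension, which is why reducing either one is a common optimization target.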
In [62], the authors present a method for classifying cardiac arrhythmias based on the SVM algorithm. The model is implemented on a Zedboard Zynq FPGA and involves two distinct stages for hardware acceleration: optimizing the behavioral design to facilitate parallelization of data processing, and configuring the accelerator’s memory architecture to match how the SVM accesses and processes data. The developed architecture achieved a latency improvement of up to 98.78%.
Another recent study [36] proposes a hybrid CPU-FPGA architecture aimed at optimizing the feature extraction process from ECG signal morphology and real-time arrhythmia classification using a hardware-accelerated SVM model on an FPGA. The system applies task sharing between the CPU and FPGA, hardware-level parallel processing, and a series of structural optimizations. Implementation on the Zybo Z7-20 Zynq-7000 board achieved a maximum accuracy level of 97.3%.
The study [63] describes a method for optimizing the SVM algorithm for arrhythmia classification using ECG signals. The authors propose two algorithmic approximation techniques, based on precision scaling and loop perforation, combined with various integrated HLS optimizations. The detection system was implemented on the Zynq ZC706 platform, and the reported results demonstrate an accuracy of 96.7% and a 15× acceleration.
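The two approximation ideas named above can be illustrated with a short Python sketch. It assumes a linear SVM and is not the implementation from [63]; the function names, the stride, and the number of fractional bits are hypothetical. Loop perforation skips a fraction of the loop iterations and rescales the partial result, while precision scaling reduces the number of bits used to represent each value.

```python
def svm_decision(weights, bias, features):
    """Exact linear SVM decision value: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def svm_decision_perforated(weights, bias, features, stride=2):
    """Loop perforation: visit only every `stride`-th term, then rescale the partial sum."""
    indices = range(0, len(weights), stride)
    partial = sum(weights[i] * features[i] for i in indices)
    return partial * (len(weights) / len(indices)) + bias


def scale_precision(value, frac_bits):
    """Precision scaling: round a value to the nearest multiple of 2**-frac_bits."""
    step = 1 << frac_bits
    return round(value * step) / step


# Hypothetical weights and features, for illustration only.
w = [0.5, -0.25, 0.75, 0.1]
x = [1.0, 2.0, 0.5, 4.0]
print(svm_decision(w, 0.1, x), svm_decision_perforated(w, 0.1, x))
```

Both techniques trade a bounded loss of numerical accuracy for fewer operations or narrower datapaths, so the acceptable stride and bit width have to be validated against classification accuracy.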
Xiaochen et al. [64] developed a system for ECG signal monitoring and classification, which includes an integrated front-end circuit for measuring signal slope variations, a delineation algorithm implemented on a Spartan-6 FPGA board to identify fiducial points, and an SVM algorithm for classifying arrhythmia types. The proposed method achieved an accuracy of 95.1%, while also providing an improvement in energy efficiency.
An efficient approach for ECG signal classification is illustrated in [65]. This paper proposes an optimized system that enables real-time processing through parallelization and pipelining techniques. The architecture consists of an adaptive SE-NLMS algorithm and an unfolding technique. The SVM algorithm serves as the classifier for ECG signals, and experimental results showed a maximum accuracy of 96.5%. Implemented on a Spartan-6 FPGA, the system achieved a power consumption reduction of 28.9% and a 12.59% decrease in LUT usage compared to the LMS-based method.
In [66], an ECG classification technique is developed that uses an SVM algorithm in combination with an acceleration module based on the DWT to efficiently extract features. The optimized classification architecture achieved a power consumption of 2.059 W, a classification time of 280 µs, and a maximum accuracy of 98.7%.
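As an illustration of DWT-based feature extraction, the sketch below computes one decomposition level with the Haar wavelet in Python. The Haar wavelet is chosen here only because it is the simplest case; the source does not state which wavelet [66] uses, and the function name `haar_dwt_level` is hypothetical.

```python
import math


def haar_dwt_level(samples):
    """One level of the Haar DWT on an even-length sequence.

    Returns (approximation, detail): scaled pairwise sums capture the
    low-frequency trend, scaled pairwise differences capture abrupt changes.
    """
    scale = 1.0 / math.sqrt(2.0)
    approximation = []
    detail = []
    for i in range(0, len(samples) - 1, 2):
        approximation.append((samples[i] + samples[i + 1]) * scale)
        detail.append((samples[i] - samples[i + 1]) * scale)
    return approximation, detail


# Hypothetical short segment; a real beat would contain many more samples.
approx, det = haar_dwt_level([1.0, 1.0, 2.0, 2.0, 5.0, 1.0])
print(approx, det)
```

Applying the function again to the approximation coefficients yields the next decomposition level, and a subset of the resulting coefficients can serve as classifier inputs.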
The study [67] presents an optimized method for ECG signal processing and classification. The system integrates a filtering architecture based on distributed arithmetic (DA), a QRS complex feature extraction module, and an SVM classifier. The combination of robust feature detection and the hardware-optimized implementation of the SVM on the Zynq-7000 platform enabled a reduction in resource usage and overall power consumption.

5.1.2. Artificial Neural Networks (ANNs)

Artificial Neural Networks (ANNs) [68] are fundamental models in machine learning, valued for their massive parallel processing, which ensures fast execution and high performance. They are applied in a variety of tasks such as classification, prediction across multiple domains, and pattern recognition, demonstrating notable results in image recognition as well as in natural language processing [69]. The recent significant interest in ANNs is closely related to major advances in cardiovascular diagnostics, as well as to new developments in hardware implementations. Although most implementations are software-based due to their simplicity, hardware versions have demonstrated superior performance and a high level of flexibility, even though they involve much more complex tasks during the design phase. These directions have led to the development of systems dedicated to real-time processing and analysis of ECG signals, as can be observed in the following scientific studies.
In [70], the authors proposed an ANN architecture trained on raw ECG signals, without morphological or spectral preprocessing. The data were extracted from the MIT-BIH database, and each heartbeat was represented by a total of 187 samples. During the training phase, the model demonstrated an accuracy of 97%. In the final testing phase, the optimized feed-forward network implemented on a Xilinx Zybo FPGA achieved a latency of only 232 clock cycles (1.856 μs).
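The two latency figures quoted for [70] are linked by the clock period. The short Python sketch below shows the conversion; the 125 MHz clock is inferred from the reported numbers (232 cycles and 1.856 μs) and is not stated explicitly in the text above, and the function names are hypothetical.

```python
def latency_seconds(cycles, clock_hz):
    """Latency of a design that needs `cycles` clock cycles at frequency `clock_hz`."""
    return cycles / clock_hz


def implied_clock_hz(cycles, latency_s):
    """Clock frequency implied by a cycle count and a measured latency."""
    return cycles / latency_s


# 232 cycles in 1.856 microseconds implies a clock of about 125 MHz.
print(implied_clock_hz(232, 1.856e-6) / 1e6)
print(latency_seconds(232, 125e6) * 1e6)
```

The same conversion is useful when comparing studies that report latency in cycles with studies that report it in seconds.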
Vinaykumar et al. [71] implemented a hardware ANN classifier for detecting four types of arrhythmic beats, using the MIT-BIH Arrhythmia database. Feature extraction via DWT, as well as network training, were performed in the MATLAB environment. Subsequently, the model was described in Verilog and integrated on the NEXYS 4 DDR board. The software results achieved an accuracy of 85.3%, while the FPGA implementation reached 85.6%.
Srivastava et al. [72] developed a method for classifying eight arrhythmia classes using a probabilistic neural network (PNN) and a set of six features: heart rate, fourth-order autoregressive coefficients, and spectral entropy. The classifier was implemented on a Xilinx Artix-7 FPGA board, and hardware processing was carried out using the Xilinx system generator environment. Evaluation on the test set, using the MIT-BIH database, required 17 s of execution time and achieved an average accuracy of 98.27%.
The study [73] describes an implementation of an ANN for arrhythmia classification, tested in MATLAB and developed on the Nexys 4 FPGA board for real-time inference. Simplifying the architecture by reducing the number of essential features allowed optimization of the hardware logic while maintaining a high level of accuracy. Evaluation of the method on the MIT-BIH database showed a maximum classification performance of 95%, with low resource consumption and a maximum operating frequency of 98.209 MHz.
A recent study by Jashwin et al. [74] proposes a technique for classifying five arrhythmia classes using the MIT-BIH database. The work focuses on developing a robust ANN model, optimized through 16-bit fixed-point representation and by reducing the number of features extracted from the ECG signal using DWT. The novelty of the method lies in signal compression and the use of a minimized architecture, implemented on the Zynq UltraScale+ ZCU104 platform. The results showed an accuracy of 94.08%, a power consumption of 0.118 W, and the use of only 57 LUTs.
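A 16-bit fixed-point representation can be sketched in Python as follows. The split into 8 integer and 8 fractional bits is an assumption made for illustration; the source does not give the format used in [74], and the helper names are hypothetical.

```python
def to_fixed(value, frac_bits=8, total_bits=16):
    """Quantize a real number to a signed fixed-point integer, saturating on overflow."""
    scaled = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))


def to_float(fixed, frac_bits=8):
    """Convert a fixed-point integer back to a real number."""
    return fixed / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=8):
    """Multiply two fixed-point values and rescale the product to frac_bits.

    The result is not saturated here; a hardware design would also clamp it.
    """
    return (a * b) >> frac_bits


# Hypothetical weight and input sample.
weight = to_fixed(1.5)
sample = to_fixed(2.0)
print(to_float(fixed_mul(weight, sample)))
```

Fewer fractional bits reduce LUT and DSP usage but increase the quantization error, so the bit width is usually chosen by checking accuracy on a validation set.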
Another arrhythmia classification technique is presented in [75], where the authors used a bi-reciprocal lattice wave digital filter (BLWDF) for QRS complex detection and a two-layer ANN to develop a compact and efficient classifier for low-cost FPGA platforms. The system, which classifies four arrhythmia classes, was implemented on the Spartan-6 platform, achieving an accuracy of 81.9% on the test set and a maximum processing frequency of 255.527 MHz.
Sharada et al. [76] implemented a hybrid MLP-PNN method for arrhythmia classification, developed on the Spartan-3 platform. For optimization, the authors used 24-bit fixed-point representation and applied the PCA method for feature set reduction. The system’s innovation lies in the optimization of the tanh activation function through a programmable logic array (PLA) implementation, which allowed a reduction in hardware complexity and latency while maintaining a maximum accuracy of 99.82%.
The Sankey diagram in Figure 4 illustrates the relationships among four indicators used to compare the 14 literature studies analyzed regarding recent ECG classification systems based on hardware-implemented ANN and SVM algorithms. The analysis considered the FPGA family used, the classifier, the database employed, and the type of classification. The data are arranged from left to right, with each group representing a different stage of analysis. The “FPGA Family” indicator generates 8 connections, the “Classification Algorithm” creates 7 connections, while both “ECG Database” and “Classification Type” form 5 connections each. Each parameter is divided into categories, and the values in parentheses indicate the total number of articles associated with each indicator. Additionally, each article is analyzed across all four groups.
From the perspective of hardware implementation, it is notable that most studies using the SVM algorithm opted for the ZYNQ-7000 family. In contrast, ANNs show a much more diverse distribution in terms of FPGA families, including Spartan-6, followed by Artix-7, Spartan-3, and UltraScale+. Regarding databases, the MIT-BIH Arrhythmia dataset is the most frequently used.
Based on the previously presented articles, summarized in Table 2, a comparative analysis of the 14 selected studies was conducted. The table integrates key elements for evaluating existing contributions in the field, such as the type of intelligent algorithms, the FPGA platform used for their implementation, the databases employed for testing, and the performance achieved in the classification process.
The systematic review of the studies highlights a balanced distribution of machine learning techniques: 50% of the studies use SVM classifiers [36,62,63,64,65,66,67], while the other half adopt ANN [70,71,72,73,74,75,76]. This aspect emphasizes the comparable interest of the scientific community in both classification paradigms dedicated to hardware-implemented ECG analysis.
SVM-based solutions achieve clinically relevant accuracy above 95.00%. In contrast, ANN models show variable results: some optimized architectures reach high performance, exceeding 94.00% [70,72,73,74,76], while others, with smaller-scale designs, are less precise, limited to accuracy values of 81.90% [75] and 85.60% [71], respectively.
Regarding hardware implementation, the FPGA families used in the reviewed studies are distributed into three main categories: low-cost FPGAs (Spartan-6, Spartan-3, and Artix-7), SoC platforms (Zynq-7000), and high-performance FPGAs (Zynq UltraScale+). Small-scale SVM and ANN models integrate efficiently on low-cost FPGAs, demonstrating, in some cases, remarkable performance, with accuracy exceeding 98.00% [72,76]. However, these devices become limiting for complex architectures, where the reported accuracy drops below values considered optimal for clinical applications [71,75].
Furthermore, the analysis of the included articles shows a balanced distribution between low-cost FPGA solutions and SoC platforms, with 50% of the analyzed implementations in each category. However, a clear trend toward Zynq-7000 platforms is observed, especially in recent studies [36,66,70,74], due to their balance between software and hardware. In most of these implementations, accuracies above 94.00% are achieved, and the predominant classifier remains SVM because of its robustness and low computational cost.
Analyzing the data presented in Table 2, it is noticeable that most studies used the MIT-BIH Arrhythmia database. This public database represents the first standardized dataset for cardiac abnormalities, with annotations provided by domain experts.

5.2. Application of Deep Learning Based on FPGA for ECG Classification

In recent years, FPGAs have become promising solutions for implementing deep neural networks for medical diagnosis based on ECG signals. They can be configured to support architectures with low power consumption, improved latency, and high accuracy. Moreover, high-performance FPGAs offer extensive parallel processing capabilities, while integrated versions can provide superior energy efficiency. Consequently, the design of ECG hardware solutions that achieve a good trade-off between compact size and high accuracy has become a major research focus. In this section, the performance of various ECG classification systems based on DL models, such as DNNs, RNNs, LSTMs, and CNNs, is analyzed. The objective is to identify optimized models that improve hardware efficiency while maintaining high detection accuracy and satisfying real-time inference requirements.

5.2.1. Deep Neural Networks (DNNs)

Deep neural networks (DNNs) [77] currently form the foundation of a significant number of artificial intelligence applications, due to their ability to automatically learn high-level features from raw datasets by applying statistical learning methods to large volumes of information. Compared to traditional classifiers, which rely on manually defined rules or expert-selected features, DNN models provide a fully automated analysis and processing pipeline. Their applications span multiple domains, including speech recognition, image analysis, and medical diagnostics, where they often surpass human expert performance [78]. They are also applied in machine translation and the automotive industry. An example of a deep neural network used in cardiovascular diagnostics, specifically a multi-layer perceptron (MLP), is presented below.
Chen et al. [79] proposed the AutoMLP framework for accelerating MLP networks on FPGA, targeting real-time detection of atrial fibrillation. The framework was evaluated using the MIT-BIH atrial fibrillation and China Physiological Signal Challenge 2018 (CPSC2018) datasets, producing four hardware accelerator models. For CPSC2018, all models achieved accuracies above 97%, while for MIT-BIH only two accelerators were tested, obtaining accuracies above 93%. Implemented on the PYNQ-Z2 platform, AutoMLP allows customization of bit-widths and data types. The MLP accelerator achieved over 1500× faster processing and approximately 25000× lower energy consumption compared to an embedded CPU, completing atrial fibrillation classification in just 0.2 µs.

5.2.2. Recurrent Neural Networks (RNNs)

A Recurrent Neural Network (RNN) [80] is a model specifically designed for sequential data processing. It is characterized by a relatively simple architecture that enables fast training through modifications of LUTs. The main advantage of RNNs lies in their ability to model sequences and maintain an internal memory state, allowing them to process inputs in a temporal or sequential manner. Such neural network architectures have proven effective in the analysis and classification of ECG signals, as highlighted in the following studies.
The study presented in [81] describes the bidirectional recurrent chimp search (Bi-RCS) system, which efficiently integrates on an FPGA platform an adaptive ECG signal filtering algorithm, a Bi-RNN for feature extraction and classification, and a chimp optimization algorithm aimed at tuning the hyperparameters of the neural network. Evaluation of the system on the FPGA demonstrated a maximum accuracy of 99%.
In [82], the authors propose a VLSI architecture aimed at the efficient real-time processing of ECG, EEG, and EMG signals. The classification system is compact, consisting of a FIR filter designed using distributed arithmetic and an RNN. Optimization of the proposed solution is investigated through dedicated techniques such as clock gating, dynamic voltage scaling, voltage gating, approximate computing, and power-aware memory management. Compared to a CNN architecture, the hybrid RNN-FIR network achieved a significant energy consumption reduction of approximately 41.9%, an SNR improvement of 10.4%, and a 33% faster operating time. Implementation of the model on a Zynq-7000 FPGA demonstrated its ability to efficiently support real-time biomedical signal processing, due to low energy consumption and superior performance.

5.2.3. Long Short-Term Memory Networks (LSTMs)

Long Short-Term Memory (LSTM) networks [83] consist of specialized modules called memory blocks, which are connected recurrently and designed to retain information over time. These blocks use nonlinear gates to manage how data is stored, updated, and transmitted forward. LSTM is a high-performance deep learning technique applied in text recognition, time series prediction, natural language processing, computer vision, video captioning, sentiment analysis, and the medical field, due to its efficiency in processing sequential data [84]. In the context of hardware design on FPGA boards, LSTMs benefit from various advantages, such as reduced memory usage and decreased computational complexity through architectural compression and operation parallelization [85]. These features are crucial for portable devices intended for real-time monitoring and classification of biomedical signals. Following this, a set of recent studies is analyzed, addressing both current techniques for optimizing and accelerating LSTM networks on FPGAs, as well as their practical applications in ECG signal classification.
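The gating mechanism described above can be made concrete with a minimal Python sketch of one time step of a single-unit LSTM cell. It follows the standard LSTM formulation rather than any specific accelerator from the reviewed studies; the function names, the scalar weights, and the parameter layout are assumptions chosen to keep the example short.

```python
import math


def sigmoid(z):
    """Logistic function used by the input, forget, and output gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell.

    params maps each gate name to a tuple (w_x, w_h, bias).
    Returns the new hidden state h and the new cell state c.
    """
    def gate(name, activation):
        w_x, w_h, bias = params[name]
        return activation(w_x * x + w_h * h_prev + bias)

    forget = gate("forget", sigmoid)           # how much of the old cell state to keep
    write = gate("input", sigmoid)             # how much of the candidate to store
    candidate = gate("candidate", math.tanh)   # new content proposed for the cell
    expose = gate("output", sigmoid)           # how much of the cell state to output

    c = forget * c_prev + write * candidate
    h = expose * math.tanh(c)
    return h, c


# Hypothetical parameters, for illustration only.
example_params = {
    "forget": (0.5, 0.1, 0.0),
    "input": (0.4, 0.2, 0.0),
    "candidate": (0.9, 0.3, 0.0),
    "output": (0.6, 0.1, 0.0),
}
h, c = 0.0, 0.0
for sample in [0.1, 0.5, -0.2]:
    h, c = lstm_step(sample, h, c, example_params)
print(h, c)
```

In hardware, the four gate computations are independent of each other within a time step and can run in parallel, while the dependence of each step on the previous hidden state limits parallelism across time.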
The study presented in [86] illustrates the implementation of a pipelined stochastic adaptive distributed architecture (P-SCADA), a hybrid structure optimized for LSTMs that combines stochastic arithmetic with binary circuits. This design provides an optimal balance between performance and energy consumption. When implemented on an Artix-7 FPGA and tested on multiple subsets of the MIT-BIH Arrhythmia database, the architecture achieved resource utilization of 40–44%, energy consumption between 20 and 25%, and an accuracy of 98%.
In [87], Akhtar et al. proposed an optimized VLSI design for real-time ECG signal classification. The architecture integrates a feature extraction module using a four-level Daubechies wavelet transform, along with a classification module composed of an LSTM network, fully connected layers, and a multi-layer perceptron. The model was tested on signals from the MIT-BIH database and achieved a maximum accuracy of 99%. To evaluate real-time processing capability, the method was implemented on an Artix-7 FPGA, achieving an energy consumption of 41 mW at a frequency of 54 MHz.

5.2.4. Convolutional Neural Networks (CNNs)

Convolutional Neural Network (CNN) [88] is a deep learning neural network widely used for both descriptive and generative tasks. Applications of CNNs span multiple domains, including robotics, image recognition, cybersecurity, and the medical field, where they are frequently employed as tools for automated diagnosis [89]. From the FPGA implementation perspective, CNNs can be efficiently deployed, offering the advantage of fast processing with high accuracy. However, the main challenges stem from the resource constraints of the hardware platforms, which can impact model performance due to the networks’ complex architectures and intensive computational operations.
In recent years, there has been a significant trend toward integrating CNNs on FPGA platforms for medical diagnostics, highlighting the need for new and innovative solutions aimed at optimizing and enhancing the efficiency of automated diagnostic processes. Consequently, a number of scientific studies in the field have designed various CNN architectures for ECG classification systems, proposing optimized solutions for development, implementation, and hardware acceleration, successfully overcoming, in some cases, the main existing limitations.
In the study [90], the authors proposed a quantized CNN (Q-CNN) implemented on an Artix-7 FPGA platform, aimed at real-time classification of atrial fibrillation. The system was developed to reduce resource consumption by using SIMD-based vector units for data parallelism, along with a sliding buffer. Additionally, the solution employed 22-bit quantization, achieving an accuracy of 94%, a latency of 1.358 ms, and a maximum operating frequency of 25.5 MHz.
In [91], the authors introduced a quantization-aware training (QAT) framework optimized for hardware implementations, aimed at real-time classification of ECG signals. The framework provides flexibility in assigning precision across layers and configuring the quantization scheme, enabling efficient use of FPGA resources. The hardware accelerator integrates a streaming architecture and a module for the approximate continuous wavelet transform (CWT), optimized for fast processing. Implemented on the Arty A7-35 FPGA platform, the system was evaluated using the MIT-BIH database, achieving 99.5% accuracy, an inference time of 0.35 ms, and a dynamic power consumption of 200 mW, demonstrating a high-performance balance between precision and energy efficiency.
Lai Wei et al. [92] developed an ECG classification technique based on a 1D-CNN, hardware-optimized for inference on the Zynq-ZC706 platform. The authors’ solution uses a convolutional processing element (PE) module to reduce resource usage and improve computational efficiency. Evaluation of the method demonstrated effective performance, achieving 98.94% accuracy while using 2.51 k LUTs and consuming 0.79 W.
The research study presented in [93] highlights hardware optimization techniques for a CNN aimed at classifying cardiac arrhythmias. The architecture was developed using data-flow methods and layer-wise partitioning, with a focus on efficient memory management and component mapping. Implemented on the Zybo Z7-20 FPGA, the 7-bit quantized convolutional network achieved an accuracy of 91.9%, a power consumption of 0.15 Wh, and utilized less than 25% of the available resources. The optimized accelerator reached a speed over 300 times faster compared to software execution on the ARM Cortex-A9 processor.
In a subsequent research stage [94], the authors designed an unstructured sparse CNN accelerator aimed at portable devices for real-time analysis and classification of ECG signals. The architecture is based on a tile-first dataflow and compressed data storage, which enhances training efficiency by eliminating operations with zero weights. To optimize the accelerator’s flexibility, a highly adaptable processing element array was employed. The proposed design was implemented on a Zynq ZC706 FPGA, achieving an average accuracy of 98.99%, an inference time of 4.6 ms, and a power consumption of 0.5 W.
Another approach for real-time cardiac beat classification is presented in [95], which describes an optimized CNN accelerator designed for portable ECG monitoring devices. The system aims to provide an efficient edge computing solution by applying clock gating techniques to minimize power consumption. The network was implemented on a PYNQ-Z2 FPGA, achieving a 23.7× reduction in LUT usage and a 2.99× reduction in total resource utilization. In terms of inference time, the design reached a latency of 9.06 ms.
Mangaraj et al. [96] proposed a portable FPGA-based healthcare assistance system using a two-branch CNN architecture to reduce data reuse complexity and exploit FPGA parallelism. The accelerator integrates three hardware optimization techniques: a parallel-quantized-pixel convolution module, a fused add-pooling unit, and a skip-zero-weight architecture. Data are represented in UINT4 format to avoid negative intermediate values. Implemented on the PYNQ-Z2 platform and evaluated using the MIT-BIH database, the system achieved an accuracy of 97.79% for five types of cardiac beats, with a latency of 236 ms and a throughput of 63 GOP/s.
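The idea behind skipping zero weights can be sketched in Python for a 1D convolution. This is a generic illustration and not the architecture from [96]; the function name `conv1d_skip_zero` and the example kernel are hypothetical. The nonzero taps are collected once, so the inner loop performs a multiply-accumulate only where a weight contributes to the result.

```python
def conv1d_skip_zero(signal, kernel):
    """Valid-mode 1D correlation that skips kernel taps whose weight is zero.

    Returns (output, mac_count) so the saving over a dense loop is visible.
    """
    nonzero_taps = [(offset, w) for offset, w in enumerate(kernel) if w != 0]
    output = []
    mac_count = 0
    for start in range(len(signal) - len(kernel) + 1):
        acc = 0
        for offset, w in nonzero_taps:
            acc += w * signal[start + offset]
            mac_count += 1
        output.append(acc)
    return output, mac_count


# Hypothetical pruned kernel with one zero tap out of three.
result, macs = conv1d_skip_zero([1, 2, 3, 4, 5], [1, 0, -1])
print(result, macs)
```

For this example the dense loop would need 9 multiply-accumulate operations, whereas the zero-skipping version needs 6; the saving scales with the sparsity of the pruned or quantized kernel.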
Ahmed et al. [97] performed bioelectrical signal classification using a 1D-CNN and a hardware accelerator. The system was optimized to maximize performance on the FPGA Zynq-7000_xc7z045, targeting portable medical devices. The authors also explored mechanisms to improve data transfer and reduce resource usage by employing a compact register and replacing multiplication operations with shift operations. The architecture outperformed other solutions, achieving 99% accuracy and a 1.14× acceleration at a frequency of 442.948 MHz.
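Replacing multiplications with shifts is possible when weights are restricted to signed powers of two. The Python sketch below illustrates the general idea under that assumption; it is not the scheme from [97], and the function names and the rounding rule (nearest exponent in the log domain) are hypothetical choices.

```python
import math


def nearest_power_of_two_shift(weight):
    """Approximate a weight as sign * 2**shift and return (sign, shift).

    A negative shift means a right shift. A zero weight maps to sign 0.
    """
    if weight == 0:
        return 0, 0
    sign = 1 if weight > 0 else -1
    shift = round(math.log2(abs(weight)))
    return sign, shift


def shift_multiply(sample, sign, shift):
    """Multiply an integer sample by sign * 2**shift using only shifts and negation."""
    if shift >= 0:
        return sign * (sample << shift)
    return sign * (sample >> -shift)


# Hypothetical weight of -0.25 applied to an integer sample of 8.
s, k = nearest_power_of_two_shift(-0.25)
print(shift_multiply(8, s, k))
```

A shifter costs far fewer FPGA resources than a multiplier, at the price of a coarser weight grid, so this restriction is usually applied together with retraining or fine-tuning.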
Wenhan et al. [19] present an efficient ECG signal classification system, fully hardware-mapped on an Intel Cyclone V FPGA. The proposed solution is dual, utilizing a 1D-CNN together with a heart rate estimator. In designing the accelerator, the authors applied advanced optimization techniques, achieving a maximum acceleration of 43.08× compared to the software-based method on an ARM Cortex-A53 processor and 8.38× compared to an Intel Core i7-8700 CPU. Regarding detection performance, the model achieved an accuracy of 93.24% with a power consumption of 67.74 mW.
The research study [98] investigates a novel 1D-CNN architecture, based on multiplicative behavior and data reuse, aimed at cardiac arrhythmia classification. The hardware implementation relies on an artificial intelligence accelerator (AIA) FPGA and was optimized through techniques such as quantization, pruning, and a parallel shift processing element array arrangement (PSPEAA). Implemented on a Xilinx PYNQ-Z2 FPGA, the architecture achieved an ECG classification accuracy of 96.6%, a power consumption of 0.131 W, and an acceleration of 29×.
The study [99] describes the CoNN algorithm, a parallelized CNN model for arrhythmia classification, implemented on FPGA using 16-bit fixed-point arithmetic. The hardware solution targets the Zedboard board, aiming for efficient real-time execution. Performance was validated on two benchmark databases, MIT-BIH and European ST-T, using a fivefold cross-validation scheme. The system achieved an accuracy of 93.80% on both datasets.
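To make the fivefold cross-validation scheme mentioned for [99] concrete, the following Python sketch partitions sample indices into five folds. It is only an illustration: the function name `k_fold_indices` and the contiguous, unshuffled split are assumptions, and the source does not state how the folds were actually formed (for example, whether the split was made per patient).

```python
def k_fold_indices(n_samples: int, k: int = 5) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..n_samples-1 into k (train, test) pairs.

    Each sample appears in exactly one test fold. Folds are contiguous and
    differ in size by at most one sample.
    """
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


if __name__ == "__main__":
    for fold_no, (train, test) in enumerate(k_fold_indices(12, 5), start=1):
        print(f"fold {fold_no}: train={len(train)} test={test}")
```

The reported accuracy is then typically the mean over the five test folds, so that every sample contributes to the evaluation exactly once.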
Wang et al. [100] proposed a hardware system for arrhythmia detection based on ECG signals, as well as for valvular disease detection using PCG signals. The method relies on an 8-bit fixed-point quantized CNN model, accelerated via a programmable AI architecture that integrates an application-specific instruction-set processor and systolic arrays. Implementation on the PYNQ-Z2 board resulted in a power consumption of 106 mW. Evaluations demonstrated an accuracy of 97.4% with an inference time of 6.8 ms on the MIT-BIH database, and 99.1% accuracy with 21 ms inference time on the HVD database.
Lee et al. [101] developed a hardware acceleration system for real-time cardiac arrhythmia diagnosis. The architecture was implemented using the Alveo U200 FPGA accelerator card, and its performance was evaluated by processing 1987 heartbeats using both the hardware accelerator and a reference software implementation. Experimental results indicate a significant reduction in execution time: 0.572 s for the hardware solution compared to 5.70 s for the software implementation, corresponding to a reduction in execution time of approximately 89.96% (a speed-up of roughly 10×) in favor of the proposed accelerator.
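The two timing figures quoted above for [101] can be related to two commonly confused metrics, the speed-up factor and the percentage reduction in execution time. The short Python sketch below uses only the values reported in the text (5.70 s, 0.572 s, 1987 heartbeats); the function names are illustrative.

```python
def speedup(t_software: float, t_hardware: float) -> float:
    """Speed-up factor: how many times faster the hardware run is."""
    return t_software / t_hardware


def time_reduction_percent(t_software: float, t_hardware: float) -> float:
    """Relative reduction in execution time, in percent."""
    return 100.0 * (t_software - t_hardware) / t_software


if __name__ == "__main__":
    t_sw, t_hw, beats = 5.70, 0.572, 1987  # values reported for [101]
    print(f"speed-up: {speedup(t_sw, t_hw):.2f}x")
    print(f"time reduction: {time_reduction_percent(t_sw, t_hw):.2f}%")
    print(f"hardware time per heartbeat: {1000.0 * t_hw / beats:.3f} ms")
```

A time reduction of about 90% and a speed-up of about 10× describe the same measurement, which is why the two should not be reported interchangeably.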
The scientific study [102] presents a 2D-CNN architecture dedicated to heartbeat classification, implemented on the Xilinx ZCU104 FPGA Evaluation Kit. To reduce hardware cost, the number of trainable parameters was optimized by performing channel-wise addition of the half-split feature maps, while both weights and activations were quantized in INT8 format. Experimental evaluation on the MIT-BIH database demonstrated a software accuracy of 98.64%, with a power consumption of 4.177 W and a latency of 219 ms for arrhythmia detection.
In [103], a compact 1D-CNN was implemented, based on residual connections and depthwise separable convolutions, aimed at reducing computational complexity. The model was optimized using a combination of unstructured pruning and incremental quantization techniques, achieving a software accuracy of 99.59% on the MIT-BIH database. For ECG signal processing, the authors also developed a hardware accelerator implemented on the Xilinx Zynq 7Z020 platform, which achieved an accuracy of 96.55%, a latency of 63 ms, and a power consumption of 1.78 W, demonstrating high efficiency in arrhythmia detection.
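The reduction in complexity offered by depthwise separable convolutions can be quantified with a short parameter count. The sketch below compares a standard 1D convolution with its depthwise separable counterpart; the layer dimensions are hypothetical and are not taken from [103], and bias terms are ignored for simplicity.

```python
def standard_conv1d_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a standard 1D convolution (bias ignored)."""
    return kernel * c_in * c_out


def separable_conv1d_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise (kernel * c_in) plus pointwise (c_in * c_out) pair."""
    return kernel * c_in + c_in * c_out


if __name__ == "__main__":
    # Hypothetical layer: kernel size 7, 32 input channels, 64 output channels.
    k, c_in, c_out = 7, 32, 64
    std = standard_conv1d_params(k, c_in, c_out)
    sep = separable_conv1d_params(k, c_in, c_out)
    print(f"standard: {std} weights, separable: {sep} weights, "
          f"ratio: {std / sep:.2f}x")
```

Because the MAC count per output position scales with the same expressions, the saving in weights translates into a similar saving in arithmetic, which is what makes this layer type attractive for resource-constrained FPGAs.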
In [104], Aruna et al. designed an FPGA-based hardware acceleration system for a 1D-DCNN, aimed at binary classification of ECG signals. The error normalized least mean square (ENLMS) algorithm was employed for ECG signal processing, while feature extraction was performed using the DWT technique. Implemented on a Virtex-4 FPGA, the design achieved an accuracy of 98.6%, a power consumption of only 0.45 mW, a latency of 5.39 ns, an operating frequency of 185.426 MHz, and a total runtime of 15 s.
In [105], the Tiny InceptionNet Accelerator is proposed, designed for 1D-CNN and used for the classification of cardiac arrhythmias. The architecture integrates a dual processing element assembly (D-PEA) for parallel execution of multiply-accumulate (MAC) operations, along with a shared pixel distributor (SPD) module responsible for dynamically coordinating data flows. The solution was implemented on the ZCU102 FPGA platform and evaluated using the MIT-BIH database. Experimental results show an accuracy of 99.49% and an inference time of only 17.8 µs per heartbeat.
The study presented in [106] proposes an optimized hardware architecture for wearable devices aimed at ECG signal classification. The solution is designed for implementing a 1D-CNN and employs global average pooling to reduce parameters, along with a parallel processing array that enhances operational efficiency. To decrease energy consumption and redundant computations, the authors introduce dynamic activation of the sign bit, reducing the power usage of the ReLU function. Implemented on the Xilinx Zynq ZC706 platform in 16-bit fixed-point format, the system achieves a throughput of 25.7 GOP/s. Evaluation using the MIT-BIH database indicates an average accuracy of 99.10% for classifying five types of heartbeats.
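Two of the ingredients mentioned for [106] have simple bit-level interpretations. The Python sketch below shows, under stated assumptions, how ReLU on a 16-bit two's-complement word reduces to inspecting the sign bit, and how global average pooling collapses a feature channel to one value. It is not the dynamic sign-bit activation scheme of the authors, only the underlying idea; the function names are hypothetical.

```python
def relu_from_sign_bit(word: int) -> int:
    """ReLU on a 16-bit two's-complement word.

    The word is masked to 16 bits; if bit 15 (the sign bit) is set the value
    is negative and the output is zero, otherwise the word passes through.
    """
    word &= 0xFFFF
    sign_bit = (word >> 15) & 1
    return 0 if sign_bit else word


def global_average_pool(channel: list[int]) -> int:
    """Integer global average pooling: one value per feature channel."""
    return sum(channel) // len(channel)


if __name__ == "__main__":
    activations = [12, -5, 300, -1, 7]
    rectified = [relu_from_sign_bit(a) for a in activations]
    print(rectified)
    print(global_average_pool(rectified))
```

Since the decision depends on a single bit, a negative result can be detected early and the remaining logic for that value can be gated, which is the kind of saving a sign-bit-driven ReLU aims for.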
The study in [107] investigates the use of FPGAs to accelerate ECG signal classification based on CNNs. The hardware implementation was carried out using the Libero platform and the Microchip PolarFire SoC Icicle Kit, targeting the automatic detection of abnormal heartbeats. To enhance processing efficiency in the convolutional layers, the architecture leverages the inherent parallelism of the FPGA pipeline, optimizing data flow and reducing latency. Evaluation on the MIT-BIH database demonstrated an accuracy of 98.6%.
Yu-Ching Ting et al. described in [108] an ECG classification system using a 2D-CNN neural network, aimed at detecting fetal heartbeats from abdominal signals in real time. The neural architecture was implemented on a Xilinx Virtex-7 FPGA, utilizing a spectrogram processor and a bidirectional network. The authors proposed a pre-fetch mechanism that enables uninterrupted 2D convolution by mapping it onto a 1D core, reducing the effect of asymmetry between the spectrogram dimensions and the kernel size. The proposed approach achieved a maximum accuracy of 95.2%, halved the number of computation cycles, and consumed only 12.33 mW of power.
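Mapping a 2D convolution onto a 1D core relies on the fact that a 2D convolution is a sum of row-wise 1D convolutions. The Python sketch below demonstrates this decomposition and checks it against a direct 2D computation. It does not reproduce the pre-fetch mechanism of [108]; the function names and the small test data are assumptions for illustration, and the operation is the cross-correlation form used in CNNs with no padding.

```python
def conv1d_valid(row: list[int], kernel_row: list[int]) -> list[int]:
    """1D cross-correlation without padding (the '1D core')."""
    n_out = len(row) - len(kernel_row) + 1
    return [
        sum(row[i + j] * kernel_row[j] for j in range(len(kernel_row)))
        for i in range(n_out)
    ]


def conv2d_via_1d(image: list[list[int]], kernel: list[list[int]]) -> list[list[int]]:
    """2D cross-correlation built by accumulating 1D results per kernel row."""
    out_rows = len(image) - len(kernel) + 1
    out_cols = len(image[0]) - len(kernel[0]) + 1
    out = [[0] * out_cols for _ in range(out_rows)]
    for r in range(out_rows):
        for kr, kernel_row in enumerate(kernel):
            partial = conv1d_valid(image[r + kr], kernel_row)
            for c in range(out_cols):
                out[r][c] += partial[c]
    return out


def conv2d_direct(image: list[list[int]], kernel: list[list[int]]) -> list[list[int]]:
    """Reference 2D cross-correlation used to check the decomposition."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    kernel = [[1, 0], [0, -1]]
    print(conv2d_via_1d(image, kernel) == conv2d_direct(image, kernel))
```

In hardware, the inner accumulation over kernel rows is where data must arrive without gaps, which is the role a pre-fetch buffer plays in keeping the 1D core busy.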
The hardware implementations analyzed in the previously presented studies and summarized in Table 3 highlight the trend toward integrating convolutional neural networks into embedded systems dedicated to real-time ECG signal processing [19,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108].
Based on the reviewed literature, three major categories of FPGA platforms can be identified: low-cost solutions (Artix-7, Virtex-4), SoC platforms (Zynq-7000, Cyclone V, PolarFire), and high-performance devices (Virtex-7, Zynq UltraScale+, Virtex UltraScale+).
Among these, the Zynq-7000 family dominates research due to its capability to integrate neural networks with complex architectures [79,81,82,92,93,94,95,96,97,98,99,100,103,106]. Low-cost FPGAs continue to provide efficient solutions for portable systems, achieving accuracy values of at least 98% [86,87,91,104,107]. In the high-performance segment, Zynq UltraScale+ platforms reach accuracy levels between 98.64% and 99.49% [102,105], confirming their utility in scenarios with high-speed and high-throughput requirements.
The accuracies reported in the analyzed studies are at least 91.9%, meeting the quality criteria required for clinical applications and demonstrating that FPGA implementations of DL algorithms represent a mature solution for automatic ECG signal classification. The traditional MLP algorithm [79] remains competitive for detecting atrial fibrillation, achieving accuracy values above 93% on both databases. LSTMs [86,87] prove particularly effective in sequential classification tasks and real-time arrhythmia detection, attaining accuracies of at least 98% in both cases. In this context, the Artix-7 platform has proven well suited to sequential processing tasks. CNNs represent the most diverse category, with reported accuracies ranging from 91.9% to 99.5%, depending on the architecture and FPGA platform. Implementations on PYNQ-Z2 [98,100] and Zynq-ZC706 [97] achieve accuracies between 96.6% and 99%, highlighting their efficiency for edge computing applications. However, non-optimized models or those implemented on FPGAs with limited hardware resources may exhibit lower accuracy. For instance, study [93] reports an accuracy of 91.9% for a 2D-CNN architecture implemented on the Zybo Z7-20 FPGA board.
Most systems utilize the MIT-BIH database, ensuring standardized validation; however, many solutions remain dependent on the characteristics of this dataset, which limits the analysis of model generalizability and dataset bias and may restrict their applicability in real clinical scenarios. Additionally, reported performance is influenced by the number of classes analyzed and the selection of experimental subsets. A key limitation is the significant gap between reported performance and deployment in real medical systems, as none of the studies provide extensive clinical testing.

6. FPGA Acceleration of Deep Learning Models

The increasing adoption of deep learning methods, such as CNNs and DNN architectures, has led to growing computational demands due to model complexity and the high number of multiply–accumulate (MAC) operations, often implemented in 32-bit floating-point arithmetic (FP32). To meet these intensive requirements, hardware accelerators have become essential components of modern AI infrastructures. In this context, FPGAs represent a high-performance solution due to their low latency, energy efficiency, and high reconfigurability, properties that are critical for Edge-AI applications. Moreover, their native support for parallel execution and fast interconnection between resources significantly enhances the performance of AI model inference [109].
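As a rough illustration of how MAC counts and FP32 storage grow with layer size, the following Python sketch computes both for a single 1D convolutional layer. The layer dimensions are hypothetical and are not taken from any reviewed study; padding is assumed to be absent and bias terms are ignored.

```python
def conv1d_output_length(input_length: int, kernel: int, stride: int = 1) -> int:
    """Output length of a 1D convolution without padding."""
    return (input_length - kernel) // stride + 1


def conv1d_macs(input_length: int, kernel: int, c_in: int, c_out: int,
                stride: int = 1) -> int:
    """MAC operations for one forward pass of a 1D convolutional layer."""
    return conv1d_output_length(input_length, kernel, stride) * kernel * c_in * c_out


def weight_bytes(kernel: int, c_in: int, c_out: int, bits: int) -> int:
    """Storage for the layer weights at a given bit width."""
    return kernel * c_in * c_out * bits // 8


if __name__ == "__main__":
    # Hypothetical first layer: 256 samples, kernel size 5, 1 -> 16 channels.
    length, k, c_in, c_out = 256, 5, 1, 16
    print("MACs per inference:", conv1d_macs(length, k, c_in, c_out))
    print("FP32 weights:", weight_bytes(k, c_in, c_out, 32), "bytes")
    print("INT8 weights:", weight_bytes(k, c_in, c_out, 8), "bytes")
```

Both quantities scale with the product of kernel size and channel counts, so deeper layers with many channels dominate the cost; this is why reduced-precision arithmetic and parallel MAC arrays are recurring themes in the accelerators reviewed here.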
The studies analyzed in this scientific work are summarized in Table 4, aiming to highlight the performance of deep learning accelerators implemented on FPGA platforms for ECG signal classification. A detailed overview of the optimization strategies applied to state-of-the-art accelerators, developed on various FPGA families, is thus presented. For performance comparison, different design and efficiency approaches were included, tailored to specific operating frequencies and evaluated based on numerical accuracy, total power consumption, and system latency.
The use of neural networks for real-time classification tasks on reconfigurable architectures involves customizing hardware accelerators with dedicated optimization and configuration strategies, tailored to specific application scenarios. In this context, based on a comparative analysis of 24 studies published between 2019 and 2025, which address various recent design techniques for deep neural network architectures aimed at FPGA inference, a series of conclusions can be drawn highlighting current trends and research directions in this field. The authors would like to advise readers that, in several cases, key ANN/CNN hyperparameters and design size—often expressed in terms of FPGA resource utilization (e.g., flip-flops, LUTs, BRAMs)—are not consistently reported in the literature. Therefore, this comparison should be regarded only as a baseline overview of the FPGA platforms used.
It is also worth clarifying that the classification-quality metrics of a method, such as accuracy, precision, sensitivity, and F1 score, are determined solely by the mathematical model of the classification algorithm, as long as the implementation does not change the numerical precision. The hardware implementation and, implicitly, the chosen FPGA family and device determine the physical performance indicators, such as classification time (which also depends on the maximum clock frequency and latency), power consumption, and, not least, the cost of the hardware.
Firstly, it can be observed that over 65% of the analyzed studies utilize heterogeneous platforms, such as Zynq-7000 [79,82,92,93,94,95,96,97,98,99,100,103,106] and Zynq UltraScale+ [102,105], highlighting the predominant focus of hardware accelerator development on modern SoC families. As presented in Section 4.2, these two families stand out for their efficient integration of hardware acceleration capabilities with software processing flexibility, facilitating the implementation of scalable and high-performance embedded medical diagnostic systems. An alternative hardware implementation approach is represented by other FPGA families, ranging from low-cost to high-performance devices, such as Artix-7, Virtex-4, Virtex-7, Cyclone V, or Virtex UltraScale+. Overall, the general trend in recent research toward SoC families and high-performance FPGAs reflects the need to leverage advanced resources.
Hardware accelerators dedicated to ECG classification highlight a dominant research direction focused on the use of 1D-CNNs, with over 50% of the reviewed studies targeting the classification of temporal data from ECG signals for FPGA-based inference. The predominant approach aims to optimize these architectures on SoC platforms, due to the balance they offer between software flexibility and hardware efficiency. There are also notable contributions for 2D-CNN [93,95,101,102,108], LSTM [86,87], and Q-NN [90], although in a more limited number due to their higher computational demands. The 2D-CNN architectures are primarily implemented on Zynq-7000 and UltraScale+ platforms, as these provide high computational power and efficient acceleration of operations.
Regarding the methodology of hardware accelerator optimization strategies reported in the analyzed literature, the most common techniques include: pipelining [19,79,82,86,91,92,97,100,101,102,106,107], quantization [19,90,91,93,96,98,101,103,104,108], dataflow optimization [92,93,94], pruning [98,103], loop unrolling [79,90,92], and the use of processing element arrays [92,94]. These compression and optimization strategies clearly indicate a strong focus on designing efficient architectures tailored for real-time control and inference in the context of cardiac disease detection and classification.
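The benefit of pipelining, the most frequently reported technique, can be estimated with a simple cycle-count model. The Python sketch below assumes an idealized datapath in which every stage takes one clock cycle and no stalls occur; the stage count and workload are hypothetical and do not describe any specific accelerator in the reviewed studies.

```python
def sequential_cycles(n_items: int, n_stages: int) -> int:
    """Cycles when each item passes through all stages before the next starts."""
    return n_items * n_stages


def pipelined_cycles(n_items: int, n_stages: int) -> int:
    """Cycles for an ideal pipeline: fill latency plus one item per cycle."""
    return n_stages + n_items - 1


def pipeline_speedup(n_items: int, n_stages: int) -> float:
    """Ratio of sequential to pipelined cycle counts."""
    return sequential_cycles(n_items, n_stages) / pipelined_cycles(n_items, n_stages)


if __name__ == "__main__":
    # Hypothetical 4-stage MAC datapath processing 1000 samples.
    n_items, n_stages = 1000, 4
    print("sequential:", sequential_cycles(n_items, n_stages), "cycles")
    print("pipelined:", pipelined_cycles(n_items, n_stages), "cycles")
    print(f"speed-up: {pipeline_speedup(n_items, n_stages):.2f}x")
```

For long input streams the speed-up approaches the number of stages, while the latency of a single item stays at the pipeline depth; loop unrolling and processing element arrays add spatial parallelism on top of this temporal overlap.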
Figure 5 presents a Sankey diagram that provides a visual representation of the main recent optimization strategies applied to FPGA accelerators (left side) for various deep learning models (right side) in the studies analyzed from the literature. It can be observed that 1D-CNN architectures occupy the leading position among implementations, being associated with the largest number of techniques, followed by 2D-CNN models. This distribution highlights the primary design focus of cardiac diagnostic accelerators around these two types of deep networks, reflecting their high potential for optimization on FPGA platforms.
Most accelerators for 1D-CNNs employ fixed-point numerical representations with precision ranging from 4 to 16 bits. This approach reflects the current trend of minimizing computational cost while achieving a favorable trade-off between performance and resource utilization. A recent study [96] adopts the UINT4 format combined with specific optimization techniques; however, energy consumption is not reported, and the latency is relatively high, exceeding 200 ms. Other works use INT8 representations [19,102] for CNN quantization, aiming to maintain high accuracy. In contrast, only two studies employ floating-point formats (FP16 or FP32), focusing primarily on numerical stability.
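The effect of choosing between 4 and 16 bits can be seen by quantizing a few values to signed fixed-point formats. The Python sketch below is a generic illustration: the weight values and the split between integer and fractional bits are hypothetical, and the signed format differs from the unsigned UINT4 representation used in [96].

```python
def quantize(value: float, total_bits: int, frac_bits: int) -> int:
    """Round to a signed fixed-point integer code and saturate to the range."""
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    code = round(value * (1 << frac_bits))
    return max(q_min, min(q_max, code))


def dequantize(code: int, frac_bits: int) -> float:
    """Convert an integer code back to its real value."""
    return code / (1 << frac_bits)


if __name__ == "__main__":
    weights = [0.731, -0.052, 0.318, -0.907]  # hypothetical values in [-1, 1)
    for total_bits in (4, 8, 16):
        frac_bits = total_bits - 1  # one sign bit, the rest fractional
        errors = [
            abs(w - dequantize(quantize(w, total_bits, frac_bits), frac_bits))
            for w in weights
        ]
        print(f"{total_bits}-bit: max abs error = {max(errors):.6f}")
```

Each additional bit halves the quantization step, so the maximum rounding error shrinks accordingly, while every removed bit reduces multiplier width and memory footprint; the accelerators in this review sit at different points on that trade-off.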
The Sankey diagram from Figure 6, based on the dataset presented in Table 4, illustrates the correlations between the FPGA family used, the type of DL accelerator, and the adopted numerical precision, highlighting the predominant trends identified in the current literature. The width of the flows indicates the frequency of occurrence of each combination of features across the analyzed studies.
The operating frequency ranges from low values (35 kHz–10 MHz) up to over 400 MHz, reaching as high as 666.7 MHz [79]. However, the most commonly reported frequencies lie between 50 MHz and 250 MHz. An important observation is that studies reporting accelerator operating frequencies below 10 MHz employed extensive and complex architectural optimization techniques to compensate, achieving low power consumption and minimal latency.
Regarding power optimization, numerous studies report power consumption below 1 W for quantized CNN accelerators, with some implementations achieving very low values, such as 0.01 W [108] or 0.10 W [100]. This highlights the benefits of architectures employing advanced compression techniques and deep pipelining. As for latency, reported values range from 5.39 × 10⁻⁶ ms (5.39 ns) [104] up to 572 ms [101] (the latter being the total time reported for processing 1987 heartbeats), being strongly influenced by the degree of architectural optimization. A clear relationship can be observed between numerical precision and both energy performance and latency [79,87,98,100]: lower precision and higher parallelism generally lead to reduced power consumption and lower latency. The best trade-off between precision, latency, and power consumption among Zynq-7000 implementations is reported in [98], where the authors combined 8-bit fixed-point quantization with the PSPEAA architecture and optimization techniques based on weight- and input-stationary strategies for a 1D-CNN accelerator. Beyond Zynq-7000 platforms, competitive performance was also reported for a 1D-CNN accelerator implemented on a Cyclone V FPGA [19], achieving a latency below 0.06 ms and a power consumption of only 0.06 W at a frequency of 50 MHz. Competitive results were similarly obtained for LSTM [86,87], Q-CNN [90], and MLP [79] accelerators, which employed different numerical representations for quantization. Overall, the incomplete reporting of FPGA performance parameters in some studies limits comprehensive comparison. Nevertheless, the available data demonstrate that optimization techniques applied to accelerator architectures significantly reduce inference costs for real-time ECG classification applications.
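Power and latency can be combined into an energy-per-inference figure, which is often a fairer basis for comparison than either value alone. The Python sketch below does this for two designs using the power and latency values quoted earlier in this review ([100] on MIT-BIH and [103]). It assumes that the reported power is the average draw during inference and that the latency refers to a single inference, which the reviewed studies do not all state explicitly, so the results are indicative only.

```python
def energy_per_inference_mj(power_mw: float, latency_ms: float) -> float:
    """Energy in millijoules: mW x ms gives microjoules, divided by 1000."""
    return power_mw * latency_ms / 1000.0


def ns_to_ms(value_ns: float) -> float:
    """Convert nanoseconds to milliseconds."""
    return value_ns * 1e-6


if __name__ == "__main__":
    # (power in mW, latency in ms) as quoted in this review.
    reported = {
        "[100] PYNQ-Z2, MIT-BIH": (106.0, 6.8),
        "[103] Zynq 7Z020": (1780.0, 63.0),
    }
    for label, (power_mw, latency_ms) in reported.items():
        energy = energy_per_inference_mj(power_mw, latency_ms)
        print(f"{label}: {energy:.4f} mJ per inference")
    print(f"5.39 ns = {ns_to_ms(5.39):.2e} ms")
```

Expressing all latencies in one unit before comparing them also avoids the several orders of magnitude of apparent spread that arise when per-operation, per-beat, and per-record timings are mixed.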

7. Conclusions

In the context of recent advancements in biomedical technologies, the ECG signal represents a fundamental element in the evaluation and prevention of cardiovascular diseases. The implementation of artificial intelligence algorithms on dedicated hardware platforms has contributed to significant progress in the monitoring and diagnosis of heart conditions. These technological advances are designed to support clinical decision-making by automatically and in real time assessing the health status of patients. Furthermore, they contribute to reducing hospitalization costs, lowering the burden on the healthcare system, and decreasing the incidence of heart disease. Due to their remarkable performance and flexibility in implementing neural networks, FPGAs are suitable as a robust platform for developing analysis and diagnostic systems. This is supported by recent and in-depth research on neural network acceleration, which has highlighted advantages such as low power consumption, optimal latency, and high detection accuracy.
This paper synthesizes the current technological progress in the automatic classification of ECG signals, focusing on FPGA architectures used to accelerate machine learning models.
The main development directions were analyzed, highlighting current efforts to reduce computational cost, increase energy efficiency while maintaining high operating frequency, and improve classification performance. Additionally, a comparative perspective on the most recent hardware implementations reported in the literature is provided. The analysis of the studies confirms the use of the Zynq-7000 platform as the dominant hardware solution for implementing ML and DL algorithms, a result attributed to its heterogeneous architecture, which enables both software and hardware co-processing on the same chip. In general, the Zynq-7000 series appears to be a strong choice for implementing ML/DL applications, as such workloads can exploit the high degree of parallelism offered by FPGAs. It is important to note that FPGA architectures are primarily optimized for operations such as filtering, convolution, and vector/matrix computations, rather than being specifically tailored for ECG classification. Looking at future trends, FPGA vendors are optimizing their devices for AI applications, offering better support for network connectivity (retrieving and storing processed data faster from and to the operational memory), for vector/matrix operations, and for DSP blocks. Thus, the reader may see a growth in the use of Versal (AI Edge or AI Core) devices. With respect to design flow, HLS remains an uncommon choice, while RTL descriptions dominate the literature.
The review of machine learning research highlights a balanced distribution between the use of SVM and ANN classifiers in ECG classification applications implemented on FPGA platforms. Nevertheless, the SVM classifier stands out as the preferred solution for hardware-efficient implementations due to its robustness and low computational cost.
Regarding the preferred architecture for FPGA acceleration, 1D-CNNs stand out as the primary choice. They offer a favorable balance between classification accuracy and hardware implementation complexity, making them particularly well-suited for the analysis and processing of ECG signals. Moreover, most 1D-CNN architectures have demonstrated high performance on both advanced ARM processor platforms and resource-constrained FPGA families. In contrast, LSTM architectures are preferred especially for low-cost platforms, such as Artix-7, provided that efficient optimization strategies are applied to reduce high resource requirements and support model acceleration. The most challenging classification systems, in terms of balancing detection accuracy, processing speed, and energy consumption, are the 2D-CNN architectures.
The analysis of the synthesized literature highlights a series of recurring directions for the hardware optimization of deep network accelerators. Pipelining remains the most widely used technique, ranging from parallel processing approaches to full pipelining of PE arrays, which confirms the importance of latency reduction in real-time diagnostic applications. Another major direction is represented by aggressive data quantization, where a growing preference for low-precision representations is observed. In addition, structural parallelism and dataflow optimization frequently appear as key solutions for achieving scalable and efficient architectures adapted to the requirements of embedded biomedical applications.
The review of the studies confirms the adoption of the MIT-BIH dataset as the primary reference for evaluation; however, limiting the assessment to this single dataset may restrict the applicability of the results in different contexts. The absence of validation in various clinical scenarios highlights a significant limitation of the current literature, reducing the potential for transferability of the developed solutions to practical applications in real environments.
In conclusion, the evolution of intelligent systems for medical diagnosis is closely linked to the critical requirements of real-time applications, such as low latency, efficient resource utilization, and hardware scalability. FPGA platforms have proven to be a promising solution in this context, due to their flexibility and ability to accelerate artificial intelligence algorithms directly at the hardware level. Specifically, for ECG signal analysis, the use of FPGAs enables the implementation of deep learning models that combine high accuracy with fast response times, an essential aspect for clinical applications. However, most of the reviewed studies focus primarily on a limited number of FPGA families. Platforms oriented towards low-power applications, as well as heterogeneous architectures that integrate multiple major processing tasks into a single platform, represent promising research directions that can be explored for real-time cardiovascular diagnostic systems and wearable medical devices. In addition, methods for accelerating deep neural networks offer significant development potential. Although one of the major challenges at present lies in the efficient selection and allocation of optimization techniques, future research will focus on multi-accelerator processing, efficient memory distribution, and optimal task allocation across multiple FPGAs. Based on trends observed in the recent literature, the interest in developing biomedical AI solutions on FPGAs is expected to grow, both for portable systems and for more complex clinical infrastructures, highlighting significant potential for optimization and integration of hardware accelerators into diagnostic workflows.

Author Contributions

Conceptualization, L.-I.M., C.-G.B., P.F. and S.H.; methodology, L.-I.M., C.-G.B. and S.H.; resources, L.-I.M., C.-G.B., B.S.K. and A.F.; writing—original draft preparation, L.-I.M. and C.-G.B.; writing—review and editing, L.-I.M., C.-G.B., P.F., S.H., B.S.K. and A.F.; visualization, L.-I.M., C.-G.B. and B.S.K.; supervision, P.F. and S.H.; project administration, S.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the project “Romanian Hub for Artificial Intelligence-HRIA”, Smart Growth, Digitization and Financial Instruments Program, MySMIS no. 351416.

Informed Consent Statement

During the preparation of this work the authors used Microsoft 365 Copilot Chat and ChatGPT 5.1 and 5.2 to improve language and readability. After using these tools/services, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Data Availability Statement

The original contributions presented in this study are included in the article material. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
1D: One dimensional
ACAP: Adaptive compute acceleration platform
ADTF: Adaptive dual threshold filter
AI: Artificial intelligence
AIA: Artificial intelligence accelerator
AIE: AI engine
AMBA: Advanced microcontroller bus architecture
AMD: Advanced Micro Devices
ANN: Artificial neural network
ARM: Advanced RISC machine
ASIC: Application-specific integrated circuit
ASIP: Application-specific instruction set processor
BLWDF: Bi-reciprocal lattice wave digital filter
BRAM: Block random access memory
Bi-RCS: Bidirectional recurrent chimp search
CLB: Configurable logic block
CNN: Convolutional neural network
CPSC2018: China physiological signal challenge 2018
CPU: Central processing unit
CVD: Cardiovascular disease
DA: Distributed arithmetic
DL: Deep learning
DNN: Deep neural network
D-PEA: Dual processing element assembly
DSP: Digital signal processor
DSC: Distributed arithmetic stochastic computing
DWT: Discrete wavelet transform
ECG: Electrocardiogram
EEG: Electroencephalogram
EMG: Electromyogram
ENLMS: Error-normalized least mean square
FIR: Finite impulse response
FP: Floating point
FPGA: Field-programmable gate array
GPU: Graphics processing units
HBM: High bandwidth memory
HLS: High-level synthesis
InFO: Integrated fan-out
IPMVM: Inner product matrix–vector multiplication
LMS: Least mean square
LSTM: Long short-term memory
LUT: Look-up table
MAC: Multiply–accumulate
MIT-BIH: Massachusetts Institute of Technology–Beth Israel Hospital
ML: Machine learning
MLP: Multi-layer perceptron
MMU: Matrix mapping unit
MPSoC: Multiprocessor system-on-chip
NoC: Network-on-chip
PCA: Principal component analysis
PCG: Phonocardiogram
PE: Processing element
PL: Programmable logic
PLA: Programmable logic array
PNN: Probabilistic neural network
P-SCADA: Pipelined stochastic adaptive distributed architecture
PS: Processing system
PSPEAA: Parallel shift processing element array arrangement
PSR: Pipeline state register
QAT: Quantization-aware training
Q-CNN: Quantized convolutional neural network
ReLU: Rectified linear unit
RNN: Recurrent neural network
SE-NLMS: Sequential error-normalized least mean square
SNR: Signal-to-noise ratio
SIMD: Single instruction multiple data
SiP: System-in-package
SoC: System-on-chip
SPD: Shared pixel distributor
SVM: Support vector machine
UINT4: Unsigned 4-bit integer
VLSI: Very-large-scale integration
ZCU: Zynq UltraScale

References

  1. Mohan, N.; Hosni, A.; Atef, M. Neural networks implementations on FPGA for biomedical applications: A review. SN Comput. Sci. 2024, 5, 1004.
  2. Talib, M.A.; Majzoub, S.; Nasir, Q.; Jamal, D. A systematic literature review on hardware implementation of artificial intelligence algorithms. J. Supercomput. 2021, 77, 1897–1938.
  3. Wei, Y.; Zhou, J.; Wang, Y.; Liu, Y.; Liu, Q.; Luo, J.; Wang, C.; Ren, F.; Huang, L. A review of algorithm & hardware design for AI-based biomedical applications. IEEE Trans. Biomed. Circuits Syst. 2020, 14, 145–163.
  4. Serhani, M.A.; El Kassabi, H.T.; Ismail, H.; Nujum Navaz, A. ECG monitoring systems: Review, architecture, processes, and key challenges. Sensors 2020, 20, 1796.
  5. Marani, R.; Perri, A.G. An intelligent system for continuous blood pressure monitoring on remote multi-patients in real time. arXiv 2012, arXiv:1212.0651.
  6. Singhal, S.; Kumar, M. A systematic review on artificial intelligence-based techniques for diagnosis of cardiovascular arrhythmia diseases: Challenges and opportunities. Arch. Comput. Methods Eng. 2023, 30, 865–888.
  7. Johnson, A.; Bulgarelli, L.; Pollard, T.; Gow, B.; Moody, B.; Horng, S.; Celi, L.A.; Mark, R. MIMIC-IV, version 3.1; PhysioNet, 2024.
  8. Johnson, A.; Lungren, M.; Peng, Y.; Lu, Z.; Mark, R.; Berkowitz, S.; Horng, S. MIMIC-CXR-JPG - Chest Radiographs with Structured Labels, version 2.1.0; PhysioNet, 2024.
  9. Goldberger, A.; Amaral, L.; Glass, L.; Hausdorff, J.; Ivanov, P.C.; Mark, R.; Mietus, J.E.; Moody, G.B.; Peng, C.-K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 2000, 101, e215–e220.
  10. Altman, M.B.; Wan, W.; Hosseini, A.S.; Nowdeh, S.A.; Alizadeh, M. Machine learning algorithms for FPGA Implementation in biomedical engineering applications: A review. Heliyon 2024, 10, e26652.
  11. Farooq, U.; Marrakchi, Z.; Mehrez, H. FPGA architectures: An overview. In Tree-Based Heterogeneous FPGA Architectures: Application Specific Exploration and Optimization; Springer: New York, NY, USA, 2012; pp. 7–48.
  12. Xilinx. Vivado Design Suite User Guide: High-Level Synthesis, UG902, v2020.1, Xilinx, 2021. Available online: https://www.xilinx.com/support/documents/sw_manuals/xilinx2020_1/ug902-vivado-high-level-synthesis.pdf (accessed on 18 December 2025).
  13. Roth, C.H., Jr.; John, L.K. Digital Systems Design Using VHDL, 2nd ed.; Cengage Learning: Boston, MA, USA, 2008.
  14. Kaeslin, H. Digital Integrated Circuit Design from VLSI Architectures to CMOS Fabrication; Cambridge University Press: Cambridge, UK, 2008.
  15. John, L.K.; Roth, C.H., Jr.; Lee, B.K. Digital Systems Design Using Verilog, 1st ed.; Cengage Learning: Boston, MA, USA, 2015.
  16. Khan, M.I.; Da Silva, B. Harnessing FPGA technology for energy-efficient wearable medical devices. Electronics 2024, 13, 4094.
  17. Wang, C.; Luo, Z. A review of the optimal design of neural networks based on FPGA. Appl. Sci. 2022, 12, 10771.
  18. Chong, B.; Jayabaskaran, J.; Jauhari, S.M.; Chan, S.P.; Goh, R.; Kueh, M.T.W.; Li, H.; Chin, Y.H.; Kong, G.; Anand, V.V.; et al. Global burden of cardiovascular diseases: Projections from 2025 to 2050. Eur. J. Prev. Cardiol. 2025, 32, 1001–1015.
  19. Liu, W.; Guo, Q.; Chen, S.; Chang, S.; Wang, H.; He, J.; Huang, Q. A fully-mapped and energy-efficient FPGA accelerator for dual-function AI-based analysis of ECG. Front. Physiol. 2023, 14, 1079503.
  20. Wang, N.; Zhou, J.; Dai, G.; Huang, J.; Xie, Y. Energy-efficient intelligent ECG monitoring for wearable devices. IEEE Trans. Biomed. Circuits Syst. 2019, 13, 1112–1121.
  21. Demirel, B.U.; Bayoumy, I.A.; Al Faruque, M.A. Energy-efficient real-time heart monitoring on edge–fog–cloud internet of medical things. IEEE Internet Things J. 2021, 9, 12472–12481.
  22. Pineda-López, F.; Martínez-Fernández, A.; Rojo-Álvarez, J.L.; García-Alberola, A.; Blanco-Velasco, M. A flexible 12-Lead/Holter device with compression capabilities for low-bandwidth mobile-ECG telemedicine applications. Sensors 2018, 18, 3773.
  23. Dore, H.; Aviles-Espinosa, R.; Rendon-Morales, E. FPGA implementation of ECG signal processing for use in a neonatal heart rate monitoring system. Eng. Proc. 2022, 27, 70.
  24. Zhu, M.; Kuang, Q.; Yang, C.; Lin, J. Optimization of convolutional neural network hardware structure based on FPGA. In Proceedings of the 13th IEEE Conference on Industrial Electronics and Applications (ICIEA), Wuhan, China, 31 May–2 June 2018; pp. 1797–1802.
  25. Uttraphan, C.; Ardani, M.I.A.; Heng, C.W.; Ahmad, N.; Ching, K.B.; Raj, A.A.E. Hardware Implementation of FIR Filter for ECG Signal Processing: Design Optimization and Performance Analysis on an FPGA. J. Adv. Res. Appl. Sci. Eng. Technol. 2024, 40, 50–61.
  26. Keerthiga, G.; Kumar, S.P. Evaluating FPGA-based denoising techniques for improved signal quality in electrocardiograms. Analog Integr. Circuits Signal Process. 2024, 120, 93–107.
  27. Thannoon, H.H.; Hashim, I.A. FPGA Implementation of Efficient Adaptive Filter Incorporating Systolic Architecture. Eng. Technol. J. 2024, 42, 261–275.
  28. Kalaiselvi, A.; Jayadharshine, S.; Geeth, R.K.; Tamizhmani, T. FPGA Based Power Efficient ECG Heartbeat Detection System For Wearable Devices With Enhanced Noise Filtering. In Proceedings of the 2025 3rd International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA), Coimbatore, India, 4–5 April 2025; pp. 1–7.
  29. ElFerdaoussi, H.; Jenkal, W.; Laaboubi, M.; Latif, R. Memory Optimization Strategies for Real-Time ECG Denoising on FPGA: A DWT–ADTF Hybrid Architecture. In Proceedings of the 2025 International Conference on Circuit, Systems and Communication (ICCSC), Fez, Morocco, 19–20 June 2025; pp. 1–5. [Google Scholar] [CrossRef]
  30. Madiraju, N.S.; Kurella, N.; Valapudasu, R. FPGA Implementation of ECG feature extraction using Time domain analysis. arXiv 2018, arXiv:1802.03310. [Google Scholar] [CrossRef]
  31. Dhyani, S.; Kumar, A.; Choudhury, S.; Verma, C.; Illés, Z. Assessing ECG-QRS signal detection algorithm chip and simulation on several FPGAs. Discov. Comput. 2025, 28, 13. [Google Scholar] [CrossRef]
  32. Trabes, E.; Zayed, A.; Valderrama, C.; Tarrillo, J. Design Exploration of DWT-Based Feature Extraction Using FPGA for High-Performance Signal Processing. In Proceedings of the 2025 IEEE 16th Latin America Symposium on Circuits and Systems (LASCAS), Bento Gonçalves, Brazil, 25–28 February 2025; pp. 1–5. [Google Scholar] [CrossRef]
  33. Gon, A.; Mukherjee, A. FPGA-based low-cost architecture for R-peak detection and heart-rate calculation using lifting-based discrete wavelet transform. Circuits Syst. Signal Process. 2023, 42, 580–600. [Google Scholar] [CrossRef]
  34. Nandini, K.P.; Seshikala, G. Efficient ECG Arrhythmia Detection on FPGA using Machine Learning and Fiducial Windowing. Eng. Technol. Appl. Sci. Res. 2025, 15, 21100–21105. [Google Scholar] [CrossRef]
  35. Desai, M.P.; Caffarena, G.; Jevtic, R.; Márquez, D.G.; Otero, A. A Low-Latency, Low-Power FPGA Implementation of ECG Signal Characterization Using Hermite Polynomials. Electronics 2021, 10, 2324. [Google Scholar] [CrossRef]
  36. Purkayastha, B.B.; Barma, S. Hybrid CPU-FPGA Accelerated Architecture for Hurst Surface Feature Extraction and SVM Classification of ECG Signals. In Proceedings of the 2024 IEEE 31st International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW), Bangalore, India, 18–21 December 2024; pp. 33–39. [Google Scholar] [CrossRef]
  37. Villasenor, J.; Mangione-Smith, W.H. Configurable computing. Sci. Am. 1997, 276, 66–71. [Google Scholar] [CrossRef]
  38. Shawahna, A.; Sait, S.M.; El-Maleh, A. FPGA-based accelerators of deep learning networks for learning and classification: A review. IEEE Access 2018, 7, 7823–7859. [Google Scholar] [CrossRef]
  39. AMD. Zynq-7000 All Programmable SoC Technical Reference Manual. Available online: https://docs.amd.com/r/en-US/ug585-zynq-7000-SoC-TRM (accessed on 27 November 2025).
  40. Design & Reuse. Xilinx Introduces Zynq-7000 Family, Industry’s First Extensible Processing Platform. Available online: https://www.design-reuse.com/news/202519851-xilinx-introduces-zynq-7000-family-industry-s-first-extensible-processing-platform-/ (accessed on 27 November 2025).
  41. AMD. Zynq™ 7000 SoCs. Available online: https://www.amd.com/en/products/adaptive-socs-and-fpgas/soc/zynq-7000.html (accessed on 27 November 2025).
  42. AMD. Zynq UltraScale+ MPSoC Data Sheet: Overview, DS891 (v1.11.1) March 18, 2025. Available online: https://docs.amd.com/api/khub/documents/sbPbXcMUiRSJ2O5STvuGNQ/content (accessed on 18 December 2025).
  43. Hansen, L. Unleash the Unparalleled Power and Flexibility of Zynq UltraScale+ MPSoCs. Cell 2015, 2, 3–4. [Google Scholar]
  44. AMD. Zynq™ UltraScale+™ MPSoCs. Available online: https://www.amd.com/en/products/adaptive-socs-and-fpgas/soc/zynq-ultrascale-plus-mpsoc.html (accessed on 27 November 2025).
  45. AMD. Versal: The First Adaptive Compute Acceleration Platform (ACAP). Available online: https://docs.amd.com/v/u/en-US/wp505-versal-acap (accessed on 27 November 2025).
  46. Zhang, W.; Liu, Y.; Bao, Z. CAT: Customized transformer accelerator framework on Versal ACAP. arXiv 2024, arXiv:2409.09689. [Google Scholar] [CrossRef]
  47. AMD. Versal™ AI Core Series. Available online: https://www.amd.com/en/products/adaptive-socs-and-fpgas/versal/ai-core-series.html (accessed on 27 November 2025).
  48. AMD. Versal™ AI Edge Series. Available online: https://www.amd.com/en/products/adaptive-socs-and-fpgas/versal/ai-edge-series.html (accessed on 27 November 2025).
  49. Ahmad, A.; Al Busaidi, S.S.; Al-Maashri, A.; Awadalla, M.; Hussain, S. FPGAs–Chronological developments and challenges. Int. J. Electr. Eng. Technol. 2021, 12, 60–72. [Google Scholar]
  50. AMD. Artix™ UltraScale+™ FPGAs, Cost- and Power-Optimized FPGAs with Exceptional Performance/Watt. Available online: https://www.amd.com/en/products/adaptive-socs-and-fpgas/fpga/artix-ultrascale-plus.html (accessed on 27 November 2025).
  51. AMD. Virtex™ UltraScale+™ FPGAs. Available online: https://www.amd.com/en/products/adaptive-socs-and-fpgas/fpga/virtex-ultrascale-plus.html (accessed on 27 November 2025).
  52. AMD. Kintex™ UltraScale+™ FPGAs. Available online: https://www.amd.com/en/products/adaptive-socs-and-fpgas/fpga/kintex-ultrascale-plus.html (accessed on 27 November 2025).
  53. AMD. Artix™ 7 FPGAs. Available online: https://www.amd.com/en/products/adaptive-socs-and-fpgas/fpga/artix-7.html (accessed on 27 November 2025).
  54. AMD. Spartan™ 7 FPGAs. Available online: https://www.amd.com/en/products/adaptive-socs-and-fpgas/fpga/spartan-7.html (accessed on 27 November 2025).
  55. Altera. Cyclone® V FPGA and SoC FPGA. Available online: https://www.altera.com/products/fpga/cyclone/v (accessed on 27 November 2025).
  56. Altera. Cyclone® 10 FPGA. Available online: https://www.altera.com/products/fpga/cyclone/10 (accessed on 27 November 2025).
  57. Altera. Arria® 10 FPGAs and SoC FPGAs. Available online: https://www.altera.com/products/fpga/arria/10 (accessed on 27 November 2025).
  58. Altera. Stratix® 10 FPGA and SoC FPGA. Available online: https://www.altera.com/products/fpga/stratix/10 (accessed on 27 November 2025).
  59. Altera. Agilex™ 9 SoC FPGA Direct RF-Series. Available online: https://www.altera.com/products/fpga/agilex/9/rf-series (accessed on 27 November 2025).
  60. Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28. [Google Scholar] [CrossRef]
  61. Nayak, J.; Naik, B.; Behera, H.S. A comprehensive survey on support vector machine in data mining tasks: Applications & challenges. Int. J. Database Theory Appl. 2015, 8, 169–186. [Google Scholar] [CrossRef]
  62. Tsoutsouras, V.; Koliogeorgi, K.; Xydis, S.; Soudris, D. An exploration framework for efficient high-level synthesis of support vector machines: Case study on ECG arrhythmia detection for Xilinx Zynq SoC. J. Signal Process. Syst. 2017, 88, 127–147. [Google Scholar] [CrossRef]
  63. Koliogeorgi, K.; Zervakis, G.; Anagnostos, D.; Zompakis, N.; Siozios, K. Optimizing SVM classifier through approximate and high level synthesis techniques. In Proceedings of the 2019 8th International Conference on Modern Circuits and Systems Technologies (MOCAST), Thessaloniki, Greece, 13–15 May 2019; pp. 1–4. [Google Scholar] [CrossRef]
  64. Tang, X.; Tang, W. An ECG delineation and arrhythmia classification system using slope variation measurement by ternary second-order delta modulators for wearable ECG sensors. IEEE Trans. Biomed. Circuits Syst. 2021, 15, 1053–1065. [Google Scholar] [CrossRef] [PubMed]
  65. Swetha, R.; Ramakrishnan, S. Unfolding VLSI architecture for mixed noise removal and multiple classification of ECG signals. Circuits Syst. Signal Process. 2024, 43, 1993–2015. [Google Scholar] [CrossRef]
  66. Liu, Y.; Dong, L.; Zhang, B.; Xin, Y.; Geng, L. Real time ECG classification system based on DWT and SVM. In Proceedings of the 2020 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA), Nanjing, China, 23–25 November 2020; pp. 155–156. [Google Scholar] [CrossRef]
  67. Murthy, D.; Sudhanya, P. Optimized ECG Processing System using FPGA with Polyphase Filter for Arrhythmia Classification. In Proceedings of the 2025 Third International Conference on Microwave, Antenna and Communication (MAC), Bhopal, India, 27–29 June 2025; pp. 1–6. [Google Scholar] [CrossRef]
  68. Jain, A.K.; Mao, J.; Mohiuddin, K.M. Artificial neural networks: A tutorial. Computer 1996, 29, 31–44. [Google Scholar] [CrossRef]
  69. Abiodun, O.I.; Jantan, A.; Omolara, A.E.; Dada, K.V.; Mohamed, N.A.; Arshad, H. State-of-the-art in artificial neural network applications: A survey. Heliyon 2018, 4, e00938. [Google Scholar] [CrossRef]
  70. Dal, B.; Aşkar, M. Fixed-point FPGA implementation of ECG classification using artificial neural network. In Proceedings of the 2022 Medical Technologies Congress (TIPTEKNO), Antalya, Turkey, 31 October–2 November 2022; pp. 1–4. [Google Scholar] [CrossRef]
  71. Vinaykumar, S.; Thilagavathy, R. FPGA implementation of artificial neural network (ANN) for ECG signal classification. In Proceedings of the 2022 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), Toronto, ON, Canada, 1–4 June 2022; pp. 1–6. [Google Scholar] [CrossRef]
  72. Srivastava, R.; Kumar, B.; Alenezi, F.; Alhudhaif, A.; Althubiti, S.A.; Polat, K. Automatic arrhythmia detection based on the probabilistic neural network with FPGA implementation. Math. Probl. Eng. 2022, 2022, 7564036. [Google Scholar] [CrossRef]
  73. Zairi, H.; Kedir Talha, M.; Meddah, K.; Ould Slimane, S. FPGA-based system for artificial neural network arrhythmia classification. Neural Comput. Appl. 2020, 32, 4105–4120. [Google Scholar] [CrossRef]
  74. Mari, K.J.; Mariammal, K.; Bobby, K.C.; Mohamed, A.A.S. Harnessing FPGA Technology to Classify ECG Signal with Neural Networks. In Proceedings of the 2024 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI), Chennai, India, 9–10 May 2024; pp. 1–4. [Google Scholar] [CrossRef]
  75. Saadi, O.N.; Abdulkader, Z.N.; Abdul-Jabbar, J.M. Implementation of ECG Classification Xilinx System Generator. In Proceedings of the 2019 2nd International Conference on Electrical, Communication, Computer, Power and Control Engineering (ICECCPCE), Mosul, Iraq, 13–14 February 2019; pp. 1–6. [Google Scholar] [CrossRef]
  76. Sharada, P.; Mahesh, K.T. Neural Network based ECG Anomaly Detection on FPGA. Asian J. Converg. Technol. (AJCT) 2019, 5. Available online: https://www.asianssr.org/index.php/ajct/article/view/883 (accessed on 18 December 2025).
  77. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [Google Scholar] [CrossRef]
  78. Sze, V.; Chen, Y.H.; Yang, T.J.; Emer, J.S. Efficient processing of deep neural networks: A tutorial and survey. Proc. IEEE 2017, 105, 2295–2329. [Google Scholar] [CrossRef]
  79. Chen, C.; da Silva, B.; Yang, C.; Ma, C.; Li, J.; Liu, C. AutoMLP: A framework for the acceleration of multi-layer perceptron models on FPGAs for real-time atrial fibrillation disease detection. IEEE Trans. Biomed. Circuits Syst. 2023, 17, 1371–1386. [Google Scholar] [CrossRef]
  80. Medsker, L.R.; Jain, L. Recurrent neural networks. Des. Appl. 1999, 5, 2. [Google Scholar]
  81. Shanthi, K.G.; Kinol, A.; Joy, M.; Santhi, S.; Kannan, K. Real-time FPGA Integration for ECG Monitoring: Bidirectional Recurrent Chimp Search Model. IETE J. Res. 2024, 70, 6848–6863. [Google Scholar] [CrossRef]
  82. Sathiya, A.; Prabhavathy, B.; Balasupramani, A.; Kavitha, L.; Saravanakumar, R. Low-Power Approximate Computing-based VLSI Architecture for Biomedical Signal Processing. In Proceedings of the 2025 International Conference on Visual Analytics and Data Visualization (ICVADV), Tirunelveli, India, 4–6 March 2025; pp. 541–548. [Google Scholar] [CrossRef]
  83. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  84. Van Houdt, G.; Mosquera, C.; Nápoles, G. A review on the long short-term memory model. Artif. Intell. Rev. 2020, 53, 5929–5955. [Google Scholar] [CrossRef]
  85. Molina, R.S.; Ninkovic, V.; Vukobratovic, D.; Crespo, M.L.; Zennaro, M. Efficient Split Learning LSTM Models for FPGA-based Edge IoT Devices. arXiv 2025, arXiv:2502.08692. [Google Scholar] [CrossRef]
  86. Varadharajan, S.K.; Nallasamy, V. P-SCADA-a novel area and energy efficient FPGA architectures for LSTM prediction of heart arrthymias in BIoT applications. Expert Syst. 2022, 39, e12687. [Google Scholar] [CrossRef]
  87. Akhtar, N.; Fan, J.; Buzdar, A.R.; Ahmed, M.; Raza, A. VLSI Design of LSTM-Based ECG Classification for Continuous Cardiac Monitoring on Wearable Devices. Electron. Lett. 2025, 61, e70269. [Google Scholar] [CrossRef]
  88. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J.; et al. Recent advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377. [Google Scholar] [CrossRef]
  89. Taye, M.M. Theoretical understanding of convolutional neural network: Concepts, architectures, applications, future directions. Computation 2023, 11, 52. [Google Scholar] [CrossRef]
  90. Jaramillo-Rueda, A.F.; Vargas-Pacheco, L.Y.; Fajardo, C.A. A computational architecture for inference of a quantized-CNN for detecting atrial fibrillation. Ing. Cienc. 2020, 16, 135–149. [Google Scholar] [CrossRef]
  91. Cao, T.; Ng, W.S.; Goh, W.L.; Gao, Y.; Huang, H.W. FPGA-Based Real-Time ECG Classification System Using Quantized Inception-ResNeXt Neural Network and CWT Approximation. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2025, 34, 167–178. [Google Scholar]
  92. Wei, L.; Liu, D.; Lu, J.; Zhu, L.; Cheng, X. A low-cost hardware architecture of convolutional neural network for ECG classification. In Proceedings of the 2021 9th International Symposium on Next Generation Electronics (ISNE), Changsha, China, 9–11 July 2021; pp. 1–4. [Google Scholar] [CrossRef]
  93. Greco, L.; Moscato, F.; Ritrovato, P.; Vento, M. Fast and low cost FPGA-based architecture for arrhythmia detection with CNN. Internet Things 2025, 33, 101705. [Google Scholar] [CrossRef]
  94. Lu, J.; Liu, D.; Cheng, X.; Wei, L.; Hu, A.; Zou, X. An efficient unstructured sparse convolutional neural network accelerator for wearable ECG classification device. IEEE Trans. Circuits Syst. I Regul. Pap. 2022, 69, 4572–4582. [Google Scholar] [CrossRef]
  95. Akshayraj, M.R.; PC, M.R.; Gopi, V.P.; Lakshminarayanan, G.; Gangadharan, G.R.; Kidav, J.U. Energy-efficient hardware design for CNN-based ECG signal classification in wearable bio-medical devices. In Proceedings of the 2024 28th International Symposium on VLSI Design and Test (VDAT), Vellore, India, 1–3 September 2024; pp. 1–7. [Google Scholar] [CrossRef]
  96. Mangaraj, S.; Mahapatra, K.; Ari, S. HLS-compiled PYNQ-based cardiac arrhythmia detection system leveraging quantized ECG beat images. Biomed. Signal Process. Control 2025, 109, 108063. [Google Scholar] [CrossRef]
  97. Jameil, A.K.; Al-Raweshidy, H. Efficient CNN architecture on FPGA using high level module for healthcare devices. IEEE Access 2022, 10, 60486–60495. [Google Scholar] [CrossRef]
  98. Ku, M.Y.; Zhong, T.S.; Hsieh, Y.T.; Lee, S.Y.; Chen, J.Y. A high performance accelerating CNN inference on FPGA with arrhythmia classification. In Proceedings of the 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS), Hangzhou, China, 11–13 June 2023; pp. 1–4. [Google Scholar] [CrossRef]
  99. Alfaro-Ponce, M.; Chairez, I.; Etienne-Cummings, R. Automatic detection of electrocardiographic arrhythmias by parallel continuous neural networks implemented in FPGA. Neural Comput. Appl. 2019, 31, 363–375. [Google Scholar] [CrossRef]
  100. Wang, K.C.; Ku, M.Y.; Lee, S.Y.; Chen, J.Y. A Programmable Systolic-Array AI Accelerator System with High-Performance Model Quantization and Heart Disease Classification Algorithm Design. In Proceedings of the 2025 IEEE International Symposium on Circuits and Systems (ISCAS), London, UK, 25–28 May 2025; pp. 1–5. [Google Scholar] [CrossRef]
  101. Lee, D.; Lee, S.; Oh, S.; Park, D. Energy-efficient FPGA accelerator with fidelity-controllable sliding-region signal processing unit for abnormal ECG diagnosis on IoT edge devices. IEEE Access 2021, 9, 122789–122800. [Google Scholar] [CrossRef]
  102. Mangaraj, S.; Oraon, P.; Ari, S.; Swain, A.K.; Mahapatra, K. FPGA accelerated convolutional neural network for detection of cardiac arrhythmia. In Proceedings of the 4th IEEE International Conference on VLSI Systems, Architecture, Technology and Applications (VLSI SATA), Bengaluru, India, 17–18 May 2024. [Google Scholar]
  103. Liu, Z.; Ling, X.; Zhu, Y.; Wang, N. FPGA-based 1D-CNN accelerator for real-time arrhythmia classification. J. Real-Time Image Process. 2025, 22, 66. [Google Scholar] [CrossRef]
  104. Aruna, V.B.K.L.; Chitra, E.; Padmaja, M. Accelerating deep convolutional neural network on FPGA for ECG signal classification. Microprocess. Microsyst. 2023, 103, 104939. [Google Scholar] [CrossRef]
  105. Pham, H.L.; Tran, T.D.; Le, V.T.D.; Vu, T.H.; Nakashima, Y. A Fast and Memory-Efficient CNN Accelerator for ECG Classification in Remote Healthcare Systems. In Proceedings of the 2025 22nd International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Bangkok, Thailand, 20–23 May 2025; pp. 1–6. [Google Scholar] [CrossRef]
  106. Lu, J.; Liu, D.; Liu, Z.; Cheng, X.; Wei, L.; Zhang, C.; Zou, X.; Liu, B. Efficient hardware architecture of convolutional neural network for ECG classification in wearable healthcare device. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 68, 2976–2985. [Google Scholar] [CrossRef]
  107. Podugu, J.S.; Kondragunta, V.; Bhavirisetty, P.P.; Aruna, V.B.K.L. FPGA Enabled Deep Learning Accelerator for Multiclass Electrocardiogram Classification. In Proceedings of the 2023 IEEE Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (PRIMEAsia), Hyderabad, India, 19–22 November 2023; pp. 46–47. [Google Scholar] [CrossRef]
  108. Ting, Y.C.; Lo, F.W.; Tsai, P.Y. Implementation for fetal ECG detection from multi-channel abdominal recordings with 2D convolutional neural network. J. Signal Process. Syst. 2021, 93, 1101–1113. [Google Scholar] [CrossRef]
  109. Kalapothas, S.; Flamis, G.; Kitsos, P. Efficient edge-AI application deployment for FPGAs. Information 2022, 13, 279. [Google Scholar] [CrossRef]
Figure 1. The structural layout of this study.
Figure 2. The flowchart illustrates the identification stages of the articles included in this review.
Figure 3. Overview of SoC FPGA architecture.
Figure 4. Classification of the articles using the Sankey diagram.
Figure 5. Overview of optimization strategies applied to FPGA accelerators for DL models.
Figure 6. Overview of the correlation between FPGA families, accelerator types, and numeric precision.
Table 1. Characterization of FPGA families by fabrication technology, ARM processor integration, and AI support for neural network inference systems.
| Producer | Family | Initial Release Year | Device Type | Fabrication Technology | ARM CPU Integrated | AI Engine | Features |
|---|---|---|---|---|---|---|---|
| Xilinx (AMD) | Zynq-7000 [40] | 2011 | SoC | 28 nm | Dual-Core ARM Cortex-A9 MPCore | - | It provides reduced latency in deep neural network implementations, designed for high-level embedded applications, hardware acceleration, and real-time control. |
| Xilinx (AMD) | Artix-7 [53] | 2012 | FPGA | 28 nm | - | - | Cost-optimized, it is dedicated to power-sensitive medical applications. |
| Xilinx (AMD) | Virtex UltraScale+ [51] | 2016 | FPGA | 16 nm | - | - | Its 3D architecture makes it ideal for compute-intensive applications in the field of ML and for accelerated workloads. |
| Xilinx (AMD) | Kintex UltraScale+ [52] | 2016 | FPGA | 16 nm | - | - | An intermediate solution offering an optimal cost-performance-power balance, ideal for DSP-intensive applications and neural network acceleration. |
| Xilinx (AMD) | Zynq UltraScale+ MPSoC [44] | 2016 | SoC | 16 nm | ARM Cortex-R5, ARM Cortex-A53 Dual/Quad-Core | - | It includes a dedicated AI/ML unit optimized for CNNs and an advanced memory hierarchy, enabling scalable and efficient embedded systems. |
| Xilinx (AMD) | Spartan-7 [54] | 2017 | FPGA | 28 nm | - | - | It provides an optimal cost-performance-power balance, targeting applications with moderate performance requirements. |
| Xilinx (AMD) | Versal AI Core [47] | 2019 | ACAP | 7 nm | Dual-Core ARM Cortex-A72 | Yes | It delivers exceptional computing power with strong CNN inference and acceleration performance, targeting complex real-time control applications such as biomedical image processing for faster diagnostics. |
| Xilinx (AMD) | Artix UltraScale+ [50] | 2021 | FPGA | 16 nm | - | - | Based on InFO technology, it delivers unrivaled compute density for Edge applications and cost-sensitive neural network deployment. |
| Xilinx (AMD) | Versal AI Edge [48] | 2021 | ACAP | 7 nm | Dual-Core ARM Cortex-A72 | Yes | It offers breakthrough AI performance per watt for real-time systems and enables rapid development of sensor-fusion methods with AI algorithms, supporting diverse performance and power profiles from edge to endpoint. |
| Altera (Intel) | Cyclone-V [55] | 2012 | FPGA | 28 nm | - | - | Well-suited for edge computing applications. |
| Altera (Intel) | Arria-10 [57] | 2013 | FPGA | 20 nm | - | - | Classified as mid-range, it is dedicated to real-time diagnostic applications based on imaging and scanning. |
| Altera (Intel) | Stratix-10 [58] | 2016 | FPGA | 14 nm | - | - | Based on the HyperFlex architecture, the platform offers superior performance over Arria-10 and Stratix-10 in terms of energy efficiency and systems integration. |
| Altera (Intel) | Cyclone-10 [56] | 2017 | FPGA | 20 nm | - | - | Based on the Quartus design environment, it provides twice the performance of Cyclone-V, making it suitable for portable medical devices. |
| Altera (Intel) | Agilex-9 [59] | 2024 | SoC | 10 nm | Quad-Core ARM Cortex-A53 | - | The heterogeneous 3D system-in-package design and Hyperflex architecture guarantee a 45% increase in performance and a 40% reduction in power consumption compared to Stratix-10. This ensures high compute capacity, targeting edge computing applications, advanced signal processing, and neural network inference. |
Table 2. Overview of ECG signal ML classification methods.
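| Ref., Year | FPGA Family | Algorithm | ECG Database | Accuracy |
|---|---|---|---|---|
| [62], 2017 | Zynq-7000 | SVM | MIT-BIH Arrhythmia | - |
| [36], 2024 | Zynq-7000 | SVM | Shaoxing People’s Hospital | 97.30% |
| [63], 2019 | Zynq-7000 | SVM | MIT-BIH Arrhythmia | 96.70% |
| [66], 2020 | Zynq-7000 | SVM | MIT-BIH Database | 98.70% |
| [67], 2025 | Zynq-7000 | SVM | MIT-BIH Arrhythmia | - |
| [70], 2022 | Zynq-7000 | ANN | MIT-BIH Arrhythmia | 97.00% |
| [74], 2024 | Zynq UltraScale+ | ANN | MIT-BIH Database | 94.08% |
| [64], 2021 | Spartan-6 | SVM | MIT-BIH Arrhythmia | 95.10% |
| [65], 2024 | Spartan-6 | SVM | MIT-BIH Arrhythmia | 96.50% |
| [75], 2019 | Spartan-6 | ANN | European ST-T & QT Database | 81.90% |
| [76], 2019 | Spartan-3 | MLP-PNN | MIT-BIH Arrhythmia | 99.82% |
| [72], 2022 | Artix-7 | PNN | MIT-BIH Database | 98.27% |
| [71], 2022 | Artix-7 | ANN | MIT-BIH Arrhythmia | 85.60% |
| [73], 2020 | Artix-7 | ANN | MIT-BIH Arrhythmia | 95.00% |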
Table 3. Overview of ECG signal DL classification methods.
| Ref., Year | FPGA Family | Device | Algorithm | ECG Database | Accuracy |
|---|---|---|---|---|---|
| [81], 2024 | - | - | Bi-RNN | MIT-BIH Arrhythmia | 99.00% |
| [82], 2025 | Zynq-7000 | - | RNN-FIR | PhysioNet | - |
| [92], 2021 | Zynq-7000 | ZC706 | 1D-CNN | MIT-BIH Arrhythmia | 98.94% |
| [93], 2025 | Zynq-7000 | Zybo Z7-20 | 2D-CNN | MIT-BIH Arrhythmia | 91.90% |
| [94], 2022 | Zynq-7000 | ZC706 | 1D-CNN | - | 98.99% |
| [95], 2024 | Zynq-7000 | PYNQ-Z2 | 2D-CNN | MIT-BIH Arrhythmia | - |
| [97], 2022 | Zynq-7000 | ZC706 | 1D-CNN | UCI Machine Learning Repository | 99.00% |
| [98], 2023 | Zynq-7000 | PYNQ-Z2 | 1D-CNN | MIT-BIH Arrhythmia | 96.60% |
| [100], 2025 | Zynq-7000 | PYNQ-Z2 | 1D-CNN | MIT-BIH Arrhythmia; HVD | 97.40%; 99.10% |
| [79], 2023 | Zynq-7000 | PYNQ-Z2 | MLP | MIT-BIH Atrial Fibrillation; CPSC2018 | 93.00%; 97.00% |
| [103], 2025 | Zynq-7000 | 7Z020 | 1D-CNN | MIT-BIH Database | 96.55% |
| [99], 2019 | Zynq-7000 | Zedboard | 1D-CNN | MIT-BIH Arrhythmia; European ST-T | 93.80% |
| [106], 2021 | Zynq-7000 | ZC706 | 1D-CNN | MIT-BIH Arrhythmia | 99.10% |
| [96], 2025 | Zynq-7000 | PYNQ-Z2 | 1D-CNN | MIT-BIH Arrhythmia | 97.79% |
| [105], 2025 | Zynq UltraScale+ | ZCU102 | 1D-CNN | MIT-BIH Database | 99.49% |
| [102], 2024 | Zynq UltraScale+ | ZCU104 | 2D-CNN | MIT-BIH Arrhythmia | 98.64% |
| [86], 2022 | Artix-7 | XC7A200T | LSTM | MIT-BIH Arrhythmia | 98.00% |
| [87], 2025 | Artix-7 | - | LSTM | MIT-BIH Arrhythmia | 99.00% |
| [90], 2020 | Artix-7 | Basys 3 | Q-CNN | MIT-BIH Atrial Fibrillation | 94.00% |
| [91], 2025 | Artix-7 | Arty A7-35 | 1D-CNN | MIT-BIH Database | 99.50% |
| [104], 2023 | Virtex-4 | XC4VLX200 | 1D-CNN | MIT-BIH Arrhythmia and PTB | 98.60% |
| [108], 2021 | Virtex-7 | XC7VX690T | 2D-CNN | PhysioNet/Computing in Cardiology Challenge | 95.20% |
| [19], 2023 | Cyclone V | Terasic De10-Nano | 1D-CNN | Shaoxing People’s Hospital | 93.24% |
| [107], 2023 | PolarFire | PolarFire SoC Icicle | 1D-CNN | MIT-BIH Arrhythmia | 98.60% |
| [101], 2021 | Virtex UltraScale+ | Alveo U200 | 2D-CNN | MIT-BIH Arrhythmia | - |
Table 4. Comparison of FPGA-based deep neural network ECG classification accelerators.
| FPGA Family | Ref., Year | Accelerator Type | Methodology | Freq. (MHz) | Numeric Precision | Power (W) | Latency (ms) |
|---|---|---|---|---|---|---|---|
| Zynq-7000 | [82], 2025 | RNN-FIR | FIR distributed arithmetic, Pipeline parallelism, Approximate computing | - | - | Reduced by 41.9% | Reduced by 33% |
| Zynq-7000 | [92], 2021 | 1D-CNN | Pipelining in the PE array, Parallel loop unrolling | 200 | 16-bit Fixed-Point | 0.79 | - |
| Zynq-7000 | [93], 2025 | 2D-CNN | A data-flow architecture, Partially parallel and sequential, Layer partitioning, Quantization | 50 | 7-bit Fixed-Point | - | - |
| Zynq-7000 | [94], 2022 | 1D-CNN | Unstructured sparsity, Tile-first dataflow with compressed data storage, Two-level weight index matching, Configurable PE array controlled by 32-bit instructions | 2 | 16-bit Fixed-Point | - | - |
| Zynq-7000 | [95], 2024 | 2D-CNN | Enhanced clock gating, Local explicit clock gating, Energy efficient MAC unit, Bus-specific clock gating, Memory reassignment technique, Ping-pong buffering for input/weights/output, DMA-based intelligent data dispatching module | 150 | 16-bit Fixed-Point | 2.10 | 9.06 |
| Zynq-7000 | [97], 2022 | 1D-CNN | Fully pipelined architecture, Tristate buffer multiplexer, Replacement of multipliers with shift operations | 442.94 | 16-bit Fixed-Point | 0.17 | - |
| Zynq-7000 | [98], 2023 | 1D-CNN | Tensor-tensor multiplication, Weight stationary, Inner product matrix-vector multiplication (IPMVM), Input stationary, Parallel shift processing element array arrangement (PSPEAA), Quantization, Pruning | 10 | 8-bit Fixed-Point | 0.13 | 0.23 |
| Zynq-7000 | [100], 2025 | 1D-CNN | Quantization, Application-specific instruction set processor (ASIP) to facilitate efficient cross-layer and cross-model computations, Systolic array, Matrix mapping unit (MMU), Pipeline state register (PSR) | 1 | 8-bit Fixed-Point | 0.10 | 7.2 |
| Zynq-7000 | [79], 2023 | MLP | Parallel Optimization, Pipeline or unroll strategies, and combinations | 100; 666.70 | FxP <8,4> | 0.01; 1.52 | 2 × 10^4; 2.6 × 10^4 |
| Zynq-7000 | [103], 2025 | 1D-CNN | Residual connections, Depthwise separable conv, Unstructured pruning, Incremental network quantization | 50 | - | 1.78 | 63 |
| Zynq-7000 | [99], 2019 | 1D-CNN | Parallel CoNN | - | 16-bit Fixed-Point | - | - |
| Zynq-7000 | [106], 2021 | 1D-CNN | Global average pooling, Fully pipelined processing unit, Dynamic activation | 200 | 16-bit Fixed-Point | - | - |
| Zynq-7000 | [96], 2025 | 1D-CNN | Parallel-quantized-pixel wise convolution module, Pooling fusion, Skip-zero-weight architecture | - | 4-bit UINT | - | 236 |
| Zynq UltraScale+ | [105], 2025 | 1D-CNN | Tiny InceptionNet Accelerator, Dual processing element array, Shared pixel distributor, Shared pixel memory | 250 | 16-bit Fixed-Point | - | 0.017 |
| Zynq UltraScale+ | [102], 2024 | 2D-CNN | Channel-wise addition, Array partitioning and pipelining, INT8 quantization | - | 8-bit INT | 4.17 | 219 |
| Artix-7 | [86], 2022 | LSTM | Pipelined stochastic adaptive distributed architectures (P-SCADA), Distributed arithmetic stochastic computing (DSC), Pipelining | 450 | FP32 | - | 56 × 10^6; 248 × 10^6 |
| Artix-7 | [87], 2025 | LSTM | LSTM architecture using finite state machine, Multiplierless lattice-based DWT architecture, Hard-sigmoid and hard-tanh functions | 54 | 3.12 signed Fixed-Point | 0.04 | under 100 |
| Artix-7 | [90], 2020 | Q-CNN | Quantization, SIMD-based vector units, Loop unrolling strategy, Hard-limit function | 3.46 × 10^2 | 22-bit Fixed-Point | - | 1.35 |
| Artix-7 | [91], 2025 | 1D-CNN | Quantization-aware training, Streaming architecture, Throughput balancing, Cosine-approximated CWT | - | 16-bit Floating-Point | 0.2 | 0.35 |
| Virtex-4 | [104], 2023 | 1D-CNN | Pipeline parallelism, Bayesian optimization algorithm | 185.42 | Floating-Point | 4.5 × 10^4 | 5.3 × 10^6 |
| Virtex-7 | [108], 2021 | 2D-CNN | Pre-fetch mechanism, Partial sum buffers, Quantization | 3.24 × 10^1 | 10–17 bit Fixed-Point | 0.01 | - |
| Cyclone V | [19], 2023 | 1D-CNN | Fully mapped design, Pipelining, Quantization | 50 | 8-bit INT | 0.06 | 0.06 |
| PolarFire | [107], 2023 | 1D-CNN | DWT feature extraction method, Pipeline parallelism | 175.40 | - | 5 × 10^4 | - |
| Virtex UltraScale+ | [101], 2021 | 2D-CNN | Hardware-level pipelining | 100 | - | - | 572 |
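Most accelerators in Table 4 report a fixed-point numeric precision, with quantization listed among their optimization strategies. As a rough illustration of what such a format implies, the Python sketch below quantizes real values to signed two's-complement fixed-point integers with saturation and performs a multiply-accumulate purely in integer arithmetic. The function names (`to_fixed`, `to_float`, `fixed_mac`) and the 16-bit word with 12 fractional bits are illustrative assumptions and are not taken from any of the reviewed designs.

```python
# Minimal sketch of signed fixed-point quantization (assumed helper names,
# assumed bit widths; not the implementation of any reviewed accelerator).

def to_fixed(x: float, total_bits: int, frac_bits: int) -> int:
    """Quantize x to a signed fixed-point integer, saturating on overflow."""
    scale = 1 << frac_bits                 # 2**frac_bits steps per unit
    lo = -(1 << (total_bits - 1))          # most negative representable code
    hi = (1 << (total_bits - 1)) - 1       # most positive representable code
    q = int(round(x * scale))              # Python rounds halves to even
    return max(lo, min(hi, q))             # saturate instead of wrapping


def to_float(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer code back to a real value."""
    return q / (1 << frac_bits)


def fixed_mac(weights, samples, total_bits: int, frac_bits: int) -> int:
    """Multiply-accumulate on quantized operands using integers only."""
    acc = 0                                # wide accumulator, no saturation here
    for w, s in zip(weights, samples):
        acc += to_fixed(w, total_bits, frac_bits) * to_fixed(s, total_bits, frac_bits)
    # Each product carries 2 * frac_bits fractional bits; shift back to frac_bits.
    return acc >> frac_bits


if __name__ == "__main__":
    q = to_fixed(0.3, 16, 12)
    print(q, to_float(q, 12))              # quantized code and its real value
    y = fixed_mac([0.5, 0.25], [1.0, 2.0], 16, 12)
    print(y, to_float(y, 12))              # integer result and its real value
```

With 12 fractional bits, the rounding error of a single value that stays within range is at most half a step, that is 2^-13. Narrower formats trade this resolution for fewer logic and memory resources, which is the trade-off reflected in the Numeric Precision column.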
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Mihăilă, L.-I.; Barbura, C.-G.; Faragó, P.; Hintea, S.; Kirei, B.S.; Fazakas, A. FPGA-Accelerated ECG Analysis: Narrative Review of Signal Processing, ML/DL Models, and Design Optimizations. Electronics 2026, 15, 301. https://doi.org/10.3390/electronics15020301

