Search Results (122)

Search Parameters:
Keywords = FPGA computing core

20 pages, 1202 KB  
Article
Adaptive ORB Accelerator on FPGA: High Throughput, Power Consumption, and More Efficient Vision for UAVs
by Hussam Rostum and József Vásárhelyi
Signals 2026, 7(1), 13; https://doi.org/10.3390/signals7010013 - 2 Feb 2026
Viewed by 42
Abstract
Feature extraction and description are fundamental components of visual perception systems used in applications such as visual odometry, Simultaneous Localization and Mapping (SLAM), and autonomous navigation. In resource-constrained platforms, such as Unmanned Aerial Vehicles (UAVs), achieving real-time hardware acceleration on Field-Programmable Gate Arrays (FPGAs) is challenging. This work demonstrates an FPGA-based implementation of an adaptive ORB (Oriented FAST and Rotated BRIEF) feature extraction pipeline designed for high-throughput and energy-efficient embedded vision. The proposed architecture is a completely new design for the main algorithmic blocks of ORB, including the FAST (Features from Accelerated Segment Test) feature detector, Gaussian image filtering, moment computation, and descriptor generation. Adaptive mechanisms are introduced to dynamically adjust thresholds and filtering behavior, improving robustness under varying illumination conditions. The design is developed using a High-Level Synthesis (HLS) approach, where all processing modules are implemented as reusable hardware IP cores and integrated at the system level. The architecture is deployed and evaluated on two FPGA platforms, PYNQ-Z2 and KRIA KR260, and its performance is compared against CPU and GPU implementations using a dedicated C++ testbench based on OpenCV. Experimental results demonstrate significant improvements in throughput and energy efficiency while maintaining stable and scalable performance, making the proposed solution suitable for real-time embedded vision applications on UAVs and similar platforms. Notably, the FPGA implementation increases DSP utilization from 11% to 29% compared to the previous designs implemented by other researchers, effectively offloading computational tasks from general purpose logic (LUTs and FFs), reducing LUT usage by 6% and FF usage by 13%, while maintaining overall design stability, scalability, and acceptable thermal margins at 2.387 W. 
This work establishes a robust foundation for integrating the optimized ORB pipeline into larger drone systems and opens the door for future system-level enhancements.
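
The abstract lists moment computation among the ORB blocks. In the standard ORB formulation, the keypoint orientation is the intensity-centroid angle, theta = atan2(m01, m10), computed from first-order image moments of the patch around the keypoint. The sketch below illustrates that formula only; it is not the authors' HLS design. The function name `patch_orientation` is hypothetical, and a square patch is assumed for brevity where ORB normally uses a circular one.

```python
import math

def patch_orientation(patch):
    """Return the intensity-centroid angle (radians) of a square, odd-sized patch.

    `patch` is a list of rows of pixel intensities (illustrative input format).
    """
    size = len(patch)
    r = size // 2
    m10 = 0  # first-order moment along x
    m01 = 0  # first-order moment along y
    for row in range(size):
        for col in range(size):
            # Coordinates are taken relative to the patch centre.
            x, y = col - r, row - r
            m10 += x * patch[row][col]
            m01 += y * patch[row][col]
    return math.atan2(m01, m10)
```

A patch that is brighter on its right side yields an angle of 0, and one that is brighter along its bottom row yields pi/2 (image rows grow downward).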

20 pages, 9489 KB  
Article
Design and Implementation of a High-Speed Storage System Based on SATA Interface
by Junwei Lu, Jie Bai and Sanmin Shen
Electronics 2026, 15(2), 452; https://doi.org/10.3390/electronics15020452 - 20 Jan 2026
Viewed by 1741
Abstract
In flight tests, to meet the requirements of consistent acquisition and storage of multiple targets, multiple systems, and multiple data types, various data types are processed into Pulse Code Modulation (PCM) data streams using PCM encoding for storage. To meet the requirement for real-time storage of high-bit-rate PCM data streams, a large-capacity storage system based on Serial Advanced Technology Attachment 3.0 (SATA3.0) is designed. The system uses a Kintex 7 series Field-Programmable Gate Array (FPGA) as the control core, receives PCM data streams through a Low-Voltage Differential Signaling (LVDS) interface, stores the received PCM data streams on an mSATA disk via the SATA3.0 transmission bus, and transmits the stored data back to the host computer through the USB3.0 interface for analysis. Meanwhile, to solve the problem of complex data export, the storage system constructs a FAT32 file system through the MicroBlaze soft core to optimize the management and operation of the large-capacity storage system. Test results show that the storage system can perform stable high-rate storage from −40 °C to 80 °C.
(This article belongs to the Section Computer Science & Engineering)

32 pages, 8110 KB  
Article
A Secure and Efficient Sharing Framework for Student Electronic Academic Records: Integrating Zero-Knowledge Proof and Proxy Re-Encryption
by Xin Li, Minsheng Tan and Wenlong Tian
Future Internet 2026, 18(1), 47; https://doi.org/10.3390/fi18010047 - 12 Jan 2026
Viewed by 196
Abstract
A sharing framework based on Zero-Knowledge Proof (ZKP) and Proxy Re-encryption (PRE) technologies offers a promising solution for sharing Student Electronic Academic Records (SEARs). As core credentials in the education sector, student records are characterized by strong identity binding, the need for long-term retention, frequent cross-institutional verification, and sensitive information. Compared with electronic health records and government archives, they face more complex security, privacy protection, and storage scalability challenges during sharing. These records not only contain sensitive data such as personal identity and academic performance but also serve as crucial evidence in key scenarios such as further education, employment, and professional title evaluation. Leakage or tampering could have irreversible impacts on a student’s career development. Furthermore, traditional blockchain technology faces storage capacity limitations when storing massive academic records, and existing general electronic record sharing solutions struggle to meet the high-frequency verification demands of educational authorities, universities, and employers for academic data. This study proposes a dedicated sharing framework for students’ electronic academic records, leveraging PRE technology and the distributed ledger characteristics of blockchain to ensure transparency and immutability during sharing. By integrating the InterPlanetary File System (IPFS) with Ethereum Smart Contract (SC), it addresses blockchain storage bottlenecks, enabling secure storage and efficient sharing of academic records. Relying on optimized ZKP technology, it supports verifying the authenticity and integrity of records without revealing sensitive content. 
Furthermore, the introduction of gate circuit merging, constant folding techniques, Field-Programmable Gate Array (FPGA) hardware acceleration, and the efficient Bulletproofs algorithm alleviates the high computational complexity of ZKP, significantly reducing proof generation time. The experimental results demonstrate that the framework, while ensuring strong privacy protection, can meet the cross-scenario sharing needs of student records and significantly improve sharing efficiency and security. Therefore, this method exhibits superior security and performance in privacy-preserving scenarios. This framework can be applied to scenarios such as cross-institutional academic certification, employer background checks, and long-term management of academic records by educational authorities, providing secure and efficient technical support for the sharing of electronic academic credentials in the digital education ecosystem.

17 pages, 9165 KB  
Article
An FPGA-Based Reconfigurable Accelerator for Real-Time Affine Transformation in Industrial Imaging Heterogeneous SoC
by Yang Zhang, Dejun Chen, Huixiong Ruan, Hongyu Jia, Yong Liu and Ying Luo
Sensors 2026, 26(1), 316; https://doi.org/10.3390/s26010316 - 3 Jan 2026
Viewed by 434
Abstract
Real-time affine transformation, a core operation for image correction and registration of industrial cameras and scanners, faces challenges including the high computational cost of interpolation and inefficient data access. In this study, we propose a reconfigurable accelerator architecture based on a heterogeneous system-on-chip (SoC). The architecture decouples tasks into control and data paths: the ARM core in the processing system (PS) handles parameter matrix generation and scheduling, whereas the FPGA-based acceleration module in programmable logic (PL) implements the proposed PATRM algorithm. By integrating multiplication-free design and affine matrix properties, PATRM adopts Q15.16 fixed-point computation and AXI4 burst transmission for efficient block data prefetching and pipelined processing. Experimental results demonstrate 25 frames per second (FPS) for 2095×2448 resolution images, representing a 128.21 M pixel/s throughput, which is 5.3× faster than the Block AT baseline with a peak signal-to-noise ratio (PSNR) exceeding 26 dB. Featuring low resource consumption and dynamic reconfigurability, the accelerator meets the real-time requirements of industrial scanner correction and other high-performance image processing tasks.
(This article belongs to the Section Sensing and Imaging)
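
The abstract above names Q15.16 fixed-point computation and a multiplication-free design without giving details. The sketch below shows one common way those two ideas combine: along an output row, the source x-coordinate of an affine map grows by a constant step, so each pixel needs one addition instead of a multiplication. This reading is an assumption, not the published PATRM algorithm, and all function names are hypothetical. The last line simply rechecks the throughput figure quoted in the abstract.

```python
FRAC_BITS = 16  # Q15.16 is assumed here to mean sign + 15 integer bits + 16 fractional bits

def to_fixed(value: float) -> int:
    """Convert a float to a Q15.16 integer (no 32-bit wrap-around is modelled)."""
    return int(round(value * (1 << FRAC_BITS)))

def to_float(q: int) -> float:
    """Convert a Q15.16 integer back to a float."""
    return q / (1 << FRAC_BITS)

def map_row_direct(a, b, tx, y, width):
    """Source x-coordinates for output row y: x_src = a*x + b*y + tx, one multiply per pixel."""
    a_q, b_q, tx_q = to_fixed(a), to_fixed(b), to_fixed(tx)
    return [a_q * x + b_q * y + tx_q for x in range(width)]

def map_row_incremental(a, b, tx, y, width):
    """Same coordinates, but only one addition per pixel after the row start."""
    a_q, b_q, tx_q = to_fixed(a), to_fixed(b), to_fixed(tx)
    acc = b_q * y + tx_q  # row start value, computed once per row
    coords = []
    for _ in range(width):
        coords.append(acc)
        acc += a_q  # stepping one pixel right adds the constant a
    return coords

# Throughput check using the figures quoted in the abstract.
pixels_per_second = 2095 * 2448 * 25  # 128_214_000, about 128.21 Mpixel/s
```

Because both functions work on the same integers, the incremental version reproduces the direct version exactly, which is what makes the substitution safe in fixed-point hardware.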

17 pages, 9727 KB  
Article
An Energy-Efficient Neuromorphic Processor Using Unified Refractory Control-Based NoC for Edge AI
by Su-Hwan Na and Dong-Sun Kim
Electronics 2025, 14(24), 4959; https://doi.org/10.3390/electronics14244959 - 17 Dec 2025
Viewed by 447
Abstract
Neuromorphic computing has emerged as a promising paradigm for edge AI systems owing to its event-driven operation and high energy efficiency. However, conventional spiking neural network (SNN) architectures often suffer from redundant computation and inefficient power control, particularly during on-chip learning. This paper proposes a network-on-chip (NoC) architecture featuring a unified refractory-enabled neuron (UREN)-based router that globally coordinates spike-driven computation across multiple neuron cores. The router applies a unified refractory time to all neurons following a winner spike event, effectively enabling clock gating and suppressing redundant activity. The proposed design adopts a star-routing topology with multicasting support and integrates nearest-neighbor spike-timing-dependent plasticity (STDP) for local online learning. FPGA-based experiments demonstrate a 30% reduction in computation and 86.1% online classification accuracy on the MNIST dataset compared with baseline SNN implementations. These results confirm that the UREN-based router provides a scalable and power-efficient neuromorphic processor architecture, well suited for energy-constrained edge AI applications.
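
The unified refractory idea described above can be illustrated with a toy software model: once any neuron wins, every neuron is held idle for a fixed number of steps, so no membrane updates are computed during that time. This is a minimal sketch under stated assumptions (simple integrate-and-fire neurons, a global reset after the winner spike, hypothetical names and parameters); it does not model the paper's router, NoC, or STDP learning.

```python
def simulate(inputs, threshold=1.0, refractory_steps=3):
    """Run integrate-and-fire neurons with one shared refractory counter.

    inputs: list of time steps, each a list of input currents (one per neuron).
    Returns (spikes, updates): winner spikes as (step, neuron) pairs and the
    number of membrane updates actually computed.
    """
    n = len(inputs[0])
    potentials = [0.0] * n
    refractory = 0
    spikes = []
    updates = 0
    for step, currents in enumerate(inputs):
        if refractory > 0:
            # All neurons are gated: no computation happens in this step.
            refractory -= 1
            continue
        for i in range(n):
            potentials[i] += currents[i]
            updates += 1
        winner = max(range(n), key=lambda i: potentials[i])
        if potentials[winner] >= threshold:
            spikes.append((step, winner))
            potentials = [0.0] * n         # assumed global reset after a winner spike
            refractory = refractory_steps  # one refractory time applied to every neuron
    return spikes, updates
```

With two neurons driven by constant currents of 0.6 and 0.2 for ten steps, the gated run computes 8 membrane updates, against 20 when `refractory_steps` is 0. The numbers come from this toy model only and say nothing about the 30% figure reported in the abstract.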

13 pages, 835 KB  
Article
Layer-Pipelined CNN Accelerator Design on 2.5D FPGAs
by Mengxuan Wang and Chang Wu
Electronics 2025, 14(23), 4587; https://doi.org/10.3390/electronics14234587 - 23 Nov 2025
Viewed by 542
Abstract
With the rapid advancement of 2.5D FPGA technology, the integration of multiple FPGA dies enables larger design capacity and higher computing power. This progress provides a high-speed hardware platform well-suited for neural network acceleration. In this paper, we present a high-performance accelerator design for large-scale neural networks on 2.5D FPGAs. First, we propose a layer pipeline architecture that utilizes multiple accelerator cores, each equipped with individual high-bandwidth DDR memory. To address inter-die data dependencies, we introduce a block convolution mechanism that enables independent and efficient computation across dies. Furthermore, we propose a design space exploration scheme to optimize computational efficiency under resource constraints. Experimental results demonstrate that our proposed accelerator achieves 4860.87 GOPS when running VGG-16 on the Alveo U250 board, significantly outperforming existing layer pipeline designs on the same platform.
(This article belongs to the Special Issue Advances in High-Performance and Parallel Computing)

22 pages, 971 KB  
Article
Emulation-Based Analysis of Multiple Cell Upsets in LEON3 SDRAM: A Workload-Dependent Vulnerability Study
by Afef Kchaou, Sehmi Saad and Hatem Garrab
Electronics 2025, 14(23), 4582; https://doi.org/10.3390/electronics14234582 - 23 Nov 2025
Cited by 1 | Viewed by 370
Abstract
The reliability of embedded processors in safety- and mission-critical domains is increasingly threatened by radiation-induced soft errors, particularly multiple-cell upsets (MCUs) that simultaneously corrupt adjacent cells in external SDRAM. While prior studies on the LEON3 processor have largely focused on single-event upsets (SEUs) in internal SRAM structures, they overlook MCU effects in off-chip SDRAM, a critical gap that limits fault coverage and compromises system-level reliability assessment in modern high-density embedded systems. This paper presents an SDRAM-based fault injection framework using FPGA emulation to evaluate the impact of MCUs on the LEON3 soft-core processor, with faults directly injected into the external memory subsystem where data corruptions can rapidly propagate into system-level failures. The methodology injects spatially correlated two-bit MCUs directly into SDRAM during realistic workload execution. Three architecturally diverse benchmarks were analyzed, each representing a distinct computational workload: a numerical (matrix multiplication), signal-processing (FFT), and a cryptographic (AES-128 encryption) application, chosen to capture arithmetic-intensive, iterative, and control-intensive execution profiles, respectively. The results reveal a distinct workload-dependent vulnerability profile. Matrix multiplication exhibited >99.99% fault activation, with outcomes overwhelmingly dominated by data store errors. FFT showed >97% activation in steady-state execution, following an initial phase sensitive to alignment and data access exceptions. AES displayed 88.12% non-propagating faults, primarily due to injections in inactive memory regions, but remained exposed to critical memory access violations and control-flow exceptions that enable fault-based cryptanalysis. 
These findings demonstrate that SEU-only models severely underestimate real-world MCU risks and underscore the necessity of selective, workload-aware fault-tolerance strategies: lightweight ECC for cryptographic data structures, alignment monitoring for signal processing, and algorithm-based fault tolerance (ABFT) for numerical kernels. This work provides actionable insights for hardening LEON3-based systems against emerging multi-bit threats in radiation-rich and adversarial environments.

19 pages, 2241 KB  
Article
Research and Implementation of Performance Optimization Methods for RISC-V Level-5 Processors
by Zhiwei Jin, Tingpeng Hu, Zhiyi Jie and Peng Wang
Appl. Sci. 2025, 15(21), 11634; https://doi.org/10.3390/app152111634 - 31 Oct 2025
Viewed by 1355
Abstract
The widespread adoption of fifth-generation Reduced Instruction Set Computing (RISC-V) processors in embedded systems has driven advancements in domestic processor design. However, research on processor performance optimization methods predominantly focuses on two- to three-stage pipeline architectures, with relatively few studies addressing complex five-stage pipeline processors. This study addresses this gap by analyzing optimization strategies for a five-stage pipeline processor architecture. Key areas examined include RISC-V jump instruction branch prediction (speed optimization), memory structure (memory access and resource optimization), and data-correlation-based division operations (fetch optimization). The processor core underwent CoreMark benchmark testing via a Field Programmable Gate Array (FPGA), analyzing the impact of optimizations such as branch prediction and cache on processor performance. The final processor achieved a CoreMark score of 2.92 CoreMark/MHz, outperforming most open-source processors and validating the effectiveness of the optimization strategies.

23 pages, 3153 KB  
Article
Domain-Specific Acceleration of Gravity Forward Modeling via Hardware–Software Co-Design
by Yong Yang, Daying Sun, Zhiyuan Ma and Wenhua Gu
Micromachines 2025, 16(11), 1215; https://doi.org/10.3390/mi16111215 - 25 Oct 2025
Viewed by 1056
Abstract
The gravity forward modeling algorithm is a compute-intensive method and is widely used in scientific computing, particularly in geophysics, to predict the impact of subsurface structures on surface gravity fields. Traditional implementations rely on CPUs, where performance gains are mainly achieved through algorithmic optimization. With the rise of domain-specific architectures, FPGAs offer a promising platform for acceleration, but face challenges such as limited programmability and the high cost of nonlinear function implementation. This work proposes an FPGA-based co-processor to accelerate gravity forward modeling. A RISC-V core is integrated with a custom instruction set targeting key computation steps. Tasks are dynamically scheduled and executed on eight fully pipelined processing units, achieving high parallelism while retaining programmability. To address nonlinear operations, we introduce a piecewise linear approximation method optimized via stochastic gradient descent (SGD), significantly reducing resource usage and latency. The design is implemented on the AMD UltraScale+ ZCU102 FPGA (Advanced Micro Devices, Inc. (AMD), Santa Clara, CA, USA) and evaluated across several forward modeling scenarios. At 250 MHz, the system achieves up to 179× speedup over an Intel Xeon 5218R CPU (Intel Corporation, Santa Clara, CA, USA) and improves energy efficiency by 2040×. To the best of our knowledge, this is the first FPGA-based gravity forward modeling accelerator design.
(This article belongs to the Special Issue Advances in Field-Programmable Gate Arrays (FPGAs))
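
To make the piecewise linear idea concrete, the sketch below approximates a nonlinear function with straight-line segments stored as (slope, intercept) pairs, so that evaluation costs one table lookup, one multiply, and one add. It uses uniform breakpoints, not the SGD-optimized breakpoints the abstract describes, and `math.atan` is only an illustrative stand-in for whatever nonlinear functions the design actually approximates. All names are hypothetical.

```python
import math

def build_table(f, lo, hi, segments):
    """Build (slope, intercept) pairs for uniform segments of f on [lo, hi]."""
    step = (hi - lo) / segments
    xs = [lo + i * step for i in range(segments + 1)]
    table = []
    for x0, x1 in zip(xs, xs[1:]):
        slope = (f(x1) - f(x0)) / (x1 - x0)
        table.append((slope, f(x0) - slope * x0))
    return table, step

def pwl_eval(table, lo, step, x):
    """Evaluate the approximation at x (lo <= x <= hi): lookup, multiply, add."""
    index = min(int((x - lo) / step), len(table) - 1)  # clamp x == hi into the last segment
    slope, intercept = table[index]
    return slope * x + intercept

table, step = build_table(math.atan, 0.0, 4.0, 16)
max_error = max(
    abs(pwl_eval(table, 0.0, step, i * 4.0 / 1000) - math.atan(i * 4.0 / 1000))
    for i in range(1001)
)
```

With 16 uniform segments on [0, 4], the worst-case error of this sketch stays below 0.006. Moving the breakpoints toward regions of high curvature, which is what an optimizer such as SGD could do, would lower the error for the same table size.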

21 pages, 6094 KB  
Article
Nanopore-Aware Embedded Detection for Mobile DNA Sequencing: A Viterbi–HMM Design Versus Deep Learning Approaches
by Karim Hammad, Zhongpan Wu, Ebrahim Ghafar-Zadeh and Sebastian Magierowski
Biosensors 2025, 15(9), 569; https://doi.org/10.3390/bios15090569 - 1 Sep 2025
Viewed by 1231
Abstract
Nanopore-based DNA sequencing has emerged as a transformative biosensing technology, enabling real-time molecular diagnostics in compact and mobile form factors. However, the computational complexity of the basecalling process (the step that translates raw nanopore signals into nucleotide sequences) poses a critical energy challenge for mobile deployment. While deep learning (DL) models currently dominate this task due to their high accuracy, they demand substantial power budgets and computing resources, making them unsuitable for portable or field-scale biosensor platforms. In this work, we propose an embedded hardware–software framework for DNA sequence detection that leverages a Viterbi-based Hidden Markov Model (HMM) implemented on a custom 64-bit RISC-V core. The proposed HMM detector is realized on an off-the-shelf Virtex-7 FPGA and evaluated against state-of-the-art DL-based basecallers in terms of energy efficiency and inference accuracy. On the one hand, the experimental results show that our system achieves an energy efficiency improvement of 6.5×, 5.5×, and 4.6×, respectively, compared to similar HMM-based detectors implemented on a commodity x86 processor, Cortex-A9 ARM embedded system, and a previously published Rocket-based system. On the other hand, the proposed detector demonstrates 15× and 2.4× energy efficiency superiority over state-of-the-art DL-based detectors, with competitive accuracy and sufficient throughput for field-based genomic surveillance applications and point-of-care diagnostics. This study highlights the practical advantages of classical probabilistic algorithms when tightly integrated with lightweight embedded processors for biosensing applications constrained by energy, size, and latency.
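
For readers unfamiliar with the Viterbi algorithm named above, the sketch below is a minimal log-domain decoder for a generic HMM. It is a textbook version, not the paper's nanopore detector: the states, observation symbols, and probabilities in the usage note are toy values chosen for illustration, and all probabilities are assumed to be nonzero so that the logarithms are defined.

```python
import math

def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observation indices `obs`.

    start_p[s]    : probability of starting in state s
    trans_p[p][s] : probability of moving from state p to state s
    emit_p[s][o]  : probability of state s emitting observation o
    """
    n = len(start_p)
    # Log scores turn products into sums, which also suits fixed-point hardware.
    score = [math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev = score
        score = []
        pointers = []
        for s in range(n):
            best = max(range(n), key=lambda p: prev[p] + math.log(trans_p[p][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][o]))
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

As a usage example, with two states, start probabilities `[0.6, 0.4]`, transitions `[[0.7, 0.3], [0.4, 0.6]]`, emissions `[[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]`, and observations `[0, 1, 2]`, the decoded path is `[0, 0, 1]`.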

21 pages, 736 KB  
Article
RiscADA: RISC-V Extension for Optimized Control of External D/A and A/D Converters
by Cosmin-Andrei Popovici, Andrei Stan, Nicolae-Alexandru Botezatu and Vasile-Ion Manta
Electronics 2025, 14(15), 3152; https://doi.org/10.3390/electronics14153152 - 7 Aug 2025
Viewed by 1506
Abstract
The increasing interest shared by academia and industry in the development of RISC-V cores, extensions and accelerators is bearing fruit through collaborative efforts, like the EU’s ChipsJU, which promotes the design of building blocks, IPs and cores based on the RISC-V architecture. A domain capable of benefiting from the RISC-V extensibility is the control of external DACs and ADCs. The proposed solution is an open-source RISC-V extension for optimized control of external DACs and ADCs called RiscADA. The extension supports a parametrizable number of DACs and ADCs, is integrated as a coprocessor beside CVA6 in a SoC by using the CV-X-IF interface, is deployed on a Kintex UltraScale+ FPGA, and implements ISA extension instructions. Benchmarks against commercial solutions show that CVA6 using the RiscADA extension configures external DACs 38.6× and 10.9× faster than MicroBlaze V and simple CVA6, both using AXI SPI peripherals. The proposed extension achieves 5.35× and 3.05× higher sample rates of external ADCs than the two configurations mentioned above. The RiscADA extension performs digital signal conditioning 4.52× and 3.1× faster than the MicroBlaze V and CVA6, both using AXI SPI peripherals. It computes statistics for external ADC readings (minimum, maximum, simple-moving average and over-threshold duration).
(This article belongs to the Section Computer Science & Engineering)
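
The four statistics listed at the end of the abstract can be sketched in software as a single pass over the samples. This is a behavioral illustration only: the function name, the window handling during warm-up, and the choice to report over-threshold duration as a count of samples strictly above the threshold are assumptions, not details taken from the RiscADA extension.

```python
from collections import deque

def adc_stats(samples, window, threshold):
    """Compute minimum, maximum, simple moving average, and over-threshold duration.

    The moving average uses up to `window` most recent samples; the duration is
    returned as a number of sample periods (assumed definition).
    """
    recent = deque(maxlen=window)  # oldest sample drops out automatically
    lowest = highest = samples[0]
    over_threshold = 0
    moving_average = []
    for sample in samples:
        lowest = min(lowest, sample)
        highest = max(highest, sample)
        if sample > threshold:
            over_threshold += 1
        recent.append(sample)
        moving_average.append(sum(recent) / len(recent))
    return {
        "min": lowest,
        "max": highest,
        "sma": moving_average,
        "over_threshold_samples": over_threshold,
    }
```

For the samples `[1, 5, 3, 7, 2]` with a window of 2 and a threshold of 4, this returns a minimum of 1, a maximum of 7, moving averages `[1.0, 3.0, 4.0, 5.0, 4.5]`, and 2 samples over the threshold.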

19 pages, 4142 KB  
Article
Onboard Real-Time Hyperspectral Image Processing System Design for Unmanned Aerial Vehicles
by Ruifan Yang, Min Huang, Wenhao Zhao, Zixuan Zhang, Yan Sun, Lulu Qian and Zhanchao Wang
Sensors 2025, 25(15), 4822; https://doi.org/10.3390/s25154822 - 5 Aug 2025
Viewed by 2525
Abstract
This study proposes and implements a dual-processor FPGA-ARM architecture to resolve the critical contradiction between massive data volumes and real-time processing demands in UAV-borne hyperspectral imaging. The integrated system incorporates a shortwave infrared hyperspectral camera, IMU, control module, heterogeneous computing core, and SATA SSD storage. Through hardware-level task partitioning—utilizing FPGA for high-speed data buffering and ARM for core computational processing—it achieves a real-time end-to-end acquisition–storage–processing–display pipeline. The compact integrated device exhibits a total weight of merely 6 kg and power consumption of 40 W, suitable for airborne platforms. Experimental validation confirms the system’s capability to store over 200 frames per second (at 640 × 270 resolution, matching the camera’s maximum frame rate), quick-look imaging capability, and demonstrated real-time processing efficacy via relative radiometric correction tasks (processing 5000 image frames within 1000 ms). This framework provides an effective technical solution to address hyperspectral data processing bottlenecks more efficiently on UAV platforms for dynamic scenario applications. Future work includes actual flight deployment to verify performance in operational environments.
(This article belongs to the Section Sensing and Imaging)

30 pages, 15434 KB  
Article
A DSP–FPGA Heterogeneous Accelerator for On-Board Pose Estimation of Non-Cooperative Targets
by Qiuyu Song, Kai Liu, Shangrong Li, Mengyuan Wang and Junyi Wang
Aerospace 2025, 12(7), 641; https://doi.org/10.3390/aerospace12070641 - 19 Jul 2025
Viewed by 1618
Abstract
The increasing presence of non-cooperative targets poses significant challenges to the space environment and threatens the sustainability of aerospace operations. Accurate on-orbit perception of such targets, particularly those without cooperative markers, requires advanced algorithms and efficient system architectures. This study presents a hardware–software co-design framework for the pose estimation of non-cooperative targets. Firstly, a two-stage architecture is proposed, comprising object detection and pose estimation. YOLOv5s is modified with a Focus module to enhance feature extraction, and URSONet adopts global average pooling to reduce the computational burden. Optimization techniques, including batch normalization fusion, ReLU integration, and linear quantization, are applied to improve inference efficiency. Secondly, a customized FPGA-based accelerator is developed with an instruction scheduler, memory slicing mechanism, and computation array. Instruction-level control supports model generalization, while a weight concatenation strategy improves resource utilization during convolution. Finally, a heterogeneous DSP–FPGA system is implemented, where the DSP manages data pre-processing and result integration, and the FPGA performs core inference. The system is deployed on a Xilinx X7K325T FPGA operating at 200 MHz. Experimental results show that the optimized model achieves a peak throughput of 399.16 GOP/s with less than 1% accuracy loss. The proposed design reaches 0.461 and 0.447 GOP/s/DSP48E1 for two model variants, achieving a 2× to 3× improvement over comparable designs.
(This article belongs to the Section Astronautics & Space Science)

35 pages, 2630 KB  
Article
AHA: Design and Evaluation of Compute-Intensive Hardware Accelerators for AMD-Xilinx Zynq SoCs Using HLS IP Flow
by David Berrazueta-Mena and Byron Navas
Computers 2025, 14(5), 189; https://doi.org/10.3390/computers14050189 - 13 May 2025
Cited by 2 | Viewed by 3537
Abstract
The increasing complexity of algorithms in embedded applications has amplified the demand for high-performance computing. Heterogeneous embedded systems, particularly FPGA-based systems-on-chip (SoCs), enhance execution speed by integrating hardware accelerator intellectual property (IP) cores. However, traditional low-level IP-core design presents significant challenges. High-level synthesis (HLS) offers a promising alternative, enabling efficient FPGA development through high-level programming languages. Yet, effective methodologies for designing and evaluating heterogeneous FPGA-based SoCs remain crucial. This study surveys HLS tools and design concepts and presents the development of the AHA IP cores, a set of five benchmarking accelerators for rapid Zynq-based SoC evaluation. These accelerators target compute-intensive tasks, including matrix multiplication, Fast Fourier Transform (FFT), Advanced Encryption Standard (AES), Back-Propagation Neural Network (BPNN), and Artificial Neural Network (ANN). We establish a streamlined design flow using AMD-Xilinx tools for rapid prototyping and testing FPGA-based heterogeneous platforms. We outline criteria for selecting algorithms to improve speed and resource efficiency in HLS design. Our performance evaluation across various configurations highlights performance–resource trade-offs and demonstrates that ANN and BPNN achieve significant parallelism, while AES optimization increases resource utilization the most. Matrix multiplication shows strong optimization potential, whereas FFT is constrained by data dependencies.

12 pages, 345 KB  
Article
NeuroAdaptiveNet: A Reconfigurable FPGA-Based Neural Network System with Dynamic Model Selection
by Achraf El Bouazzaoui, Omar Mouhib and Abdelkader Hadjoudja
Chips 2025, 4(2), 24; https://doi.org/10.3390/chips4020024 - 8 May 2025
Cited by 2 | Viewed by 1541
Abstract
This paper presents NeuroAdaptiveNet, an FPGA-based neural network framework that dynamically self-adjusts its architectural configurations in real time to maximize performance across diverse datasets. The core innovation is a Dynamic Classifier Selection mechanism, which harnesses the k-Nearest Centroid algorithm to identify the most competent neural network model for each incoming data sample. By adaptively selecting the most suitable model configuration, NeuroAdaptiveNet achieves significantly improved classification accuracy and optimized resource usage compared to conventional, statically configured neural networks. Experimental results on four datasets demonstrate that NeuroAdaptiveNet can reduce FPGA resource utilization by as much as 52.85%, increase classification accuracy by 4.31%, and lower power consumption by up to 24.5%. These gains illustrate the clear advantage of real-time, per-input reconfiguration over static designs. These advantages are particularly crucial for edge computing and embedded applications, where computational constraints and energy efficiency are paramount. The ability of NeuroAdaptiveNet to tailor its neural network parameters and architecture on a per-input basis paves the way for more efficient and accurate AI solutions in resource-constrained environments.