Advances in Field-Programmable Gate Arrays (FPGAs)

A special issue of Micromachines (ISSN 2072-666X). This special issue belongs to the section "E:Engineering and Technology".

Deadline for manuscript submissions: 30 November 2026 | Viewed by 6611

Special Issue Editors


Prof. Dr. He Li
Guest Editor
School of Electronic Science and Engineering, Southeast University, Nanjing 210096, China
Interests: field-programmable gate array (FPGA); quantum computing; AI acceleration

Dr. Tarek Ould-Bachir
Guest Editor
MOTCE Laboratory, Department of Computer Engineering, Polytechnique Montréal, Montréal, QC H3T 1J4, Canada
Interests: field-programmable gate arrays (FPGAs); computer architecture; embedded systems

Dr. Philippos Papaphilippou
Guest Editor
Electrical and Electronic Engineering, University of Southampton, Southampton, UK
Interests: field-programmable gate arrays (FPGAs); system interconnects and network-on-chips (NoC); big data analytics and sorting accelerators; many-core computer architecture

Special Issue Information

Dear Colleagues,

Field-programmable gate arrays (FPGAs) have successfully transitioned from mostly prototyping platforms to heterogeneous compute-acceleration platforms. FPGAs are now widely used in artificial intelligence (AI) acceleration, sensor signal acquisition and processing, and quantum information processing. With the rapid advances in AI, quantum computing, and micro-nano sensor systems, FPGAs continue to be an attractive computing platform for domain-specific acceleration. In this Special Issue, we call for high-quality and insightful manuscripts on advanced FPGA circuit and system design, modern FPGA architectures, high-level synthesis tools, and FPGA-based applications such as AI or large language model (LLM) acceleration, quantum computing and quantum information, scientific computing, and the integration of sensor arrays. The objective of this Special Issue is to solicit and present the latest research findings in hardware–algorithm co-design, with a particular interest in FPGA circuits and systems.

We look forward to receiving your submissions!

Prof. Dr. He Li
Dr. Tarek Ould-Bachir
Dr. Philippos Papaphilippou
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts are available on the Instructions for Authors page. Micromachines is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2100 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • FPGA architecture
  • FPGA circuits and systems design
  • FPGA-based signal processing with micro- and nanodevices
  • emerging applications: AI or LLM acceleration, neuromorphic emulation, quantum computing and quantum information, scientific computing, etc.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (6 papers)

Research

26 pages, 4064 KB  
Article
Topology Reconfiguration for NoCs: A Fast Reconfiguration Algorithm Based on Monotonic Path Shifting
by Mingzhi Zhang, Zhijia Wang, Zhenxing Wang, Dali Xu and Na Niu
Micromachines 2026, 17(4), 438; https://doi.org/10.3390/mi17040438 - 31 Mar 2026
Viewed by 454
Abstract
With the advancement of semiconductor technology, the Network-on-Chip (NoC) has become a critical architecture for communication between multiple cores. However, failures caused by factors such as manufacturing processes can degrade its performance and stability, making efficient topology reconstruction algorithms particularly important. Conventional 2D mesh reconstruction yields irregular topologies, increasing network latency and complicating system scheduling and deployment. While REmesh structures maintain topological regularity, they struggle to balance algorithmic complexity, success rates, and reconstruction costs. This paper proposes a monotonic path shift (MPS) topological reconstruction algorithm for REmesh NoCs with core-level redundancy, based on local rapid recovery. This algorithm localizes reconstruction decisions by establishing monotonic paths between failed cores and redundant cores for recovery. It incorporates region retention and local fallback mechanisms to suppress path conflicts among multiple failed cores. Theoretical analysis shows that MPS provides an upper bound on the runtime of the algorithm, significantly reducing its time complexity. Experimental results indicate that its reconstruction success rate is comparable to that of the ACTR algorithm, with both maintaining a high repair rate even under high fault density. In terms of core reuse rate, MPS achieves significant improvements over BTTR, BSTR, and ACTR, with an average increase of approximately 10% under low-fault conditions, effectively utilizing remaining computational resources. Concurrently, the algorithm substantially reduces average migration time, accelerating recovery by several orders of magnitude in large-scale low-fault scenarios and markedly lowering online recovery overhead.
(This article belongs to the Special Issue Advances in Field-Programmable Gate Arrays (FPGAs))

21 pages, 1332 KB  
Article
Impact of Fabrication Defects on FPGA Logic Using Memristor-Based Memory Cells
by Jonas Schoenen, Jonas Gehrunger, Leon Mayrhofer, Timo Oster, Eszter Piros, Taewook Kim, Alexey Arzumanov, Enrique Miranda, Klaus Hofmann, Lambert Alff and Christian Hochberger
Micromachines 2026, 17(4), 429; https://doi.org/10.3390/mi17040429 - 31 Mar 2026
Viewed by 434
Abstract
Memristor-based configuration memory offers an alternative solution to the volatility and large area overhead of conventional Static Random Access Memory (SRAM)-based FPGA configuration memory. The non-volatile nature of memristors and the possibility of stacking them on top of the logic layer in a process called Back-End-Of-Line (BEOL) manufacturing help not only dramatically reduce area consumption but also significantly reduce startup time. However, because manufacturing defects make the defect probability comparatively high, traditional approaches to defect tolerance are not fit to address them. This work introduces an approach to defect-aware and defect-tolerant synthesis. Based on this, an investigation into the defect tolerance of different architecture choices regarding the size of lookup tables (LUTs) and the fracturability of LUTs is presented. We show that smaller, non-fracturable LUTs exhibit a higher defect tolerance. Moreover, multiple strategies to improve the mapping result based on the properties of the logic functions are introduced. Notably, reducing the mapping complexity of logic clusters during the packing stage significantly improves the mapping success rate.
(This article belongs to the Special Issue Advances in Field-Programmable Gate Arrays (FPGAs))

24 pages, 1630 KB  
Article
Hardware-Oriented Approximations of Softmax and RMSNorm for Efficient Transformer Inference
by Yiwen Kang and Dong Wang
Micromachines 2026, 17(1), 84; https://doi.org/10.3390/mi17010084 - 7 Jan 2026
Cited by 2 | Viewed by 972
Abstract
With the rapid advancement of Transformer-based large language models (LLMs), these models have found widespread applications in industrial domains such as code generation and non-functional requirement (NFR) classification in software engineering. However, recent research has primarily focused on optimizing linear matrix operations, while nonlinear operators remain relatively underexplored. This paper proposes hardware-efficient approximation and acceleration methods for the Softmax and RMSNorm operators to reduce resource cost and accelerate Transformer inference while maintaining model accuracy. For the Softmax operator, an additional range reduction based on the SafeSoftmax technique enables approximation and acceleration with a bipartite lookup table (LUT). The bit-width configuration is optimized through Pareto frontier analysis to balance precision and hardware cost, and an error compensation mechanism is further applied to preserve numerical accuracy. The division is reformulated as a logarithmic subtraction implemented with a small lookup table driven by a leading-one detector (LOD), eliminating expensive dividers. For RMSNorm, the LOD is further leveraged to decompose the reciprocal square root into mantissa and exponent parts, enabling parallel table lookup and a single multiplication. Based on these optimizations, an FPGA-based pipelined accelerator is implemented, achieving low operator-level latency and power consumption with significantly reduced hardware resource usage while preserving model accuracy.
(This article belongs to the Special Issue Advances in Field-Programmable Gate Arrays (FPGAs))
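
The mantissa/exponent decomposition of the reciprocal square root mentioned in the abstract above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' design: it reads LOD as leading-one detection, and the table width `MANT_BITS`, the input width `MAX_BITS`, the midpoint sampling, and the use of floating-point table entries are all illustrative choices. The function name `rsqrt_lod` is hypothetical.

```python
import math

MANT_BITS = 6   # assumed index width of the mantissa table (64 entries)
MAX_BITS = 32   # assumed width of the unsigned input

# Table 1: 1/sqrt(m) for m in [1, 2), one entry per bin, sampled at bin midpoints.
MANT_TABLE = [
    1.0 / math.sqrt(1.0 + (i + 0.5) / (1 << MANT_BITS))
    for i in range(1 << MANT_BITS)
]
# Table 2: 2^(-k/2) for every possible leading-one position k.
EXP_TABLE = [2.0 ** (-k / 2.0) for k in range(MAX_BITS)]


def rsqrt_lod(x: int) -> float:
    """Approximate 1/sqrt(x) for an unsigned integer 0 < x < 2**MAX_BITS."""
    if not 0 < x < (1 << MAX_BITS):
        raise ValueError("x out of range")
    k = x.bit_length() - 1   # leading-one position: x = m * 2^k with m in [1, 2)
    frac = x - (1 << k)      # bits below the leading one
    # Align the top MANT_BITS fraction bits to form the mantissa-table index.
    if k >= MANT_BITS:
        idx = frac >> (k - MANT_BITS)
    else:
        idx = frac << (MANT_BITS - k)
    # The two lookups do not depend on each other; one multiply combines them.
    return MANT_TABLE[idx] * EXP_TABLE[k]
```

Because 1/sqrt(m · 2^k) factors into 1/sqrt(m) · 2^(-k/2), the two tables can be read in parallel and combined with one multiplication. With the 64-entry table assumed here, the relative error stays below about half a percent; a real design would size the tables against its own accuracy target.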

25 pages, 7245 KB  
Article
A Hardware-Friendly Joint Denoising and Demosaicing System Based on Efficient FPGA Implementation
by Jiqing Wang, Xiang Wang and Yu Shen
Micromachines 2026, 17(1), 44; https://doi.org/10.3390/mi17010044 - 29 Dec 2025
Viewed by 782
Abstract
This paper designs a hardware-implementable joint denoising and demosaicing acceleration system. First, a lightweight network architecture with multi-scale feature extraction based on partial convolution is proposed at the algorithm level. The partial convolution scheme can reduce the redundancy of filters and feature maps, thereby reducing memory accesses, and achieve excellent visual effects with a smaller model complexity. In addition, multi-scale extraction can expand the receptive field while reducing model parameters. Then, we apply separable convolution and partial convolution to reduce the parameters of the model. Compared with the standard convolutional solution, the parameters and MACs are reduced by 83.38% and 77.71%, respectively. Moreover, different networks entail different memory-access patterns and computation schemes; thus, we introduce a unified and flexibly configurable hardware acceleration processing platform and implement it on the Xilinx Zynq UltraScale+ FPGA board. Finally, compared with the state-of-the-art neural network solution on the Kodak24 set, the peak signal-to-noise ratio and the structural similarity index measure are improved by approximately 2.36 dB and 0.0806, respectively, and the computing efficiency is improved by 2.09×. Furthermore, the hardware architecture supports multi-parallelism and can adapt to different edge-embedded scenarios. Overall, the solution proposed in this paper offers clear advantages for joint denoising and demosaicing.
(This article belongs to the Special Issue Advances in Field-Programmable Gate Arrays (FPGAs))

23 pages, 3153 KB  
Article
Domain-Specific Acceleration of Gravity Forward Modeling via Hardware–Software Co-Design
by Yong Yang, Daying Sun, Zhiyuan Ma and Wenhua Gu
Micromachines 2025, 16(11), 1215; https://doi.org/10.3390/mi16111215 - 25 Oct 2025
Viewed by 1354
Abstract
The gravity forward modeling algorithm is a compute-intensive method and is widely used in scientific computing, particularly in geophysics, to predict the impact of subsurface structures on surface gravity fields. Traditional implementations rely on CPUs, where performance gains are mainly achieved through algorithmic optimization. With the rise of domain-specific architectures, FPGAs offer a promising platform for acceleration, but face challenges such as limited programmability and the high cost of nonlinear function implementation. This work proposes an FPGA-based co-processor to accelerate gravity forward modeling. A RISC-V core is integrated with a custom instruction set targeting key computation steps. Tasks are dynamically scheduled and executed on eight fully pipelined processing units, achieving high parallelism while retaining programmability. To address nonlinear operations, we introduce a piecewise linear approximation method optimized via stochastic gradient descent (SGD), significantly reducing resource usage and latency. The design is implemented on the AMD UltraScale+ ZCU102 FPGA (Advanced Micro Devices, Inc. (AMD), Santa Clara, CA, USA) and evaluated across several forward modeling scenarios. At 250 MHz, the system achieves up to 179× speedup over an Intel Xeon 5218R CPU (Intel Corporation, Santa Clara, CA, USA) and improves energy efficiency by 2040×. To the best of our knowledge, this is the first FPGA-based gravity forward modeling accelerator design.
(This article belongs to the Special Issue Advances in Field-Programmable Gate Arrays (FPGAs))
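
The idea of a piecewise linear approximation tuned by SGD, as named in the abstract above, can be sketched in a few lines. This is a generic illustration, not the authors' method: the target function (`math.atan`), the uniform segmentation, the segment count, the learning rate, and the function names `fit_pwl_sgd` and `eval_pwl` are all assumptions made for the example.

```python
import math
import random


def fit_pwl_sgd(f, lo, hi, segments=8, steps=20000, lr=0.02, seed=0):
    """Fit y = offsets[s] + slopes[s] * t per segment, with t in [0, 1] local to segment s."""
    rng = random.Random(seed)
    width = (hi - lo) / segments
    # Start from the chord through each segment's endpoints.
    offsets = [f(lo + s * width) for s in range(segments)]
    slopes = [f(lo + (s + 1) * width) - f(lo + s * width) for s in range(segments)]
    for _ in range(steps):
        x = rng.uniform(lo, hi)
        s = min(int((x - lo) / width), segments - 1)
        t = (x - (lo + s * width)) / width
        err = offsets[s] + slopes[s] * t - f(x)
        # SGD step on the squared error 0.5 * err**2 for this one sample.
        offsets[s] -= lr * err
        slopes[s] -= lr * err * t
    return offsets, slopes


def eval_pwl(x, lo, hi, offsets, slopes):
    """Evaluate the fitted approximation: segment select, then one multiply-add."""
    segments = len(offsets)
    width = (hi - lo) / segments
    s = min(max(int((x - lo) / width), 0), segments - 1)
    t = (x - (lo + s * width)) / width
    return offsets[s] + slopes[s] * t
```

For example, `offsets, slopes = fit_pwl_sgd(math.atan, 0.0, 4.0)` followed by `eval_pwl(1.0, 0.0, 4.0, offsets, slopes)` returns a value close to `math.atan(1.0)`. In hardware terms, evaluation reduces to a segment select plus one multiply-add, which is why such approximations are attractive when nonlinear functions are expensive to implement directly.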

Other

26 pages, 373 KB  
Perspective
Hardware Accelerators for Cardiovascular Signal Processing: A System-on-Chip Perspective
by Rami Hariri, Marcian Cirstea, Mahdi Maktab Dar Oghaz, Khaled Benkrid and Oliver Faust
Micromachines 2026, 17(1), 51; https://doi.org/10.3390/mi17010051 - 30 Dec 2025
Viewed by 1042
Abstract
This study presents a comprehensive systematic analysis of hardware accelerators specifically designed for real-time cardiovascular signal processing, focusing mainly on Electrocardiogram (ECG), Photoplethysmogram (PPG), and blood pressure monitoring systems. Cardiovascular Diseases (CVDs) represent the world’s leading cause of morbidity and mortality, creating an urgent demand for efficient and accurate diagnostic technologies. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we systematically analysed 59 research papers on this topic, published from 2014 to 2024, categorising them into three main categories: signal denoising, feature extraction, and decision support with Machine Learning (ML) or Deep Learning (DL). Comprehensive performance benchmarking across energy efficiency, processing speed, and clinical accuracy demonstrates that hybrid Field Programmable Gate Array (FPGA)-Application Specific Integrated Circuit (ASIC) architectures and specialised Artificial Intelligence (AI) on Edge accelerators represent the most promising solutions for next-generation CVD monitoring systems. The analysis identifies key technological gaps and proposes future research directions focused on developing ultra-low-power, clinically robust, and highly scalable physiological signal processing systems. The findings provide guidance for advancing hardware-accelerated cardiovascular diagnostics toward practical clinical deployment.
(This article belongs to the Special Issue Advances in Field-Programmable Gate Arrays (FPGAs))