Search Results (76)

Search Parameters:
Keywords = fixed point arithmetic

33 pages, 1061 KB  
Review
FPGA-Based Implementations of Biometric Recognition: A Review
by Ali Kia, Ajan Ahmed and Masudul H. Imtiaz
Electronics 2026, 15(10), 2145; https://doi.org/10.3390/electronics15102145 - 16 May 2026
Viewed by 147
Abstract
Field-programmable gate arrays (FPGAs) are increasingly used to bring biometric recognition from cloud- or GPU-centric deployments to resource-constrained edge devices where latency, power, and privacy are critical. This paper surveys recent (2021–2025) FPGA and FPGA-SoC implementations across five widely deployed modalities: face, fingerprint, iris, speaker (voiceprint), and finger vein. For each modality, we summarize representative implementations and the performance figures commonly reported in the literature (e.g., accuracy or EER, latency/throughput, resource usage, and power), highlighting the algorithm–hardware co-design choices that enable real-time operation. Across modalities, successful designs repeatedly employ streaming/dataflow architectures, aggressive quantization and fixed-point arithmetic, reuse-aware buffering, and heterogeneous CPU–FPGA partitioning, often supported by high-level synthesis and vendor deep learning IP. Beyond throughput, we discuss how FPGAs facilitate privacy-preserving on-device processing and can integrate template protection and presentation attack detection within the same fabric. Finally, we identify open challenges related to scalability to larger models, memory-bandwidth constraints, and design productivity, and outline research directions enabled by emerging adaptive FPGA architectures and more automated toolflows. Overall, the surveyed evidence indicates that FPGAs are a compelling platform for deterministic, energy-efficient, and secure biometric inference at the sensor edge.

20 pages, 48835 KB  
Article
Lightweight Hardware Implementation of a State of Charge Estimation Algorithm Using a Piecewise OCV–SOC Model
by Gahyeon Jang, Seungbum Kang and Seongsoo Lee
Electronics 2026, 15(10), 1994; https://doi.org/10.3390/electronics15101994 - 8 May 2026
Viewed by 263
Abstract
State of charge (SOC) estimation is a key function in battery management systems (BMSs) because it directly affects safe operation and available energy prediction. In embedded BMS platforms, information from multiple cells must be processed within tight computation and memory budgets. The estimator therefore needs to balance accuracy and implementation cost. This paper presents a lightweight SOC estimation method based on the relationship between open circuit voltage and state of charge (OCV–SOC) in lithium-ion batteries, together with a standalone gauge IP based on finite-state machine (FSM) control. The reference OCV–SOC curve of a commercial 3.7 V lithium-ion cell is approximated by a two-region quadratic model. The IP estimates OCV from the measured terminal voltage with equivalent series resistance (ESR) correction and updates SOC iteratively. To obtain predictable runtime behavior and to suppress oscillatory behavior near convergence, the hardware combines a 1-LSB termination rule with a guard based on a maximum iteration count of $N_{\max}=10$. Real-time validation on an FPGA-based battery measurement testbed achieves an overall normalized mean absolute error (NMAE) of 1.6% over charge and discharge data. When synthesized for an Artix-7 XC7A100T, the proposed gauge IP used only 504 LUTs (0.79%) and 580 FFs (0.46%). A TSMC 28 nm MPW implementation further demonstrates feasibility for integration at chip level.

20 pages, 5162 KB  
Article
Lossless Reversible Color Image Encryption Using Multilayer Hybrid Chaos with Gram–Schmidt Orthogonalization and ChaCha20-HMAC-Authenticated Transport
by Saadia Drissi, Faiq Gmira and Meriyem Chergui
Technologies 2026, 14(4), 235; https://doi.org/10.3390/technologies14040235 - 16 Apr 2026
Viewed by 500
Abstract
In this study, a hybrid multi-layer scheme for reversible color image encryption is proposed, ensuring lossless reconstruction and strong cryptographic security concurrently. This method consists of three main stages. First, session-specific keys are generated using HKDF-SHA256 along with a timestamp-based mechanism to prevent replay attacks and support dynamic key management. Second, a four-layer confusion–diffusion structure is applied. It uses Gram–Schmidt orthogonal matrices, integer-based PWLCM chaotic mapping, the Hill cipher, and dynamically created S-Boxes. These operations rely on integer arithmetic modulo 256 and Q16.16 fixed-point precision. Finally, ChaCha20 stream encryption with HMAC-SHA256 authentication is used to secure data transmission in distributed environments. Experimental tests conducted on standard images show strong cryptographic performance, including near-ideal entropy (7.9993 bits), a significant avalanche effect (NPCR 99.6%, UACI 33.4%), and very low pixel correlation. The method achieves perfect lossless reconstruction and provides an effective key space of $2^{128}$. These results confirm the suitability of the proposed scheme for secure image protection in applications requiring bit-exact recovery, such as medical imaging, digital forensics, and satellite communications.
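
As a rough illustration of the Q16.16 and modulo-256 arithmetic this abstract mentions, the Python sketch below handles such values with integers only. The helper names (`to_q16`, `q16_mul`, `q16_to_byte`) are hypothetical and this is not the authors' implementation; it only shows why integer-only operations can give bit-exact results regardless of platform.

```python
FRAC_BITS = 16            # Q16.16: 16 integer bits, 16 fractional bits
ONE = 1 << FRAC_BITS      # the value 1.0 in Q16.16


def to_q16(x: float) -> int:
    """Convert a float to Q16.16 by scaling and rounding to nearest."""
    return int(round(x * ONE))


def q16_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values.

    The raw product carries 32 fractional bits, so it is shifted
    right by 16 to return to Q16.16.
    """
    return (a * b) >> FRAC_BITS


def q16_to_byte(a: int) -> int:
    """Take the integer part and reduce it modulo 256 (one pixel channel)."""
    return (a >> FRAC_BITS) % 256


if __name__ == "__main__":
    product = q16_mul(to_q16(1.5), to_q16(2.0))
    print(product == to_q16(3.0))
    print(q16_to_byte(to_q16(300.25)))
```

Because every step is an integer shift, multiply, or modulo, encryption and decryption can reproduce the same intermediate values exactly, which is what a lossless scheme needs.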

28 pages, 677 KB  
Article
Mathematical Investigation of Cancer-Immune-Angiogenesis Model Using Fuzzy Piecewise Fractional Derivatives
by Rabeb Sidaoui, Ashraf A. Qurtam, Mohammed Almalahi, Habeeb Ibrahim, Khaled Aldwoah, Amer Alsulami and Mohammed Messaoudi
Fractal Fract. 2026, 10(4), 260; https://doi.org/10.3390/fractalfract10040260 - 15 Apr 2026
Viewed by 392
Abstract
This work develops a fuzzy piecewise fractional derivative (FPFD) model for cancer-immune-angiogenesis dynamics under uncertainty. Five fuzzy state variables track tumor cells, immune effectors, vessel density, oxygen, and drug concentration. Parametric uncertainty is modeled with triangular fuzzy numbers and α-cut interval arithmetic under constrained fuzzy arithmetic. Oxygen-dependent carrying capacity follows a Hill-type function; hypoxia-induced angiogenesis follows a decreasing Michaelis–Menten function. The model transitions at $t_1=50$ days from a memoryless fuzzy classical derivative to a fuzzy ABC fractional derivative of order ψ. The transition time $t_1=50$ days is biologically justified based on experimental observations of the angiogenic switch in solid tumors, which typically occurs within 4–8 weeks post-inoculation. Positivity, boundedness, Lipschitz continuity, existence, and uniqueness of fuzzy solutions are proved via the Banach fixed-point theorem in a weighted norm. A basic reproduction number interval $R_0=[\underline{R}_0,\overline{R}_0]$ is derived; local and global stability conditions are established for disease-free and endemic equilibria using fuzzy differential inclusions. Global sensitivity analysis using Latin hypercube sampling with $N=500$ samples explores the range of possible outcomes across the fuzzy parameter support. In the numerical implementation, we use a fourth-order fuzzy Runge–Kutta method (Phase I) and a fractional Adams–Bashforth–Moulton predictor-corrector method (Phase II), ensuring preservation of fuzzy number characteristics.

31 pages, 4949 KB  
Article
Attention Distribution-Aware Softmax for NPU-Accelerated On-Device Inference of LLMs: An Edge-Oriented Approximation Design
by Sanoop Sadheerthan, Min-Jie Hsu, Chih-Hsiang Huang and Yin-Tien Wang
Electronics 2026, 15(6), 1312; https://doi.org/10.3390/electronics15061312 - 20 Mar 2026
Viewed by 962
Abstract
Low-power NPUs enable on-device LLM inference through efficient integer and fixed-point algebra, yet their lack of native exponential support makes Transformer softmax a critical performance bottleneck. Existing NPU kernels approximate $e^x$ using uniform piecewise polynomials to enable O(1) SIMD indexing, but this wastes computation by applying high-degree arithmetic indiscriminately in every segment. Conversely, fully adaptive approaches maximize statistical fidelity but introduce pipeline stalls due to comparator-based boundary search. To bridge this gap, we propose an attention distribution-aware softmax that uses Particle Swarm Optimization (PSO) to define non-uniform segments and variable polynomial degrees, prioritizing finer granularity and lower arithmetic complexity in attention-dense regions. To ensure efficiency, we snap boundaries into a 128-bin LUT, enabling O(1) retrieval of segment parameters without branching. Inference measurements show that this favors low-degree execution, minimizing exp-kernel overhead. Using TinyLlama-1.1B-Chat as a testbed, the proposed weighted design reduces exp-kernel cycles per call (CPC) by 18.5% versus an equidistant uniform Degree-4 baseline and 13.1% versus uniform Degree-3, while preserving ranking fidelity. These results show that grid-snapped, variable-degree approximation can improve softmax efficiency while largely preserving attention ranking fidelity, enabling accurate edge LLM inference.
(This article belongs to the Special Issue Emerging Applications of FPGAs and Reconfigurable Computing System)
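
The grid-snapped lookup described in this abstract can be sketched as follows. This is a minimal floating-point illustration under stated assumptions, not the paper's kernel: the input range `[-8, 0]`, the segment table `SEGMENTS`, and the least-squares fit via `numpy.polyfit` are all hypothetical choices (the paper selects segments with PSO and targets fixed-point NPU arithmetic).

```python
import math
import numpy as np

X_MIN, X_MAX, N_BINS = -8.0, 0.0, 128      # assumed input range and grid
BIN_W = (X_MAX - X_MIN) / N_BINS

# Hypothetical segments snapped to bin boundaries:
# (first_bin, last_bin_exclusive, polynomial_degree)
SEGMENTS = [(0, 32, 1), (32, 64, 1), (64, 96, 2), (96, 128, 3)]


def build_lut():
    """Return (bin -> segment id table, per-segment coefficient vectors)."""
    lut = np.zeros(N_BINS, dtype=np.int64)
    coeffs = []
    for seg_id, (b0, b1, deg) in enumerate(SEGMENTS):
        xs = np.linspace(X_MIN + b0 * BIN_W, X_MIN + b1 * BIN_W, 64)
        coeffs.append(np.polyfit(xs, np.exp(xs), deg))  # highest power first
        lut[b0:b1] = seg_id
    return lut, coeffs


LUT, COEFFS = build_lut()


def approx_exp(x: float) -> float:
    """Approximate exp(x) for x <= 0 with one table lookup and a Horner loop."""
    x = min(max(x, X_MIN), X_MAX)                       # clamp to the table range
    b = min(int((x - X_MIN) / BIN_W), N_BINS - 1)       # O(1) bin index
    y = 0.0
    for c in COEFFS[LUT[b]]:                            # Horner evaluation
        y = y * x + c
    return y


if __name__ == "__main__":
    worst = max(abs(approx_exp(x) - math.exp(x)) for x in np.linspace(-8, 0, 200))
    print("max abs error:", worst)
```

The bin index replaces a comparator-based boundary search: the segment (and therefore the polynomial degree) is found by one division and one table read, regardless of how many segments exist.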

41 pages, 1834 KB  
Article
Excursion Laplace Exponents Under Height Truncation
by Tristan Guillaume
Mathematics 2026, 14(6), 1014; https://doi.org/10.3390/math14061014 - 17 Mar 2026
Viewed by 310
Abstract
We study one-dimensional diffusions reflected at a boundary and analyze their pathwise “episodes” away from the boundary through Itô’s excursion theory. Under a fixed height cap of $a>0$, each excursion is equipped with three natural marks: its lifetime $\zeta$, its maximum $M$, and an additive (area-type) functional $A_f=\int_0^{\zeta} f(e_t)\,dt$. Our main object is the height-truncated Itô-excursion Laplace exponent $\Psi^{f}_{\alpha,\lambda;a} := n\big(1-e^{-\alpha\zeta-\lambda A_f};\, M<a\big)$, which jointly characterizes episode duration and cumulative load while excluding barrier-crossing spikes. We establish a general boundary–flux representation: $\Psi^{f}_{\alpha,\lambda;a}$ is obtained as a boundary flux (in scale) of the unique solution to a one-dimensional killed Feynman–Kac boundary-value problem on $(0, a)$. This transfer principle yields a unified and tractable route to explicit computation. We implement it in three solvable families (reflected arithmetic Brownian motion, reflected Ornstein–Uhlenbeck diffusions, and squared Bessel/Bessel-type diffusions), obtaining closed forms in terms of Airy, parabolic-cylinder, and confluent hypergeometric/Whittaker functions. Using the Poisson point process structure of excursions indexed by local time, we derive explicit extreme-burst laws (maxima and order statistics) for the additive marks up to a local-time horizon, and connect tail intensities to Laplace exponents via numerical Laplace inversion. Finally, we identify the strictly truncated cumulative load in local time as a (typically infinite-activity) subordinator whose Lévy measure coincides with the excursion-mark intensity, linking cumulative-load and extreme-burst statistics through the same exponent.

18 pages, 4228 KB  
Article
Design Space Exploration on Blind Equalization Algorithms: Numerical Representation Analysis for SoC-FPGA
by David Marquez-Viloria, L. J. Morantes-Guzman, Neil Guerrero-Gonzalez and Marin B. Marinov
Appl. Sci. 2026, 16(6), 2777; https://doi.org/10.3390/app16062777 - 13 Mar 2026
Viewed by 390
Abstract
Field-Programmable Gate Arrays (FPGAs) have become an important platform for accelerating real-time communication systems, and System-on-Chip (SoC) devices provide the flexibility to design and optimize architectures that support high data rates, different modulation formats, and channel equalization schemes. Selecting the appropriate architecture can be guided through Design Space Exploration (DSE) using high-level synthesis tools, which enables the identification of numerical representations that balance performance with reduced hardware resource consumption. Despite their relevance, recent developments in communication systems often overlook the impact of numerical precision in Digital Signal Processing algorithms, particularly the trade-offs between floating- and fixed-point arithmetic when targeting hardware implementations. In this work, two widely used blind equalization algorithms, the Constant Modulus Algorithm (CMA) and the Multi-Modulus Algorithm (MMA), were implemented on a low-cost Ultra96 SoC-FPGA to analyze the effect of a fixed-point representation. A multi-objective Design Space Exploration methodology was applied to minimize hardware utilization while maintaining reliable transmission performance. Resource consumption, latency, and throughput were measured across different binary formats using the Minimum Mean Square Error (MMSE) criterion. Parallelization techniques were incorporated to improve throughput. The DSE generated comprehensive performance surfaces quantifying latency, MMSE convergence, and FPGA resource utilization (DSP48E/FF/LUT/BRAM) across fixed-point formats, achieving optimal 4 MS/s throughput configurations. Although this throughput is naturally lower than the Gigabit speeds required in backbone optical networks, the results demonstrate the effectiveness of numerical representation optimization in resource-constrained SoC-FPGA devices, offering a practical approach for real-time Edge and IoT implementations where cost and hardware limitations are critical.

29 pages, 1565 KB  
Article
Integer Intelligence: A Reproducible Path from Training to FPGA
by Manjusha Shanker and Tee Hui Teo
Electronics 2026, 15(5), 1117; https://doi.org/10.3390/electronics15051117 - 8 Mar 2026
Viewed by 557
Abstract
A transparent, end-to-end pathway from learning-level training to deployable fixed-point hardware is presented and framed as gradients to gates. A didactic XOR convolutional network is first employed so that backpropagation, post-training quantization in INT8, and fixed-point arithmetic can be made concrete and verified with exact checks. The same methodology was applied to a compact LeNet-5 case study. On the software side, the training-to-export flow was formalized, and a bit-accurate Python reference was constructed for the quantized network. On the hardware side, a synthesizable INT8 datapath was implemented in Verilog, including multiply–accumulate units, sigmoid activation stages, and per-layer requantization with rounding and saturation. Test benches are provided so that the exported weights and activations can be ingested, and layer-wise matches can be reported. A co-simulation harness was used to coordinate framework inference, quantization, file conversion, HDL simulation, and regression checks, which enabled deterministic comparisons of the activations, partial sums and outputs. The complete loop was mapped to Artix-7 on the CMOD A7 development board, and the resource usage, maximum clock frequency, inference latency, and throughput were determined. The approach aligns with an educational HDL-to-Caffe pipeline by using reusable parameterized Verilog primitives for convolution, pooling, activation, and fully connected layers, training in Colab with AccDNN, Caffe, quantization, and an automated bit-for-bit verification regime before FPGA synthesis. Methodological contributions are provided, including a minimal and auditable XOR CNN that exposes scales, shifts, and saturation; a practical quantization recipe with INT32 accumulation and unit tests that guarantee agreement within one least significant bit between RTL and the INT8 reference; and a scalable mapping to LeNet-5 using a row-stationary and line-buffered dataflow on an Artix-7 FPGA. Empirical evidence shows feasibility at 100 MHz with representative utilization, millisecond-scale latency and zero mismatches across large test sets, which validates the quantization configuration and the verification strategy.
(This article belongs to the Special Issue Recent Advances in AI Hardware Design)
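
The INT32 accumulation followed by requantization with rounding and saturation, as named in this abstract, can be sketched in a few lines of Python. The representation of the scale as an integer `multiplier` plus a right `shift` is an assumption (a common convention), and the function names are hypothetical; the paper's Verilog datapath and Python reference may differ in rounding mode and parameterization.

```python
INT8_MIN, INT8_MAX = -128, 127


def int8_dot(x, w):
    """Multiply-accumulate INT8 inputs and weights into a wide accumulator."""
    acc = 0
    for xi, wi in zip(x, w):
        acc += xi * wi          # each product fits in 16 bits; the sum is kept wide
    return acc


def requantize(acc: int, multiplier: int, shift: int) -> int:
    """Scale an accumulator back to INT8.

    Steps: integer multiply, add half an LSB, arithmetic right shift
    (round half up), then saturate to the INT8 range. Requires shift >= 1.
    """
    prod = acc * multiplier
    rounded = (prod + (1 << (shift - 1))) >> shift
    return max(INT8_MIN, min(INT8_MAX, rounded))


if __name__ == "__main__":
    acc = int8_dot([127, -128], [2, 3])
    print(acc, requantize(acc, 3, 2))
    print(requantize(100000, 1, 4), requantize(-100000, 1, 4))
```

A bit-accurate software reference of this kind is what lets RTL outputs be compared value by value, since the rounding and clipping rules are fully specified in integer terms.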

31 pages, 20829 KB  
Article
FPGA Implementation of a Secure Audio Encryption System Based on Chameleon Chaotic Algorithm
by Alaa Shumran, Abdul-Basset A. Al-Hussein and Viet-Thanh Pham
Dynamics 2026, 6(1), 9; https://doi.org/10.3390/dynamics6010009 - 7 Mar 2026
Viewed by 1456
Abstract
The need to safeguard sensitive data in various fields, including education, telephone banking, private voice conferences, and the military, has grown as dependence on technology in daily life has increased. Encryption schemes based on chaotic systems are among the most commonly utilized approaches in the security field due to their high levels of safety and reliability. This study proposes a secure audio encryption framework based on the Chameleon chaotic algorithm implemented on a Xilinx ZedBoard Zynq-7000 FPGA. The system was designed using a fixed-point arithmetic format with 32-bit precision (8 integer bits and 24 fractional bits) with the Xilinx System Generator in MATLAB Simulink R2021b and verified using Vivado. The Chameleon Chaotic System, characterized by its transition from self-excited to hidden attractors through parameter variation, adds complexity to the system dynamics and strengthens the encryption algorithm. The Adaptive Feedback Control technique was applied to synchronize the signals. These methods enhance the security of audio data by ensuring robust and fast synchronization during transmission. The performance of the proposed system was assessed using correlation analysis, the mean squared error, histogram analysis, and audio spectrogram analysis. The system demonstrated strong encryption capabilities with low correlation values (−0.0033). In decryption, it achieved high fidelity with a correlation exceeding 0.999 in noise-free conditions and above 0.9933 under 20 dB AWGN. Adaptive Feedback Control showed superior decryption precision with lower MSEU and higher PSNR, confirming its effectiveness under noisy environments.
(This article belongs to the Special Issue Theory and Applications in Nonlinear Oscillators: 2nd Edition)

39 pages, 84580 KB  
Article
FPGA Implementation and Performance Evaluation of Classic PID, IMC and DTC for BLDC Motor Control
by Jaber Ouakrim, Abdoulaye Bodian, Dina Ouardani and Alben Cardenas
Vehicles 2026, 8(2), 42; https://doi.org/10.3390/vehicles8020042 - 22 Feb 2026
Viewed by 1295
Abstract
Brushless DC (BLDC) motors are widely used in mobile robotics and off-road vehicles due to their high efficiency, reliability, and compactness. However, achieving robust, high-performance speed control in embedded environments remains challenging due to nonlinearities, dead-time effects, parameter uncertainties, and strict real-time constraints. This paper presents a comprehensive experimental study of classical and robust control strategies for BLDC motor speed control, fully implemented on an FPGA platform. Classical PI and PID controllers tuned using Ziegler–Nichols, Cohen–Coon, and Chien–Hrones–Reswick methods are first investigated and discretized using both Zero-Order Hold (ZOH) and Tustin (bilinear) approximations. Model-based approaches, including IMC-based PID controllers, are then introduced to enhance robustness. In addition, a robust two-degree-of-freedom dead-time compensator (DTC) is implemented to explicitly address dead-time uncertainties inherent to inverter-based motor drives. All controllers are implemented using fixed-point arithmetic on a Xilinx Nexys A7 FPGA and validated experimentally on a BLDC motor test bench representative of semi-autonomous robotic applications. Performance is evaluated through time-domain responses and quantitative indices, including ISE, ITAE, I, control effort, and FPGA resource utilization. Experimental tests under controlled DC bus voltage disturbances are conducted to assess disturbance rejection capability and robustness under realistic operating conditions. Experimental results demonstrate that Tustin discretization consistently improves tracking performance, while IMC-PID and DTC strategies provide superior robustness against dead-time and modeling uncertainties, making them particularly suitable for embedded FPGA-based motor control.
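
For readers unfamiliar with the Tustin (bilinear) approximation named in this abstract, the following Python sketch shows a PI controller whose integrator is discretized with the trapezoidal rule, which is what the Tustin substitution yields for a pure integrator. The class name `TustinPI` and the gains used are hypothetical, and floating point is used for readability; the paper's controllers run in fixed point on the FPGA.

```python
class TustinPI:
    """PI controller with a trapezoidal (Tustin) integrator.

    Continuous form: u(t) = kp * e(t) + ki * integral of e.
    Tustin maps 1/s to (ts / 2) * (1 + z^-1) / (1 - z^-1), so the
    integral state advances by ki * ts / 2 * (e[k] + e[k-1]).
    """

    def __init__(self, kp: float, ki: float, ts: float):
        self.kp = kp
        self.ki = ki
        self.ts = ts                 # sample period in seconds
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error: float) -> float:
        """Advance one sample and return the control output."""
        self.integral += self.ki * self.ts / 2.0 * (error + self.prev_error)
        self.prev_error = error
        return self.kp * error + self.integral


if __name__ == "__main__":
    pi = TustinPI(kp=0.0, ki=1.0, ts=0.1)
    print([round(pi.step(1.0), 3) for _ in range(3)])
```

Averaging the current and previous error is the only difference from a simple rectangular integrator, so the extra hardware cost is one adder and one register.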

14 pages, 1097 KB  
Article
Low-Power Embedded Sensor Node for Real-Time Environmental Monitoring with On-Board Machine-Learning Inference
by Manuel J. C. S. Reis
Sensors 2026, 26(2), 703; https://doi.org/10.3390/s26020703 - 21 Jan 2026
Viewed by 1239
Abstract
This paper presents the design and optimisation of a low-power embedded sensor-node architecture for real-time environmental monitoring with on-board machine-learning inference. The proposed system integrates heterogeneous sensing elements for air quality and ambient parameters (temperature, humidity, gas concentration, and particulate matter) into a modular embedded platform based on a low-power microcontroller coupled with an energy-efficient neural inference accelerator. The design emphasises end-to-end energy optimisation through adaptive duty-cycling, hierarchical power domains, and edge-level data reduction. The embedded machine-learning layer performs lightweight event/anomaly detection via on-device multi-class classification (normal/anomalous/critical) using quantised neural models in fixed-point arithmetic. A comprehensive system-level analysis, performed via MATLAB Simulink simulations, evaluates inference accuracy, latency, and energy consumption under realistic environmental conditions. Results indicate that the proposed node achieves 94% inference accuracy, 0.87 ms latency, and an average power consumption of approximately 2.9 mWh, enabling energy-autonomous operation with hybrid solar–battery harvesting. The adaptive LoRaWAN communication strategy further reduces data transmissions by ≈88% relative to periodic reporting. The results indicate that on-device inference can reduce network traffic while maintaining reliable event detection under the evaluated operating conditions. The proposed architecture is intended to support energy-efficient environmental sensing deployments in smart-city and climate-monitoring contexts.
(This article belongs to the Special Issue Applications of Sensors Based on Embedded Systems)

23 pages, 13345 KB  
Article
Neural-Based Controller on Low-Density FPGAs for Dynamic Systems
by Edson E. Cruz-Miguel, José R. García-Martínez, Jorge Orrante-Sakanassi, José M. Álvarez-Alvarado, Omar A. Barra-Vázquez and Juvenal Rodríguez-Reséndiz
Electronics 2026, 15(1), 198; https://doi.org/10.3390/electronics15010198 - 1 Jan 2026
Cited by 1 | Viewed by 620
Abstract
This work introduces a logic resource-efficient Artificial Neural Network (ANN) controller for embedded control applications on low-density Field-Programmable Gate Array (FPGA) platforms. The proposed design relies on 32-bit fixed-point arithmetic and incorporates an online learning mechanism, enabling the controller to adapt to system variations while maintaining low hardware complexity. Unlike conventional artificial intelligence solutions that require high-performance processors or Graphics Processing Units (GPUs), the proposed approach targets platforms with limited logic, memory, and computational resources. The ANN controller was described using a Hardware Description Language (HDL) and validated via cosimulation between ModelSim and Simulink. A practical comparison was also made between Proportional-Integral-Derivative (PID) control and an ANN for motor position control. The results confirm that the architecture efficiently utilizes FPGA resources, consuming approximately 50% of the available Digital Signal Processor (DSP) units, less than 40% of logic cells, and only 6% of embedded memory blocks. Owing to its modular design, the architecture is inherently scalable, allowing additional inputs or hidden-layer neurons to be incorporated with minimal impact on overall resource usage. Additionally, the computational latency can be precisely determined and scales with (16n+39)m+31 clock cycles, enabling precise timing analysis and facilitating integration into real-time embedded control systems.
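
The latency expression quoted in this abstract is easy to evaluate for a given configuration. The sketch below does so in Python; note that the abstract does not define `n` and `m` (they plausibly denote the number of inputs and hidden-layer neurons, but that is an assumption), and the clock frequency in the usage example is an arbitrary illustrative value rather than a figure from the paper.

```python
def ann_latency_cycles(n: int, m: int) -> int:
    """Clock cycles per controller update: (16n + 39)m + 31, as quoted."""
    return (16 * n + 39) * m + 31


def ann_latency_us(n: int, m: int, f_clk_hz: float) -> float:
    """Convert the cycle count to microseconds for a given clock frequency."""
    return ann_latency_cycles(n, m) / f_clk_hz * 1e6


if __name__ == "__main__":
    # Hypothetical configuration: n = 2, m = 4, 50 MHz clock.
    print(ann_latency_cycles(2, 4))
    print(ann_latency_us(2, 4, 50e6))
```

Because the count is a closed-form function of the network size, the worst-case update time is known before synthesis, which is what makes the design straightforward to schedule inside a fixed control period.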

17 pages, 558 KB  
Article
FPGA-Accelerated Multi-Resolution Spline Reconstruction for Real-Time Multimedia Signal Processing
by Manuel J. C. S. Reis
Electronics 2026, 15(1), 173; https://doi.org/10.3390/electronics15010173 - 30 Dec 2025
Viewed by 1020
Abstract
This paper presents an FPGA-based architecture for real-time spline-based signal reconstruction, targeted at multimedia signal processing applications. Leveraging the multi-resolution properties of B-splines, the proposed design enables efficient upsampling, denoising, and feature preservation for image and video signals. Implemented on a mid-range FPGA, the system supports parallel processing of multiple channels, with low-latency memory access and pipelined arithmetic units. The proposed pipeline achieves a throughput of up to 33.1 megasamples per second for 1D signals and 19.4 megapixels per second for 2D images, while maintaining average power consumption below 250 mW. Compared to CPU and embedded GPU implementations, the design delivers >15× improvement in energy efficiency and deterministic low-latency performance (8–12 clock cycles). A key novelty lies in combining multi-resolution B-spline reconstruction with fixed-point arithmetic and streaming-friendly pipelining, making the architecture modular, compact, and robust to varying input rates. Benchmarking results on synthetic and real multimedia datasets show significant improvements in throughput and energy efficiency compared to conventional CPU and GPU implementations. The architecture supports flexible resolution scaling, making it suitable for edge-computing scenarios in multimedia environments.
(This article belongs to the Special Issue Digital Signal and Image Processing for Multimedia Technology)

24 pages, 902 KB  
Article
Differentiable Selection of Bit-Width and Numeric Format for FPGA-Efficient Deep Networks
by Kawthar Dellel, Emanuel Trabes, Aymen Zayed, Hassene Faiedh and Carlos Valderrama
Electronics 2025, 14(18), 3715; https://doi.org/10.3390/electronics14183715 - 19 Sep 2025
Viewed by 1579
Abstract
Quantization-aware training (QAT) has emerged as a key strategy for enabling efficient deep learning inference on resource-constrained platforms. Yet, most existing approaches rely on static, manually selected numeric formats (fixed-point or floating-point) and fixed bit-widths, limiting their adaptability and often requiring extensive design effort or architecture search. In this work, we introduce a novel QAT framework that breaks this rigidity by jointly learning, during training, both the numeric representation format and the associated bit-widths in an end-to-end differentiable manner. At the core of our method lies a unified parameterization that is capable of emulating both fixed- and floating-point arithmetic, paired with a bit-aware loss function that penalizes excessive precision in a hardware-aligned fashion. We demonstrate that our approach achieves state-of-the-art trade-offs between accuracy and compression on MNIST, CIFAR-10, and CIFAR-100, reducing average bit-widths to as low as 1.4 with minimal accuracy loss. Furthermore, FPGA implementation using Xilinx FINN confirms over 5× LUT and 4× BRAM savings. This is the first QAT method to unify numeric format learning with differentiable precision control, enabling highly deployable, precision-adaptive deep neural networks.
(This article belongs to the Special Issue Intelligent Embedded Systems: Latest Advances and Applications)

22 pages, 445 KB  
Article
Design of Real-Time Gesture Recognition with Convolutional Neural Networks on a Low-End FPGA
by Rui Policarpo Duarte, Tiago Gonçalves, Gustavo Jacinto, Paulo Flores and Mário Véstias
Electronics 2025, 14(17), 3457; https://doi.org/10.3390/electronics14173457 - 29 Aug 2025
Cited by 1 | Viewed by 1357
Abstract
Hand gesture recognition is used in human–computer interaction, with multiple applications in assistive technologies, virtual reality, and smart systems. While vision-based methods are commonly employed, they are often computationally intensive, sensitive to environmental conditions, and raise privacy concerns. This work proposes a hardware/software co-optimized system for real-time hand gesture recognition using accelerometer data, designed for a portable, low-cost platform. A Convolutional Neural Network from TinyML is implemented on a Xilinx Zynq-7000 SoC-FPGA, utilizing fixed-point arithmetic to minimize computational complexity while maintaining classification accuracy. Additionally, combined architectural optimizations, including pipelining and loop unrolling, are applied to enhance processing efficiency. The final system achieves a 62× speedup over an unoptimized floating-point implementation while reducing power consumption, making it suitable for embedded and battery-powered applications.