Search Results (8)

Search Parameters:
Keywords = Xilinx Vitis

24 pages, 2830 KB  
Article
Real-Time Radar-Based Hand Motion Recognition on FPGA Using a Hybrid Deep Learning Model
by Taher S. Ahmed, Ahmed F. Mahmoud, Magdy Elbahnasawy, Peter F. Driessen and Ahmed Youssef
Sensors 2026, 26(1), 172; https://doi.org/10.3390/s26010172 - 26 Dec 2025
Viewed by 477
Abstract
Radar-based hand motion recognition (HMR) presents several challenges, including sensor interference, clutter, and the limitations of small datasets, which collectively hinder the performance and real-time deployment of deep learning (DL) models. To address these issues, this paper introduces a novel real-time HMR framework that integrates advanced signal pre-processing, a hybrid convolutional neural network–support vector machine (CNN–SVM) architecture, and efficient hardware deployment. The pre-processing pipeline applies filtration, squared absolute value computation, and normalization to enhance radar data quality. To improve the robustness of DL models against noise and clutter, time-series radar signals are transformed into binarized images, providing a compact and discriminative representation for learning. A hybrid CNN–SVM model is then utilized for hand motion classification. The proposed model achieves a high classification accuracy of 98.91%, validating the quality of the extracted features and the efficiency of the proposed design. Additionally, it reduces the number of model parameters by approximately 66% relative to the most accurate recurrent baseline (CNN–GRU–SVM) and by up to 86% relative to CNN–BiLSTM–SVM, while achieving the highest SVM test accuracy of 92.79% across all CNN–RNN variants that use the same binarized radar images. For deployment, the model is quantized and implemented on two System-on-Chip (SoC) FPGA platforms—the Xilinx Zynq ZCU102 Evaluation Kit and the Xilinx Kria KR260 Robotics Starter Kit—using the Vitis AI toolchain. The system achieves end-to-end accuracies of 96.13% (ZCU102) and 95.42% (KR260). On the ZCU102, the system achieved a 70% reduction in execution time and a 74% improvement in throughput compared to the PC-based implementation. On the KR260, it achieved a 52% reduction in execution time and a 10% improvement in throughput relative to the same PC baseline. Both implementations exhibited minimal accuracy degradation relative to a PC-based setup—approximately 1% on ZCU102 and 2% on KR260. These results confirm the framework's suitability for real-time, accurate, and resource-efficient radar-based hand motion recognition across diverse embedded environments.
(This article belongs to the Special Issue Sensor Systems for Gesture Recognition (3rd Edition))
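
The pre-processing chain described above (filtration, squared-absolute-value computation, min-max normalization, binarization) is straightforward to prototype in software before targeting the DPU. Below is a minimal NumPy/SciPy sketch of that chain; the filter choice and every numeric setting are illustrative assumptions, since the abstract does not specify them.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_radar_frame(sig, fs=2000.0, cutoff=200.0,
                           img_shape=(64, 64), threshold=0.5):
    """Filter -> |x|^2 -> normalize -> binarized image (all parameters assumed)."""
    # 1. Filtration: low-pass Butterworth to suppress clutter and noise.
    b, a = butter(4, cutoff / (fs / 2), btype="low")
    filtered = filtfilt(b, a, sig)

    # 2. Squared absolute value: emphasizes signal energy.
    energy = np.abs(filtered) ** 2

    # 3. Min-max normalization to [0, 1].
    energy = (energy - energy.min()) / (energy.max() - energy.min() + 1e-12)

    # 4. Fold the time series into a 2-D image and binarize it.
    h, w = img_shape
    img = energy[: h * w].reshape(h, w)  # assumes len(sig) >= h * w
    return (img > threshold).astype(np.uint8)
```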

27 pages, 2477 KB  
Article
BPAP: FPGA Design of a RISC-like Processor for Elliptic Curve Cryptography Using Task-Level Parallel Programming in High-Level Synthesis
by Rares Ifrim and Decebal Popescu
Cryptography 2025, 9(1), 20; https://doi.org/10.3390/cryptography9010020 - 19 Mar 2025
Cited by 1 | Viewed by 1819
Abstract
Popular technologies such as blockchain and zero-knowledge proofs, which have already entered the enterprise space, rely heavily on cryptography at the core of their protocol stacks. One of the most widely used systems in this regard is Elliptic Curve Cryptography, specifically the point multiplication operation, which underpins the security of all applications that use this system. As this operation is computationally intensive, one solution is to offload it to specialized accelerators to provide better throughput and increased efficiency. In this paper, we explore the use of Field Programmable Gate Arrays (FPGAs) and the High-Level Synthesis framework of AMD Vitis in designing an elliptic curve point arithmetic unit (point adder) for the secp256k1 curve. We show how task-level parallel programming and data streaming are used in designing a RISC processor-like architecture to provide pipeline parallelism and increase the throughput of the point adder unit. We also show how to use the proposed processor architecture efficiently by designing a point multiplication scheduler capable of scheduling multiple batches of elliptic curve points to keep the point adder unit fully utilized. Finally, we evaluate our design on an AMD-Xilinx Alveo-family FPGA and show that our point arithmetic processor achieves better throughput and frequency than related work.
(This article belongs to the Special Issue Interdisciplinary Cryptography)
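
A functional software model of the operation such a point adder accelerates is useful for verification. The sketch below implements affine point addition and double-and-add point multiplication on secp256k1 in plain Python; a hardware design like the one described would typically use projective coordinates and pipelined field arithmetic, so treat this purely as a reference model.

```python
# secp256k1: y^2 = x^3 + 7 over GF(P); the point at infinity is None.
P = 2**256 - 2**32 - 977

def point_add(p1, p2):
    """Affine point addition/doubling on secp256k1 (reference model)."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # Q + (-Q) = infinity
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P  # tangent slope (a = 0)
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P   # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def point_mul(k, point):
    """Double-and-add scalar multiplication k * point (the offloaded operation)."""
    acc = None
    while k:
        if k & 1:
            acc = point_add(acc, point)
        point = point_add(point, point)
        k >>= 1
    return acc
```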

15 pages, 626 KB  
Article
Fast Resource Estimation of FPGA-Based MLP Accelerators for TinyML Applications
by Argyris Kokkinis and Kostas Siozios
Electronics 2025, 14(2), 247; https://doi.org/10.3390/electronics14020247 - 9 Jan 2025
Cited by 3 | Viewed by 3189
Abstract
Tiny machine learning (TinyML) demands the development of edge solutions that are both low-latency and power-efficient. To achieve these on System-on-Chip (SoC) FPGAs, co-design methodologies such as hls4ml have emerged, aiming to speed up the design process. In this context, fast estimation of an FPGA design's resource utilization is needed to rapidly assess its feasibility. In this paper, we propose a resource estimator for fully customized (bespoke) multilayer perceptrons (MLPs) designed through the hls4ml workflow. Through the analysis of bespoke MLPs synthesized using Xilinx High-Level Synthesis (HLS) tools, we developed resource estimation models for the dense layers' arithmetic modules and registers. These models account for the characteristics unique to the bespoke nature of the MLPs. Our estimator was evaluated on six different architectures for synthetic and real benchmarks, which were designed using Xilinx Vitis HLS 2022.1 targeting ZYNQ-7000 FPGAs. Our experimental analysis demonstrates that our estimator can accurately predict the required resources in terms of utilized Look-Up Tables (LUTs), Flip-Flops (FFs), and Digital Signal Processing (DSP) units in less than 147 ms of single-threaded execution.
(This article belongs to the Special Issue Advancements in Hardware-Efficient Machine Learning)
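
The abstract does not reproduce the fitted models, but the general shape of a per-layer estimator of this kind can be sketched as follows. Both the linear functional form and all coefficients below are hypothetical placeholders of the sort one would obtain by regression against Vitis HLS synthesis reports.

```python
def estimate_dense_layer(n_in, n_out, weight_bits, coeffs):
    """Toy per-layer resource model for a bespoke dense layer.

    coeffs maps a resource name to (c0, c1, c2); the values and the
    linear form are illustrative, not the paper's fitted models.
    """
    macs = n_in * n_out  # one multiply-accumulate per weight
    return {res: int(c0 + c1 * macs * weight_bits + c2 * n_out * weight_bits)
            for res, (c0, c1, c2) in coeffs.items()}

# Hypothetical coefficients, one triple per resource type (LUT, FF, DSP).
coeffs = {"LUT": (120.0, 0.9, 2.0), "FF": (60.0, 0.4, 1.5), "DSP": (0.0, 0.0, 0.0)}
print(estimate_dense_layer(n_in=16, n_out=32, weight_bits=8, coeffs=coeffs))
```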

13 pages, 1853 KB  
Article
Optimizing Deep Learning Acceleration on FPGA for Real-Time and Resource-Efficient Image Classification
by Ahmad Mouri Zadeh Khaki and Ahyoung Choi
Appl. Sci. 2025, 15(1), 422; https://doi.org/10.3390/app15010422 - 5 Jan 2025
Cited by 14 | Viewed by 7738
Abstract
Deep learning (DL) has revolutionized image classification, yet deploying convolutional neural networks (CNNs) on edge devices for real-time applications remains a significant challenge due to constraints in computation, memory, and power efficiency. This work presents an optimized implementation of VGG16 and VGG19, two widely used CNN architectures, for classifying the CIFAR-10 dataset using transfer learning on field-programmable gate arrays (FPGAs). Utilizing the Xilinx Vitis-AI and TensorFlow2 frameworks, we adapt VGG16 and VGG19 for FPGA deployment through quantization, compression, and hardware-specific optimizations. Our implementation achieves high classification accuracy, with Top-1 accuracy of 89.54% and 87.47% for VGG16 and VGG19, respectively, while delivering significant reductions in inference latency (7.29× and 6.6× compared to CPU-based alternatives). These results highlight the suitability of our approach for resource-efficient, real-time edge applications. Key contributions include a detailed methodology for combining transfer learning with FPGA acceleration, an analysis of hardware resource utilization, and performance benchmarks. This work underscores the potential of FPGA-based solutions to enable scalable, low-latency DL deployments in domains such as autonomous systems, IoT, and mobile devices.
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
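
The Vitis-AI TensorFlow2 flow used here centres on post-training quantization with a small calibration set. The sketch below follows the Vitis AI TF2 quantizer API; the model file, calibration-set size, and output path are placeholders, and argument details can vary between Vitis AI releases.

```python
import tensorflow as tf
from tensorflow_model_optimization.quantization.keras import vitis_quantize

# Load a transfer-learned VGG16 (file name is a placeholder).
float_model = tf.keras.models.load_model("vgg16_cifar10.h5")

# Calibration images must match the model's input shape and preprocessing.
(x_train, _), _ = tf.keras.datasets.cifar10.load_data()
calib_ds = x_train[:1000].astype("float32") / 255.0

# Post-training quantization to INT8 for the DPU.
quantizer = vitis_quantize.VitisQuantizer(float_model)
quantized_model = quantizer.quantize_model(calib_dataset=calib_ds)
quantized_model.save("vgg16_cifar10_int8.h5")
# The saved model is then compiled for the target DPU with vai_c_tensorflow2.
```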

18 pages, 7139 KB  
Article
An FPGA-Based YOLOv5 Accelerator for Real-Time Industrial Vision Applications
by Zhihong Yan, Bingqian Zhang and Dong Wang
Micromachines 2024, 15(9), 1164; https://doi.org/10.3390/mi15091164 - 19 Sep 2024
Cited by 22 | Viewed by 7594
Abstract
The You Only Look Once (YOLO) object detection network has garnered widespread adoption in various industries, owing to its superior inference speed and robust detection capabilities. This model has proven invaluable in automating production processes such as material processing, machining, and quality inspection. However, as market competition intensifies, there is a constant demand for higher detection speed and accuracy. Current FPGA accelerators based on 8-bit quantization have struggled to meet these increasingly stringent performance requirements. In response, we present a novel 4-bit quantization-based neural network accelerator for the YOLOv5 model, designed to enhance real-time processing capabilities while maintaining high detection accuracy. To achieve effective model compression, we introduce an optimized quantization scheme that reduces the bit-width of the entire YOLO network—including the first layer—to 4 bits, with only a 1.5% degradation in mean Average Precision (mAP). For the hardware implementation, we propose a unified Digital Signal Processor (DSP) packing scheme, coupled with a novel parity adder tree architecture that accommodates the proposed quantization strategies. This approach efficiently reduces on-chip DSP utilization by 50%, offering a significant improvement in performance and resource efficiency. Experimental results show that the industrial object detection system based on the proposed FPGA accelerator achieves a throughput of 808.6 GOPS and an efficiency of 0.49 GOPS/DSP for YOLOv5s on the ZCU102 board, which is 29% higher than a commercial FPGA accelerator design (Xilinx's Vitis AI).
(This article belongs to the Special Issue FPGA Applications and Future Trends)
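
For reference, the sketch below shows a plain symmetric per-tensor 4-bit quantizer in NumPy. The paper's optimized scheme, which covers the first layer while holding the mAP loss to 1.5%, is necessarily more refined (e.g. per-channel scales or quantization-aware training); this is only a baseline illustration.

```python
import numpy as np

def quantize_4bit(w):
    """Symmetric per-tensor 4-bit quantization: integers in [-8, 7]."""
    scale = np.abs(w).max() / 7.0          # map the largest magnitude to 7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(3, 3, 16, 16).astype(np.float32)  # a conv kernel
q, s = quantize_4bit(w)
print(f"mean abs error: {np.abs(dequantize(q, s) - w).mean():.4f}")
```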

22 pages, 16272 KB  
Article
Edge Real-Time Object Detection and DPU-Based Hardware Implementation for Optical Remote Sensing Images
by Chao Li, Rui Xu, Yong Lv, Yonghui Zhao and Weipeng Jing
Remote Sens. 2023, 15(16), 3975; https://doi.org/10.3390/rs15163975 - 10 Aug 2023
Cited by 16 | Viewed by 4934
Abstract
The accuracy of current deep learning algorithms has certainly increased. However, deploying deep learning networks on edge devices with limited resources is challenging due to their inherent depth and high parameter count. Here, we propose an improved YOLO model based on an attention mechanism and receptive field enhancement (RFA-YOLO), applying the MobileNeXt network as the backbone to reduce parameters and complexity and adopting the Receptive Field Block (RFB) and Efficient Channel Attention (ECA) modules to improve the detection accuracy of multi-scale and small objects. Meanwhile, an FPGA-based model deployment solution is proposed to implement parallel acceleration and low-power deployment of the detection model, achieving real-time object detection for optical remote sensing images. We implement the proposed object detection algorithm with a DPU- and Vitis AI-based FPGA deployment to meet low-power and real-time performance requirements. Experimental results on the DIOR dataset demonstrate the effectiveness and superiority of our RFA-YOLO model. Moreover, to evaluate the performance of the proposed hardware implementation, it was implemented on a Xilinx ZCU104 board. Results of the hardware and software experiments show that our DPU-based implementation is more power-efficient than central processing units (CPUs) and graphics processing units (GPUs), and has the potential to be applied in onboard processing systems with limited resources and power budgets.
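
On the deployment side, running a compiled model on the DPU from Python goes through the Vitis AI Runtime (VART). The sketch below follows the standard VART example flow; the .xmodel name is a placeholder and the buffer dtypes depend on how the model was quantized and compiled.

```python
import numpy as np
import vart
import xir

# Load the compiled model and locate its DPU subgraph (file name assumed).
graph = xir.Graph.deserialize("rfa_yolo.xmodel")
subgraphs = graph.get_root_subgraph().toposort_child_subgraph()
dpu_sg = [s for s in subgraphs
          if s.has_attr("device") and s.get_attr("device").upper() == "DPU"][0]
runner = vart.Runner.create_runner(dpu_sg, "run")

# Allocate host buffers matching the model's tensor shapes.
in_t = runner.get_input_tensors()[0]
out_t = runner.get_output_tensors()[0]
img = np.zeros(tuple(in_t.dims), dtype=np.int8)   # preprocessed, quantized input
out = np.zeros(tuple(out_t.dims), dtype=np.int8)

# One inference on the DPU; `out` then holds raw tensors for NMS/decoding.
job = runner.execute_async([img], [out])
runner.wait(job)
```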

16 pages, 5908 KB  
Article
Memory-Tree Based Design of Optical Character Recognition in FPGA
by Ke Yu, Minguk Kim and Jun Rim Choi
Electronics 2023, 12(3), 754; https://doi.org/10.3390/electronics12030754 - 2 Feb 2023
Cited by 9 | Viewed by 6287
Abstract
As one of the fields of Artificial Intelligence (AI), Optical Character Recognition (OCR) systems have wide application in both industrial production and daily life. Conventional OCR systems are commonly designed around microprocessors, so the processor's performance determines the speed of the computation. However, due to the "memory wall" problem and the Von Neumann bottleneck, the drawbacks of traditional processor-based computing for OCR systems are gradually becoming apparent. In this paper, an approach based on Memory-Centric Computing and a "Memory-Tree" algorithm is proposed to optimize traditional OCR systems in hardware. The proposed algorithm was first implemented in software using C/C++ and OpenCV to verify the feasibility of the idea, and the algorithm was then converted to RTL using the Xilinx Vitis High-Level Synthesis (HLS) tool for the hardware implementation. This work uses the Xilinx Alveo U50 FPGA accelerator, which connects to the x86 CPU of a PC over PCIe to form a heterogeneous computing system. The results of the hardware implementation show that the designed system can recognize English capital letters and digits within 34.24 µs. The FPGA consumes 18.59 W, saving 77.87% of energy consumption compared to the 84 W of the PC's processor.
(This article belongs to the Special Issue FPGAs Based Hardware Design)
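
The "Memory-Tree" algorithm itself is not detailed in the abstract, so the toy below only illustrates the general memory-centric flavor: classifying a binarized glyph by walking a tree of stored lookups rather than by arithmetic-heavy computation. The tree contents and pixel indices are invented for the example.

```python
# Internal node: (pixel_index, left_child_id, right_child_id); leaf: label.
# This tree is a made-up example, not the paper's trained structure.
TREE = {
    0: (10, 1, 2),
    1: (37, 3, 4),
    2: (52, 5, 6),
    3: "A", 4: "B", 5: "1", 6: "7",
}

def classify(glyph_bits, tree=TREE):
    """glyph_bits: flat sequence of 0/1 pixels from a binarized character."""
    node = tree[0]
    while not isinstance(node, str):
        pixel, left, right = node
        node = tree[left if glyph_bits[pixel] == 0 else right]
    return node
```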

35 pages, 2845 KB  
Article
Embedded Object Detection with Custom LittleNet, FINN and Vitis AI DCNN Accelerators
by Michal Machura, Michal Danilowicz and Tomasz Kryjak
J. Low Power Electron. Appl. 2022, 12(2), 30; https://doi.org/10.3390/jlpea12020030 - 20 May 2022
Cited by 11 | Viewed by 7485
Abstract
Object detection is an essential component of many systems used, for example, in advanced driver assistance systems (ADAS) or advanced video surveillance systems (AVSS). Currently, the highest detection accuracy is achieved by solutions using deep convolutional neural networks (DCNNs). Unfortunately, these come at the cost of high computational complexity; hence, work on the widely understood acceleration of these algorithms is very important and timely. In this work, we compare three different DCNN hardware accelerator implementation methods: coarse-grained (a custom accelerator called LittleNet), fine-grained (FINN) and sequential (Vitis AI). We evaluate the approaches in terms of object detection accuracy, throughput and energy usage on the VOT and VTB datasets. We also present the limitations of each of the methods considered. We describe the whole process of DNN implementation, including architecture design, training, quantisation and hardware implementation. We used two custom DNN architectures to obtain higher accuracy, higher throughput and lower energy consumption. The first was implemented in SystemVerilog and the second with the FINN tool from AMD Xilinx. Next, both approaches were compared with the Vitis AI tool from AMD Xilinx. The final implementations were tested on the Avnet Ultra96-V2 development board with the Zynq UltraScale+ MPSoC ZU3EG device. For the two custom DNN architectures, we achieved a throughput of 196 fps for our custom accelerator and 111 fps for FINN. The same networks implemented with Vitis AI achieved 123.3 fps and 53.3 fps, respectively.
(This article belongs to the Special Issue Hardware for Machine Learning)
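
Throughput figures like the 196/111/123.3/53.3 fps quoted above are only comparable if measured the same way for every backend. A minimal, tool-agnostic harness is sketched below; `infer` stands for whatever callable drives one frame through the custom accelerator, FINN, or Vitis AI.

```python
import time

def measure_fps(infer, frames, warmup=10):
    """End-to-end throughput of an inference callable over a frame list."""
    for f in frames[:warmup]:     # warm-up: fill pipelines, touch buffers
        infer(f)
    start = time.perf_counter()
    for f in frames:
        infer(f)
    return len(frames) / (time.perf_counter() - start)
```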
