Search Results (13)

Search Parameters:
Keywords = vision processor unit

23 pages, 2255 KB  
Article
Design and Implementation of a YOLOv2 Accelerator on a Zynq-7000 FPGA
by Huimin Kim and Tae-Kyoung Kim
Sensors 2025, 25(20), 6359; https://doi.org/10.3390/s25206359 - 14 Oct 2025
Cited by 1 | Viewed by 1747
Abstract
You Only Look Once (YOLO) is a convolutional neural network-based object detection algorithm widely used in real-time vision applications. However, its high computational demand leads to significant power consumption and cost when deployed on graphics processing units. Field-programmable gate arrays offer a low-power alternative, but their efficient implementation requires architecture-level optimization tailored to limited device resources. This study presents an optimized YOLOv2 accelerator for the Zynq-7000 system-on-chip (SoC). The design employs 16-bit integer quantization, a filter reuse structure, an input feature map reuse scheme using a line buffer, and tiling parameter optimization for the convolution and max pooling layers to maximize resource efficiency. In addition, a stall-based control mechanism is introduced to prevent structural hazards in the pipeline. The proposed accelerator was implemented on a Zynq-7000 SoC board, and a system-level evaluation confirmed a negligible accuracy drop of only 0.2% compared with the 32-bit floating-point baseline. Compared with previous YOLO accelerators on the same SoC, the design achieved up to 26% and 15% reductions in flip-flop and digital signal processor usage, respectively. This result demonstrates feasible deployment on the XC7Z020 at 57.27% DSP and 16.55% FF utilization.
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)
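The 16-bit integer quantization step can be sketched in a few lines (a symmetric per-tensor scheme is assumed here; the paper does not spell out its exact format):

```python
import numpy as np

def quantize_int16(w):
    # symmetric per-tensor quantization: map max |w| onto the int16 range
    scale = np.max(np.abs(w)) / 32767.0
    q = np.clip(np.round(w / scale), -32768, 32767).astype(np.int16)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).standard_normal((16, 16)).astype(np.float32)
q, scale = quantize_int16(w)
# round-off error is bounded by half a quantization step
assert np.max(np.abs(dequantize(q, scale) - w)) <= scale / 2 + 1e-6
```

With 16 bits the step size is tiny relative to typical weight magnitudes, which is consistent with the 0.2% accuracy drop reported above.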

26 pages, 442 KB  
Article
Improving the Fast Fourier Transform for Space and Edge Computing Applications with an Efficient In-Place Method
by Christoforos Vasilakis, Alexandros Tsagkaropoulos, Ioannis Koutoulas and Dionysios Reisis
Software 2025, 4(2), 11; https://doi.org/10.3390/software4020011 - 12 May 2025
Cited by 1 | Viewed by 3230
Abstract
Satellite and edge computing designers develop algorithms that restrict resource utilization and execution time. Among these design efforts, optimizing the Fast Fourier Transform (FFT), key to many tasks, has led mainly to in-place FFT-specific hardware accelerators. Aiming to improve FFT performance on processors and computing devices with limited resources, this paper enhances the efficiency of the radix-2 FFT by exploring the benefits of an in-place technique. First, we present the advantages of organizing the single memory bank of processors to store two (2) FFT elements in each memory address and to provide parallel load and store of each FFT pair of data. Second, we optimize the floating-point (FP) and block floating-point (BFP) configurations to improve the FFT Signal-to-Noise Ratio (SNR) performance and resource utilization. The resulting techniques reduce the memory requirements by a factor of two and significantly improve the time performance for the prevailing BFP representation. The execution of inputs ranging from 1K to 16K FFT points, using 8-bit or 16-bit FP or BFP numbers, on the space-proven Atmel AVR32 and the Vision Processing Unit (VPU) Intel Movidius Myriad 2, the edge device Raspberry Pi Zero 2W, and a low-cost accelerator on a Xilinx Zynq 7000 Field Programmable Gate Array (FPGA), validates the method's performance improvement.
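The in-place radix-2 structure the method builds on can be illustrated with a plain textbook iterative FFT; this sketch does not model the paper's paired-memory layout or BFP scaling:

```python
import cmath

def fft_inplace(a):
    """Iterative in-place radix-2 DIT FFT; len(a) must be a power of two."""
    n = len(a)
    # bit-reversal permutation so butterflies can run in natural order
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # butterfly stages: each pass doubles the transform length
    length = 2
    while length <= n:
        w_len = cmath.exp(-2j * cmath.pi / length)
        for start in range(0, n, length):
            w = 1.0
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * w
                a[k] = u + v
                a[k + length // 2] = u - v
                w *= w_len
        length <<= 1
    return a

spec = fft_inplace([1 + 0j] * 4)   # DFT of a constant signal
assert abs(spec[0] - 4) < 1e-9 and max(abs(v) for v in spec[1:]) < 1e-9
```

Because each butterfly reads and writes the same pair of addresses, storing the two elements of a pair in one memory word (as the paper proposes) turns each butterfly into one parallel load and one parallel store.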

15 pages, 2101 KB  
Article
Scalable Transformer Accelerator with Variable Systolic Array for Multiple Models in Voice Assistant Applications
by Seok-Woo Chang and Dong-Sun Kim
Electronics 2024, 13(23), 4683; https://doi.org/10.3390/electronics13234683 - 27 Nov 2024
Cited by 3 | Viewed by 4196
Abstract
The Transformer is a type of deep learning model that has quickly become fundamental in natural language processing (NLP) and other machine learning tasks. Transformer hardware accelerators are usually designed for specific models, such as Bidirectional Encoder Representations from Transformers (BERT) and vision Transformer models like ViT. In this study, we propose a Scalable Transformer Accelerator Unit (STAU) for multiple models, enabling efficient handling of the various Transformer models used in voice assistant applications. A design centered on a Variable Systolic Array (VSA), together with control and data preprocessing in embedded processors, enables matrix operations of varying sizes. In addition, we propose an efficient variable structure and a row-wise data input method for natural language processing, where the word count changes. The proposed scalable Transformer accelerator accelerates text summarization, audio processing, image search, and generative AI used in voice assistants.
(This article belongs to the Topic Theory and Applications of High Performance Computing)
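The systolic-array matrix multiply that a VSA generalizes can be simulated in a few lines; an output-stationary schedule is assumed here, and the paper's exact dataflow may differ:

```python
import numpy as np

def systolic_matmul(A, B):
    # output-stationary schedule: PE (i, j) owns C[i, j]; at cycle t it
    # multiplies the operand pair skewed in from the left and top edges
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N))
    for t in range(M + N + K - 2):      # total pipeline cycles
        for i in range(M):
            for j in range(N):
                k = t - i - j           # which operand pair arrives now
                if 0 <= k < K:
                    C[i, j] += A[i, k] * B[k, j]
    return C

rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 6)), rng.standard_normal((6, 5))
assert np.allclose(systolic_matmul(A, B), A @ B)
```

A "variable" array would resize M and N per model, which this schedule handles naturally since the loop bounds come from the operand shapes.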

16 pages, 1349 KB  
Article
Power Function Algorithms Implemented in Microcontrollers and FPGAs
by Leonid Moroz, Volodymyr Samotyy, Paweł Gepner, Mariusz Węgrzyn and Grzegorz Nowakowski
Electronics 2023, 12(16), 3399; https://doi.org/10.3390/electronics12163399 - 10 Aug 2023
Cited by 7 | Viewed by 3318
Abstract
The exponential function a^x is widespread in many fields of science. Its calculation is a complicated issue for Central Processing Units (CPUs) and Graphics Processing Units (GPUs), as well as for specialised Digital Signal Processing (DSP) processors, such as Intelligent Processor Units (IPUs), for the needs of neural networks. This article presents simple and accurate exponential function calculation algorithms in half, single, and double precision that can be prototyped in Field-Programmable Gate Arrays (FPGAs). It should be noted that, for the approximation, the use of effective first-degree polynomials was proposed in most cases. The characteristic feature of such algorithms is that they contain only fast 'bithack' operations ('bit manipulation techniques') and Floating-Point (FP) addition, multiplication, and (if necessary) Fused Multiply-Add (FMA) operations. We recently published an article on algorithms for this class of functions, but its focus was on approximations with second-degree polynomials and higher, requiring two multiplications and two additions or more, which poses some complications for FPGA implementation. This article considers algorithms based on piecewise linear approximation, with one multiplication and one addition. Such low-complexity algorithms provide decent accuracy and speed, sufficient for practical applications such as accelerators for neural networks, power electronics, machine learning, computer vision, and intelligent robotic systems. These are FP-oriented algorithms; therefore, we briefly describe the characteristic parameters of such numbers.
(This article belongs to the Section Circuit and Signal Processing)
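The "one multiplication, one addition plus bithack" flavor of such algorithms can be illustrated with Schraudolph's well-known approximation of e^x, a related classic technique rather than the authors' exact algorithm:

```python
import math
import struct

def fast_exp(x):
    # Schraudolph's trick: write a*x + b straight into the exponent/mantissa
    # field of an IEEE-754 double; one multiply, one add, roughly 2-4% error
    a = (1 << 20) / math.log(2.0)
    b = 1023 * (1 << 20) - 60801   # 60801 tunes down the approximation error
    hi = int(a * x + b)            # becomes the high 32 bits of the double
    return struct.unpack('<d', struct.pack('<q', hi << 32))[0]

for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    assert abs(fast_exp(x) - math.exp(x)) / math.exp(x) < 0.05
```

The exponent bits give the piecewise power-of-two scaling for free, and the mantissa bits supply exactly the first-degree (linear) correction the article's approach relies on.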

21 pages, 9030 KB  
Article
Reparameterizable Multibranch Bottleneck Network for Lightweight Image Super-Resolution
by Ying Shen, Weihuang Zheng, Feng Huang, Jing Wu and Liqiong Chen
Sensors 2023, 23(8), 3963; https://doi.org/10.3390/s23083963 - 13 Apr 2023
Cited by 7 | Viewed by 3173
Abstract
Deployment of deep convolutional neural networks (CNNs) for single image super-resolution (SISR) on edge computing devices is mainly hampered by the huge computational cost. In this work, we propose a lightweight image super-resolution (SR) network based on a reparameterizable multibranch bottleneck module (RMBM). In the training phase, RMBM efficiently extracts high-frequency information by utilizing multibranch structures, including a bottleneck residual block (BRB), an inverted bottleneck residual block (IBRB), and an expand–squeeze convolution block (ESB). In the inference phase, the multibranch structures can be combined into a single 3 × 3 convolution to reduce the number of parameters without incurring any additional computational cost. Furthermore, a novel peak-structure-edge (PSE) loss is proposed to resolve the problem of oversmoothed reconstructed images while significantly improving image structure similarity. Finally, we optimize and deploy the algorithm on edge devices equipped with the Rockchip neural processor unit (RKNPU) to achieve real-time SR reconstruction. Extensive experiments on natural image and remote sensing image datasets show that our network outperforms advanced lightweight SR networks on objective evaluation metrics and subjective visual quality. The reconstruction results demonstrate that the proposed network achieves higher SR performance with a 98.1 K model size and can be effectively deployed on edge computing devices.
(This article belongs to the Special Issue Image Denoising and Image Super-resolution for Sensing Application)
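The inference-time branch merging follows from the linearity of convolution and can be verified numerically; this single-channel sketch uses hypothetical kernels, not the paper's trained weights:

```python
import numpy as np

def conv2d_same(x, k):
    # 'same' single-channel convolution (cross-correlation convention)
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3 = rng.standard_normal((3, 3))   # 3x3 branch
k1 = rng.standard_normal((1, 1))   # 1x1 branch

# fold the 1x1 branch and the identity path into one 3x3 kernel
k1_as_3x3 = np.zeros((3, 3)); k1_as_3x3[1, 1] = k1[0, 0]
identity = np.zeros((3, 3)); identity[1, 1] = 1.0
k_merged = k3 + k1_as_3x3 + identity

y_branches = conv2d_same(x, k3) + conv2d_same(x, k1) + x
y_merged = conv2d_same(x, k_merged)
assert np.allclose(y_branches, y_merged)
```

At inference only `k_merged` is stored and applied, which is why the multibranch training structure costs nothing at deployment.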

20 pages, 4648 KB  
Article
An IoT Machine Learning-Based Mobile Sensors Unit for Visually Impaired People
by Salam Dhou, Ahmad Alnabulsi, A. R. Al-Ali, Mariam Arshi, Fatima Darwish, Sara Almaazmi and Reem Alameeri
Sensors 2022, 22(14), 5202; https://doi.org/10.3390/s22145202 - 12 Jul 2022
Cited by 34 | Viewed by 6480
Abstract
Visually impaired people face many challenges that limit their ability to perform daily tasks and interact with the surrounding world. Navigating around places is one of the biggest challenges facing visually impaired people, especially those with complete loss of vision. As the Internet of Things (IoT) concept starts to play a major role in smart city applications, visually impaired people can be among the beneficiaries. In this paper, we propose a smart IoT-based mobile sensors unit that can be attached to an off-the-shelf cane, hereafter a smart cane, to facilitate independent movement for visually impaired people. The proposed mobile sensors unit consists of a six-axis accelerometer/gyro, ultrasonic sensors, a GPS sensor, cameras, a digital motion processor, and a credit-card-sized single-board microcomputer. The unit collects information about the cane user and the surrounding obstacles while on the move. An embedded machine learning algorithm, stored in the microcomputer memory, identifies the detected obstacles and alerts the user about their nature. In addition, in case of emergencies such as a cane fall, the unit alerts the cane user and their guardian. Moreover, a mobile application is developed for the guardian to track the cane user via Google Maps using a mobile handset to ensure safety. To validate the system, a prototype was developed and tested.
(This article belongs to the Section Internet of Things)
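A cane-fall detector driven by accelerometer magnitudes might look like the following sketch; the thresholds and two-phase logic are hypothetical illustrations, not taken from the paper:

```python
import math

def detect_fall(samples, g=9.81, free_fall_g=0.4, impact_g=2.5):
    # a fall reads as a brief near-free-fall dip followed by an impact
    # spike in total acceleration; thresholds here are assumptions
    falling = False
    for ax, ay, az in samples:
        m = math.sqrt(ax * ax + ay * ay + az * az) / g
        if m < free_fall_g:
            falling = True
        elif falling and m > impact_g:
            return True
    return False

assert not detect_fall([(0.0, 0.0, 9.81)] * 10)   # cane at rest
assert detect_fall([(0.0, 0.0, 9.81), (0.0, 0.0, 1.0), (0.0, 0.0, 30.0)])
```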

20 pages, 1700 KB  
Article
Smart Video Surveillance System Based on Edge Computing
by Antonio Carlos Cob-Parro, Cristina Losada-Gutiérrez, Marta Marrón-Romera, Alfredo Gardel-Vicente and Ignacio Bravo-Muñoz
Sensors 2021, 21(9), 2958; https://doi.org/10.3390/s21092958 - 23 Apr 2021
Cited by 52 | Viewed by 11742
Abstract
New processing methods based on artificial intelligence (AI) and deep learning are replacing traditional computer vision algorithms. The more advanced systems can process huge amounts of data in large computing facilities. In contrast, this paper presents a smart video surveillance system that executes AI algorithms on low-power embedded devices. The computer vision algorithm, typical for surveillance applications, aims to detect, count, and track people's movements in the area, an application that requires a distributed smart camera system. The proposed AI application detects people in the surveillance area using a MobileNet-SSD architecture. In addition, using a robust Kalman filter bank, the algorithm can keep track of people in the video while also providing people-counting information. The detection results are excellent considering the constraints imposed on the process. The selected architecture for the edge node is based on an UpSquared2 device that includes a vision processor unit (VPU) capable of accelerating AI CNN inference. The results section provides information about the image processing time when multiple video cameras are connected to the same edge node, people detection precision and recall curves, and the energy consumption of the system. The discussion of the results shows the usefulness of deploying this smart camera node throughout a distributed surveillance system.
(This article belongs to the Special Issue Applications of Video Processing and Computer Vision Sensor)
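The Kalman tracking stage can be sketched with a standard constant-velocity filter fed by the detector's positions; the noise covariances below are illustrative assumptions, not the paper's values:

```python
import numpy as np

# constant-velocity model for one tracked person; state is [x, y, vx, vy]
dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], float)   # state transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], float)   # only position is measured
Q = 0.01 * np.eye(4)                  # process noise (assumed)
R = np.eye(2)                         # measurement noise (assumed)

def kalman_step(x, P, z):
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update with the detector's position measurement z
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

# track a person walking diagonally at one pixel per frame
x, P = np.zeros(4), 10.0 * np.eye(4)
for t in range(1, 21):
    x, P = kalman_step(x, P, np.array([t, t], float))
assert np.linalg.norm(x[:2] - [20, 20]) < 1.0
```

A bank of such filters, one per detected person, lets the system maintain identities across frames and derive the people-counting information mentioned above.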

14 pages, 673 KB  
Letter
An Evaluation of Low-Cost Vision Processors for Efficient Star Identification
by Surabhi Agarwal, Elena Hervas-Martin, Jonathan Byrne, Aubrey Dunne, Jose Luis Espinosa-Aranda and David Rijlaarsdam
Sensors 2020, 20(21), 6250; https://doi.org/10.3390/s20216250 - 2 Nov 2020
Cited by 11 | Viewed by 4800
Abstract
Star trackers are navigation sensors used for attitude determination of a satellite relative to certain stars. A star tracker must be accurate and consume as little power as possible in order to be used in small satellites. While traditional approaches use lookup tables for identifying stars, the latest advances in star tracking use neural networks for automatic star identification. This manuscript evaluates two low-cost processors capable of running a star identification neural network: the Intel Movidius Myriad 2 Vision Processing Unit (VPU) and the STM32 microcontroller. The intention is to compare accuracy and power usage in order to evaluate the suitability of each device for use in a star tracker. The Myriad 2 VPU and the STM32 microcontroller were specifically chosen because of their performance on computer vision algorithms alongside being cost-effective, low-power devices. The experimental results showed that the Myriad 2 proved efficient, consuming around 1 W of power while maintaining 99.08% accuracy on inputs including false stars. The STM32 delivered comparable accuracy (99.07%) and similar power measurements. The proposed experimental setup is beneficial for small spacecraft missions that require low-cost, low-power star trackers.
(This article belongs to the Special Issue Attitude Sensors)

17 pages, 5285 KB  
Article
Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip
by Stefania Perri, Cristian Sestito, Fanny Spagnolo and Pasquale Corsonello
J. Imaging 2020, 6(9), 85; https://doi.org/10.3390/jimaging6090085 - 25 Aug 2020
Cited by 7 | Viewed by 3776
Abstract
Today, convolutional and deconvolutional neural network models are exceptionally popular thanks to the impressive accuracy they have demonstrated in several computer-vision applications. To speed up the overall tasks of these neural networks, purpose-designed accelerators are highly desirable. Unfortunately, the high computational complexity and the huge memory demand make the design of efficient hardware architectures, as well as their deployment in resource- and power-constrained embedded systems, still quite challenging. This paper presents a novel purpose-designed hardware accelerator to perform 2D deconvolutions. The proposed structure applies a hardware-oriented computational approach that overcomes the issues of traditional deconvolution methods, and it is suitable for implementation within virtually any system-on-chip based on field-programmable gate array devices. In fact, the novel accelerator scales simply to comply with the resources available within both high- and low-end devices by adequately adapting the adopted parallelism. As an example, when exploited to accelerate the Deep Convolutional Generative Adversarial Network model, the novel accelerator, running as a standalone unit implemented within the Xilinx Zynq XC7Z020 System-on-Chip (SoC) device, performs up to 72 GOPs. Moreover, it dissipates less than 500 mW at 200 MHz and occupies 5.6%, 4.1%, 17%, and 96%, respectively, of the look-up tables, flip-flops, random access memory, and digital signal processors available on-chip. When accommodated within the same device, the whole embedded system equipped with the novel accelerator performs up to 54 GOPs and dissipates less than 1.8 W at 150 MHz. Thanks to the increased exploitable parallelism, more than 900 GOPs can be executed when the high-end Virtex-7 XC7VX690T device is used as the implementation platform. Moreover, in comparison with state-of-the-art competitors implemented within the Zynq XC7Z045 device, the system proposed here reaches a computational capability up to 20% higher, saves more than 60% of power consumption and more than 80% of logic resources, and uses 5.7× fewer on-chip memory resources.
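The traditional deconvolution method the accelerator improves upon is zero-insertion upsampling followed by an ordinary convolution, which wastes most multiply-accumulates on inserted zeros; a reference sketch:

```python
import numpy as np

def conv_valid(x, k):
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def deconv2d(x, k, stride=2, pad=1):
    # zero-insertion upsampling: most of the subsequent multiplications hit
    # zeros, which is exactly the inefficiency a purpose-built unit avoids
    H, W = x.shape
    up = np.zeros(((H - 1) * stride + 1, (W - 1) * stride + 1))
    up[::stride, ::stride] = x
    up = np.pad(up, k.shape[0] - 1 - pad)
    return conv_valid(up, np.flip(k))

# output size follows (H - 1) * stride + k - 2 * pad
y = deconv2d(np.arange(16.0).reshape(4, 4), np.ones((3, 3)))
assert y.shape == (7, 7)
# a transposed convolution of a centred impulse stamps the kernel itself
imp = np.zeros((3, 3)); imp[1, 1] = 1.0
k = np.arange(9.0).reshape(3, 3)
assert np.allclose(deconv2d(imp, k, stride=1, pad=1), k)
```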

24 pages, 3182 KB  
Article
An FPGA Based Tracking Implementation for Parkinson’s Patients
by Giuseppe Conti, Marcos Quintana, Pedro Malagón and David Jiménez
Sensors 2020, 20(11), 3189; https://doi.org/10.3390/s20113189 - 4 Jun 2020
Cited by 8 | Viewed by 4415
Abstract
This paper presents a study on the optimization of a tracking system for patients with Parkinson's disease, tested at a day hospital center. The work significantly improves the efficiency of the computer-vision-based system in terms of energy consumption and hardware requirements. More specifically, it optimizes the performance of the background subtraction, which segments every frame previously characterized by a Gaussian mixture model (GMM). This module is the most demanding part in terms of computation resources, and therefore this paper proposes a method for its implementation on a low-cost development board based on the Zynq XC7Z020 SoC (system on chip). The platform used is the ZedBoard, which combines an ARM processor and an FPGA. It achieves real-time performance and low power consumption while accurately performing the target task. The results and achievements of this study, validated in real medical settings, are discussed and analyzed.
(This article belongs to the Special Issue Sensors and Sensing Technology Applied in Parkinson Disease)
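The per-pixel statistical background model can be illustrated with a simplified single-Gaussian variant; the paper uses a full mixture, while this sketch keeps one mode per pixel:

```python
import numpy as np

def update_background(frame, mean, var, alpha=0.02, k=2.5):
    # flag a pixel as foreground when it deviates from its running Gaussian
    fg = np.abs(frame - mean) > k * np.sqrt(var)
    bg = ~fg
    # adapt mean and variance only where the pixel matched the background
    mean[bg] += alpha * (frame[bg] - mean[bg])
    var[bg] += alpha * ((frame[bg] - mean[bg]) ** 2 - var[bg])
    return fg

mean = np.full((4, 4), 100.0)
var = np.full((4, 4), 4.0)
assert not update_background(np.full((4, 4), 100.0), mean, var).any()
frame = np.full((4, 4), 100.0)
frame[0, 0] = 200.0                       # a moving object appears
assert update_background(frame, mean, var)[0, 0]
```

The per-pixel independence of this update is what makes the GMM stage such a natural fit for a parallel FPGA implementation.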

20 pages, 20221 KB  
Article
Open Vision System for Low-Cost Robotics Education
by Julio Vega and José M. Cañas
Electronics 2019, 8(11), 1295; https://doi.org/10.3390/electronics8111295 - 6 Nov 2019
Cited by 16 | Viewed by 6129
Abstract
Vision devices are currently one of the most widely used sensory elements in robots: commercial autonomous cars and vacuum cleaners, for example, have cameras. These vision devices can provide a great amount of information about a robot's surroundings. However, platforms for robotics education usually lack such devices, mainly because of the computing limitations of low-cost processors. New educational platforms using the Raspberry Pi are able to overcome this limitation while keeping costs low, but extracting information from the raw images is complex for children. This paper presents an open source vision system that simplifies the use of cameras in robotics education. It includes functions for the visual detection of complex objects and a visual memory that computes obstacle distances beyond the small field of view of regular cameras. The system was experimentally validated using the PiCam camera mounted on a pan unit on a Raspberry Pi-based robot. The performance and accuracy of the proposed vision system were studied, and the system was then used to solve two visual educational exercises: safe visual navigation with obstacle avoidance and person-following behavior.
(This article belongs to the Special Issue Advanced Embedded HW/SW Development)

19 pages, 8083 KB  
Article
FPGA-Based HD Camera System for the Micropositioning of Biomedical Micro-Objects Using a Contactless Micro-Conveyor
by Elmar Yusifli, Reda Yahiaoui, Saeed Mian Qaisar, Mahmoud Addouche, Basil Al-Mahdawi, Hicham Bourouina, Guillaume Herlem and Tijani Gharbi
Micromachines 2017, 8(3), 74; https://doi.org/10.3390/mi8030074 - 2 Mar 2017
Cited by 5 | Viewed by 8114
Abstract
With recent advancements, micro-object contactless conveyors are becoming an essential part of the biomedical sector. They help avoid the infection and damage that can occur due to external contact. In this context, a smart micro-conveyor is devised. It is a Field Programmable Gate Array (FPGA)-based system that employs a smart surface for conveyance along with an OmniVision complementary metal-oxide-semiconductor (CMOS) HD camera for micro-object position detection and tracking. A specific FPGA-based hardware design and VHSIC (Very High Speed Integrated Circuit) Hardware Description Language (VHDL) implementation are realized without employing any Nios processor or System on a Programmable Chip (SOPC) builder-based Central Processing Unit (CPU) core, which keeps the system efficient in terms of resource utilization and power consumption. The micro-object positioning status is captured with an embedded FPGA-based camera driver and communicated to the Image Processing, Decision Making and Command (IPDC) module. The IPDC is programmed in C++ and can run on a Personal Computer (PC) or any appropriate embedded system. The IPDC decisions are sent back to the FPGA, which pilots the smart surface accordingly. In this way, an automated closed-loop system conveys the micro-object towards a desired location. The devised system architecture and implementation principle are described, and its functionality is verified. The results confirm the proper functionality of the developed system and its superior performance compared with other solutions.
(This article belongs to the Special Issue Medical Microdevices and Micromachines)

18 pages, 684 KB  
Article
Parallel Computational Intelligence-Based Multi-Camera Surveillance System
by Sergio Orts-Escolano, Jose Garcia-Rodriguez, Vicente Morell, Miguel Cazorla, Jorge Azorin and Juan Manuel Garcia-Chamizo
J. Sens. Actuator Netw. 2014, 3(2), 95-112; https://doi.org/10.3390/jsan3020095 - 11 Apr 2014
Cited by 6 | Viewed by 10177
Abstract
In this work, we present a multi-camera surveillance system based on the use of self-organizing neural networks to represent events on video. The system processes several tasks in parallel using GPUs (graphics processor units). It addresses multiple vision tasks at various levels, such as segmentation, representation or characterization, and analysis and monitoring of movement. These features allow the construction of a robust representation of the environment and an interpretation of the behavior of mobile agents in the scene. It is also necessary to integrate the vision module into a global system that operates in a complex environment, receiving images from multiple acquisition devices at video frequency. To offer relevant information to higher-level systems and to monitor and make decisions in real time, it must meet a set of requirements, such as time constraints, high availability, robustness, high processing speed, and re-configurability. We have built a system able to represent and analyze the motion in video acquired by a multi-camera network and to process multi-source data in parallel on a multi-GPU architecture.
