Search Results (13)

Search Parameters:
Keywords = vision processor unit

23 pages, 2255 KB  
Article
Design and Implementation of a YOLOv2 Accelerator on a Zynq-7000 FPGA
by Huimin Kim and Tae-Kyoung Kim
Sensors 2025, 25(20), 6359; https://doi.org/10.3390/s25206359 - 14 Oct 2025
Cited by 1 | Viewed by 1747
Abstract
You Only Look Once (YOLO) is a convolutional neural network-based object detection algorithm widely used in real-time vision applications. However, its high computational demand leads to significant power consumption and cost when deployed on graphics processing units. Field-programmable gate arrays offer a low-power alternative, but their efficient implementation requires architecture-level optimization tailored to limited device resources. This study presents an optimized YOLOv2 accelerator for the Zynq-7000 system-on-chip (SoC). The design employs 16-bit integer quantization, a filter reuse structure, an input feature map reuse scheme using a line buffer, and tiling parameter optimization for the convolution and max pooling layers to maximize resource efficiency. In addition, a stall-based control mechanism is introduced to prevent structural hazards in the pipeline. The proposed accelerator was implemented on a Zynq-7000 SoC board, and a system-level evaluation confirmed a negligible accuracy drop of only 0.2% compared with the 32-bit floating-point baseline. Compared with previous YOLO accelerators on the same SoC, the design achieved up to 26% and 15% reductions in flip-flop and digital signal processor usage, respectively. This result demonstrates feasible deployment on the XC7Z020 at 57.27% DSP and 16.55% FF utilization.
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)
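The 16-bit integer quantization step can be sketched in a few lines (a symmetric per-tensor scheme is assumed here; the paper does not spell out its exact format):

```python
import numpy as np

def quantize_int16(w):
    # symmetric per-tensor quantization: map max |w| onto the int16 range
    scale = np.max(np.abs(w)) / 32767.0
    q = np.clip(np.round(w / scale), -32768, 32767).astype(np.int16)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).standard_normal((16, 16)).astype(np.float32)
q, scale = quantize_int16(w)
# round-off error is bounded by half a quantization step
assert np.max(np.abs(dequantize(q, scale) - w)) <= scale / 2 + 1e-6
```

With 16 bits the step size is tiny relative to typical weight magnitudes, which is consistent with the 0.2% accuracy drop reported above.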

26 pages, 442 KB  
Article
Improving the Fast Fourier Transform for Space and Edge Computing Applications with an Efficient In-Place Method
by Christoforos Vasilakis, Alexandros Tsagkaropoulos, Ioannis Koutoulas and Dionysios Reisis
Software 2025, 4(2), 11; https://doi.org/10.3390/software4020011 - 12 May 2025
Cited by 1 | Viewed by 3230
Abstract
Satellite and edge computing designers develop algorithms that restrict resource utilization and execution time. Among these design efforts, optimizing the Fast Fourier Transform (FFT), key to many tasks, has led mainly to in-place FFT-specific hardware accelerators. Aiming to improve FFT performance on processors and computing devices with limited resources, this paper enhances the efficiency of the radix-2 FFT by exploring the benefits of an in-place technique. First, we present the advantages of organizing the single memory bank of processors to store two (2) FFT elements in each memory address and to provide parallel load and store of each FFT pair of data. Second, we optimize the floating-point (FP) and block floating-point (BFP) configurations to improve the FFT Signal-to-Noise Ratio (SNR) performance and resource utilization. The resulting techniques reduce the memory requirements by a factor of two and significantly improve the time performance for the prevailing BFP representation. The execution of inputs ranging from 1K to 16K FFT points, using 8-bit or 16-bit FP or BFP numbers, on the space-proven Atmel AVR32 and the Vision Processing Unit (VPU) Intel Movidius Myriad 2, the edge device Raspberry Pi Zero 2W, and a low-cost accelerator on a Xilinx Zynq 7000 Field Programmable Gate Array (FPGA), validates the method's performance improvement.
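The in-place radix-2 structure the method builds on can be illustrated with a plain textbook iterative FFT; this sketch does not model the paper's paired-memory layout or BFP scaling:

```python
import cmath

def fft_inplace(a):
    """Iterative in-place radix-2 DIT FFT; len(a) must be a power of two."""
    n = len(a)
    # bit-reversal permutation so butterflies can run in natural order
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # butterfly stages: each pass doubles the transform length
    length = 2
    while length <= n:
        w_len = cmath.exp(-2j * cmath.pi / length)
        for start in range(0, n, length):
            w = 1.0
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * w
                a[k] = u + v
                a[k + length // 2] = u - v
                w *= w_len
        length <<= 1
    return a

spec = fft_inplace([1 + 0j] * 4)   # DFT of a constant signal
assert abs(spec[0] - 4) < 1e-9 and max(abs(v) for v in spec[1:]) < 1e-9
```

Because each butterfly reads and writes the same pair of addresses, storing the two elements of a pair in one memory word (as the paper proposes) turns each butterfly into one parallel load and one parallel store.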

15 pages, 2101 KB  
Article
Scalable Transformer Accelerator with Variable Systolic Array for Multiple Models in Voice Assistant Applications
by Seok-Woo Chang and Dong-Sun Kim
Electronics 2024, 13(23), 4683; https://doi.org/10.3390/electronics13234683 - 27 Nov 2024
Cited by 3 | Viewed by 4196
Abstract
The Transformer is a type of deep learning model that has quickly become fundamental in natural language processing (NLP) and other machine learning tasks. Transformer hardware accelerators are usually designed for specific models, such as Bidirectional Encoder Representations from Transformers (BERT) and vision Transformer models like ViT. In this study, we propose a Scalable Transformer Accelerator Unit (STAU) for multiple models, enabling efficient handling of the various Transformer models used in voice assistant applications. A design centered on a Variable Systolic Array (VSA), together with control and data preprocessing in embedded processors, enables matrix operations of varying sizes. In addition, we propose an efficient variable structure and a row-wise data input method for natural language processing, where the word count changes. The proposed scalable Transformer accelerator accelerates text summarization, audio processing, image search, and generative AI used in voice assistants.
(This article belongs to the Topic Theory and Applications of High Performance Computing)
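The systolic-array matrix multiply that a VSA generalizes can be simulated in a few lines; an output-stationary schedule is assumed here, and the paper's exact dataflow may differ:

```python
import numpy as np

def systolic_matmul(A, B):
    # output-stationary schedule: PE (i, j) owns C[i, j]; at cycle t it
    # multiplies the operand pair skewed in from the left and top edges
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N))
    for t in range(M + N + K - 2):      # total pipeline cycles
        for i in range(M):
            for j in range(N):
                k = t - i - j           # which operand pair arrives now
                if 0 <= k < K:
                    C[i, j] += A[i, k] * B[k, j]
    return C

rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 6)), rng.standard_normal((6, 5))
assert np.allclose(systolic_matmul(A, B), A @ B)
```

A "variable" array would resize M and N per model, which this schedule handles naturally since the loop bounds come from the operand shapes.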

16 pages, 1349 KB  
Article
Power Function Algorithms Implemented in Microcontrollers and FPGAs
by Leonid Moroz, Volodymyr Samotyy, Paweł Gepner, Mariusz Węgrzyn and Grzegorz Nowakowski
Electronics 2023, 12(16), 3399; https://doi.org/10.3390/electronics12163399 - 10 Aug 2023
Cited by 7 | Viewed by 3318
Abstract
The exponential function a^x is widespread in many fields of science. Its calculation is a complicated issue for Central Processing Units (CPUs) and Graphics Processing Units (GPUs), as well as for specialised Digital Signal Processing (DSP) processors, such as Intelligent Processor Units (IPUs), for the needs of neural networks. This article presents simple and accurate exponential function calculation algorithms in half, single, and double precision that can be prototyped in Field-Programmable Gate Arrays (FPGAs). It should be noted that, for the approximation, the use of effective first-degree polynomials was proposed in most cases. The characteristic feature of such algorithms is that they contain only fast 'bithack' operations ('bit manipulation techniques') and Floating-Point (FP) addition, multiplication, and (if necessary) Fused Multiply-Add (FMA) operations. We recently published an article on algorithms for this class of functions, but its focus was on approximations with second-degree polynomials and higher, requiring two multiplications and two additions or more, which poses some complications for FPGA implementation. This article considers algorithms based on piecewise linear approximation, with one multiplication and one addition. Such low-complexity algorithms provide decent accuracy and speed, sufficient for practical applications such as accelerators for neural networks, power electronics, machine learning, computer vision, and intelligent robotic systems. These are FP-oriented algorithms; therefore, we briefly describe the characteristic parameters of such numbers.
(This article belongs to the Section Circuit and Signal Processing)
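The "one multiplication, one addition plus bithack" flavor of such algorithms can be illustrated with Schraudolph's well-known approximation of e^x, a related classic technique rather than the authors' exact algorithm:

```python
import math
import struct

def fast_exp(x):
    # Schraudolph's trick: write a*x + b straight into the exponent/mantissa
    # field of an IEEE-754 double; one multiply, one add, roughly 2-4% error
    a = (1 << 20) / math.log(2.0)
    b = 1023 * (1 << 20) - 60801   # 60801 tunes down the approximation error
    hi = int(a * x + b)            # becomes the high 32 bits of the double
    return struct.unpack('<d', struct.pack('<q', hi << 32))[0]

for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    assert abs(fast_exp(x) - math.exp(x)) / math.exp(x) < 0.05
```

The exponent bits give the piecewise power-of-two scaling for free, and the mantissa bits supply exactly the first-degree (linear) correction the article's approach relies on.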

21 pages, 9030 KB  
Article
Reparameterizable Multibranch Bottleneck Network for Lightweight Image Super-Resolution
by Ying Shen, Weihuang Zheng, Feng Huang, Jing Wu and Liqiong Chen
Sensors 2023, 23(8), 3963; https://doi.org/10.3390/s23083963 - 13 Apr 2023
Cited by 7 | Viewed by 3173
Abstract
Deployment of deep convolutional neural networks (CNNs) for single image super-resolution (SISR) on edge computing devices is mainly hampered by the huge computational cost. In this work, we propose a lightweight image super-resolution (SR) network based on a reparameterizable multibranch bottleneck module (RMBM). In the training phase, RMBM efficiently extracts high-frequency information by utilizing multibranch structures, including a bottleneck residual block (BRB), an inverted bottleneck residual block (IBRB), and an expand–squeeze convolution block (ESB). In the inference phase, the multibranch structures can be combined into a single 3 × 3 convolution to reduce the number of parameters without incurring any additional computational cost. Furthermore, a novel peak-structure-edge (PSE) loss is proposed to resolve the problem of oversmoothed reconstructed images while significantly improving image structure similarity. Finally, we optimize and deploy the algorithm on edge devices equipped with the Rockchip neural processor unit (RKNPU) to achieve real-time SR reconstruction. Extensive experiments on natural image and remote sensing image datasets show that our network outperforms advanced lightweight SR networks on objective evaluation metrics and subjective visual quality. The reconstruction results demonstrate that the proposed network achieves higher SR performance with a 98.1 K model size and can be effectively deployed on edge computing devices.
(This article belongs to the Special Issue Image Denoising and Image Super-resolution for Sensing Application)
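The inference-time branch merging follows from the linearity of convolution and can be verified numerically; this single-channel sketch uses hypothetical kernels, not the paper's trained weights:

```python
import numpy as np

def conv2d_same(x, k):
    # 'same' single-channel convolution (cross-correlation convention)
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3 = rng.standard_normal((3, 3))   # 3x3 branch
k1 = rng.standard_normal((1, 1))   # 1x1 branch

# fold the 1x1 branch and the identity path into one 3x3 kernel
k1_as_3x3 = np.zeros((3, 3)); k1_as_3x3[1, 1] = k1[0, 0]
identity = np.zeros((3, 3)); identity[1, 1] = 1.0
k_merged = k3 + k1_as_3x3 + identity

y_branches = conv2d_same(x, k3) + conv2d_same(x, k1) + x
y_merged = conv2d_same(x, k_merged)
assert np.allclose(y_branches, y_merged)
```

At inference only `k_merged` is stored and applied, which is why the multibranch training structure costs nothing at deployment.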

20 pages, 4648 KB  
Article
An IoT Machine Learning-Based Mobile Sensors Unit for Visually Impaired People
by Salam Dhou, Ahmad Alnabulsi, A. R. Al-Ali, Mariam Arshi, Fatima Darwish, Sara Almaazmi and Reem Alameeri
Sensors 2022, 22(14), 5202; https://doi.org/10.3390/s22145202 - 12 Jul 2022
Cited by 34 | Viewed by 6480
Abstract
Visually impaired people face many challenges that limit their ability to perform daily tasks and interact with the surrounding world. Navigating around places is one of the biggest challenges facing visually impaired people, especially those with complete loss of vision. As the Internet of Things (IoT) concept starts to play a major role in smart city applications, visually impaired people can be among the beneficiaries. In this paper, we propose a smart IoT-based mobile sensors unit that can be attached to an off-the-shelf cane, hereafter a smart cane, to facilitate independent movement for visually impaired people. The proposed mobile sensors unit consists of a six-axis accelerometer/gyro, ultrasonic sensors, a GPS sensor, cameras, a digital motion processor, and a credit-card-sized single-board microcomputer. The unit collects information about the cane user and the surrounding obstacles while on the move. An embedded machine learning algorithm, stored in the microcomputer memory, identifies the detected obstacles and alerts the user about their nature. In addition, in case of emergencies such as a cane fall, the unit alerts the cane user and their guardian. Moreover, a mobile application is developed for the guardian to track the cane user via Google Maps using a mobile handset to ensure safety. To validate the system, a prototype was developed and tested.
(This article belongs to the Section Internet of Things)
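A cane-fall detector driven by accelerometer magnitudes might look like the following sketch; the thresholds and two-phase logic are hypothetical illustrations, not taken from the paper:

```python
import math

def detect_fall(samples, g=9.81, free_fall_g=0.4, impact_g=2.5):
    # a fall reads as a brief near-free-fall dip followed by an impact
    # spike in total acceleration; thresholds here are assumptions
    falling = False
    for ax, ay, az in samples:
        m = math.sqrt(ax * ax + ay * ay + az * az) / g
        if m < free_fall_g:
            falling = True
        elif falling and m > impact_g:
            return True
    return False

assert not detect_fall([(0.0, 0.0, 9.81)] * 10)   # cane at rest
assert detect_fall([(0.0, 0.0, 9.81), (0.0, 0.0, 1.0), (0.0, 0.0, 30.0)])
```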

20 pages, 1700 KB  
Article
Smart Video Surveillance System Based on Edge Computing
by Antonio Carlos Cob-Parro, Cristina Losada-Gutiérrez, Marta Marrón-Romera, Alfredo Gardel-Vicente and Ignacio Bravo-Muñoz
Sensors 2021, 21(9), 2958; https://doi.org/10.3390/s21092958 - 23 Apr 2021
Cited by 52 | Viewed by 11742
Abstract
New processing methods based on artificial intelligence (AI) and deep learning are replacing traditional computer vision algorithms. The more advanced systems can process huge amounts of data in large computing facilities. In contrast, this paper presents a smart video surveillance system that executes AI algorithms on low-power embedded devices. The computer vision algorithm, typical for surveillance applications, aims to detect, count, and track people's movements in the area, an application that requires a distributed smart camera system. The proposed AI application detects people in the surveillance area using a MobileNet-SSD architecture. In addition, using a robust Kalman filter bank, the algorithm can keep track of people in the video while also providing people-counting information. The detection results are excellent considering the constraints imposed on the process. The selected architecture for the edge node is based on an UpSquared2 device that includes a vision processor unit (VPU) capable of accelerating AI CNN inference. The results section provides information about the image processing time when multiple video cameras are connected to the same edge node, people detection precision and recall curves, and the energy consumption of the system. The discussion of the results shows the usefulness of deploying this smart camera node throughout a distributed surveillance system.
(This article belongs to the Special Issue Applications of Video Processing and Computer Vision Sensor)
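The Kalman tracking stage can be sketched with a standard constant-velocity filter fed by the detector's positions; the noise covariances below are illustrative assumptions, not the paper's values:

```python
import numpy as np

# constant-velocity model for one tracked person; state is [x, y, vx, vy]
dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], float)   # state transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], float)   # only position is measured
Q = 0.01 * np.eye(4)                  # process noise (assumed)
R = np.eye(2)                         # measurement noise (assumed)

def kalman_step(x, P, z):
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update with the detector's position measurement z
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

# track a person walking diagonally at one pixel per frame
x, P = np.zeros(4), 10.0 * np.eye(4)
for t in range(1, 21):
    x, P = kalman_step(x, P, np.array([t, t], float))
assert np.linalg.norm(x[:2] - [20, 20]) < 1.0
```

A bank of such filters, one per detected person, lets the system maintain identities across frames and derive the people-counting information mentioned above.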

14 pages, 673 KB  
Letter
An Evaluation of Low-Cost Vision Processors for Efficient Star Identification
by Surabhi Agarwal, Elena Hervas-Martin, Jonathan Byrne, Aubrey Dunne, Jose Luis Espinosa-Aranda and David Rijlaarsdam
Sensors 2020, 20(21), 6250; https://doi.org/10.3390/s20216250 - 2 Nov 2020
Cited by 11 | Viewed by 4800
Abstract
Star trackers are navigation sensors used for attitude determination of a satellite relative to certain stars. A star tracker must be accurate and consume as little power as possible in order to be used in small satellites. While traditional approaches use lookup tables for identifying stars, the latest advances in star tracking use neural networks for automatic star identification. This manuscript evaluates two low-cost processors capable of running a star identification neural network: the Intel Movidius Myriad 2 Vision Processing Unit (VPU) and the STM32 microcontroller. The intention is to compare accuracy and power usage in order to evaluate the suitability of each device for use in a star tracker. The Myriad 2 VPU and the STM32 microcontroller were specifically chosen because of their performance on computer vision algorithms alongside being cost-effective, low-power devices. The experimental results showed that the Myriad 2 proved efficient, consuming around 1 W of power while maintaining 99.08% accuracy on inputs including false stars. The STM32 delivered comparable accuracy (99.07%) and similar power measurements. The proposed experimental setup is beneficial for small spacecraft missions that require low-cost, low-power star trackers.
(This article belongs to the Special Issue Attitude Sensors)

17 pages, 5285 KB  
Article
Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip
by Stefania Perri, Cristian Sestito, Fanny Spagnolo and Pasquale Corsonello
J. Imaging 2020, 6(9), 85; https://doi.org/10.3390/jimaging6090085 - 25 Aug 2020
Cited by 7 | Viewed by 3776
Abstract
Today, convolutional and deconvolutional neural network models are exceptionally popular thanks to the impressive accuracy they have demonstrated in several computer-vision applications. To speed up the overall tasks of these neural networks, purpose-designed accelerators are highly desirable. Unfortunately, the high computational complexity and the huge memory demand make the design of efficient hardware architectures, as well as their deployment in resource- and power-constrained embedded systems, still quite challenging. This paper presents a novel purpose-designed hardware accelerator to perform 2D deconvolutions. The proposed structure applies a hardware-oriented computational approach that overcomes the issues of traditional deconvolution methods, and it is suitable for implementation within virtually any system-on-chip based on field-programmable gate array devices. In fact, the novel accelerator scales simply to comply with the resources available within both high- and low-end devices by adequately adapting the adopted parallelism. As an example, when exploited to accelerate the Deep Convolutional Generative Adversarial Network model, the novel accelerator, running as a standalone unit implemented within the Xilinx Zynq XC7Z020 System-on-Chip (SoC) device, performs up to 72 GOPs. Moreover, it dissipates less than 500 mW at 200 MHz and occupies 5.6%, 4.1%, 17%, and 96%, respectively, of the look-up tables, flip-flops, random access memory, and digital signal processors available on-chip. When accommodated within the same device, the whole embedded system equipped with the novel accelerator performs up to 54 GOPs and dissipates less than 1.8 W at 150 MHz. Thanks to the increased exploitable parallelism, more than 900 GOPs can be executed when the high-end Virtex-7 XC7VX690T device is used as the implementation platform. Moreover, in comparison with state-of-the-art competitors implemented within the Zynq XC7Z045 device, the system proposed here reaches a computational capability up to 20% higher, saves more than 60% of power consumption and more than 80% of logic resources, and uses 5.7× fewer on-chip memory resources.
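The traditional deconvolution method the accelerator improves upon is zero-insertion upsampling followed by an ordinary convolution, which wastes most multiply-accumulates on inserted zeros; a reference sketch:

```python
import numpy as np

def conv_valid(x, k):
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def deconv2d(x, k, stride=2, pad=1):
    # zero-insertion upsampling: most of the subsequent multiplications hit
    # zeros, which is exactly the inefficiency a purpose-built unit avoids
    H, W = x.shape
    up = np.zeros(((H - 1) * stride + 1, (W - 1) * stride + 1))
    up[::stride, ::stride] = x
    up = np.pad(up, k.shape[0] - 1 - pad)
    return conv_valid(up, np.flip(k))

# output size follows (H - 1) * stride + k - 2 * pad
y = deconv2d(np.arange(16.0).reshape(4, 4), np.ones((3, 3)))
assert y.shape == (7, 7)
# a transposed convolution of a centred impulse stamps the kernel itself
imp = np.zeros((3, 3)); imp[1, 1] = 1.0
k = np.arange(9.0).reshape(3, 3)
assert np.allclose(deconv2d(imp, k, stride=1, pad=1), k)
```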

24 pages, 3182 KB  
Article
An FPGA Based Tracking Implementation for Parkinson’s Patients
by Giuseppe Conti, Marcos Quintana, Pedro Malagón and David Jiménez
Sensors 2020, 20(11), 3189; https://doi.org/10.3390/s20113189 - 4 Jun 2020
Cited by 8 | Viewed by 4415
Abstract
This paper presents a study on the optimization of a tracking system for patients with Parkinson's disease, tested at a day hospital center. The work significantly improves the efficiency of the computer-vision-based system in terms of energy consumption and hardware requirements. More specifically, it optimizes the performance of the background subtraction, which segments every frame previously characterized by a Gaussian mixture model (GMM). This module is the most demanding part in terms of computation resources, and therefore this paper proposes a method for its implementation on a low-cost development board based on the Zynq XC7Z020 SoC (system on chip). The platform used is the ZedBoard, which combines an ARM processor and an FPGA. It achieves real-time performance and low power consumption while accurately performing the target task. The results and achievements of this study, validated in real medical settings, are discussed and analyzed.
(This article belongs to the Special Issue Sensors and Sensing Technology Applied in Parkinson Disease)
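The per-pixel statistical background model can be illustrated with a simplified single-Gaussian variant; the paper uses a full mixture, while this sketch keeps one mode per pixel:

```python
import numpy as np

def update_background(frame, mean, var, alpha=0.02, k=2.5):
    # flag a pixel as foreground when it deviates from its running Gaussian
    fg = np.abs(frame - mean) > k * np.sqrt(var)
    bg = ~fg
    # adapt mean and variance only where the pixel matched the background
    mean[bg] += alpha * (frame[bg] - mean[bg])
    var[bg] += alpha * ((frame[bg] - mean[bg]) ** 2 - var[bg])
    return fg

mean = np.full((4, 4), 100.0)
var = np.full((4, 4), 4.0)
assert not update_background(np.full((4, 4), 100.0), mean, var).any()
frame = np.full((4, 4), 100.0)
frame[0, 0] = 200.0                       # a moving object appears
assert update_background(frame, mean, var)[0, 0]
```

The per-pixel independence of this update is what makes the GMM stage such a natural fit for a parallel FPGA implementation.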

20 pages, 20221 KB  
Article
Open Vision System for Low-Cost Robotics Education
by Julio Vega and José M. Cañas
Electronics 2019, 8(11), 1295; https://doi.org/10.3390/electronics8111295 - 6 Nov 2019
Cited by 16 | Viewed by 6129
Abstract
Vision devices are currently one of the most widely used sensory elements in robots: commercial autonomous cars and vacuum cleaners, for example, have cameras. These vision devices can provide a great amount of information about a robot's surroundings. However, platforms for robotics education usually lack such devices, mainly because of the computing limitations of low-cost processors. New educational platforms using the Raspberry Pi are able to overcome this limitation while keeping costs low, but extracting information from the raw images is complex for children. This paper presents an open source vision system that simplifies the use of cameras in robotics education. It includes functions for the visual detection of complex objects and a visual memory that computes obstacle distances beyond the small field of view of regular cameras. The system was experimentally validated using the PiCam camera mounted on a pan unit on a Raspberry Pi-based robot. The performance and accuracy of the proposed vision system were studied, and the system was then used to solve two visual educational exercises: safe visual navigation with obstacle avoidance and person-following behavior.
(This article belongs to the Special Issue Advanced Embedded HW/SW Development)

19 pages, 8083 KB  
Article
FPGA-Based HD Camera System for the Micropositioning of Biomedical Micro-Objects Using a Contactless Micro-Conveyor
by Elmar Yusifli, Reda Yahiaoui, Saeed Mian Qaisar, Mahmoud Addouche, Basil Al-Mahdawi, Hicham Bourouina, Guillaume Herlem and Tijani Gharbi
Micromachines 2017, 8(3), 74; https://doi.org/10.3390/mi8030074 - 2 Mar 2017
Cited by 5 | Viewed by 8114
Abstract
With recent advancements, micro-object contactless conveyors are becoming an essential part of the biomedical sector. They help avoid the infection and damage that can occur due to external contact. In this context, a smart micro-conveyor is devised. It is a Field Programmable Gate Array (FPGA)-based system that employs a smart surface for conveyance along with an OmniVision complementary metal-oxide-semiconductor (CMOS) HD camera for micro-object position detection and tracking. A specific FPGA-based hardware design and VHSIC (Very High Speed Integrated Circuit) Hardware Description Language (VHDL) implementation are realized without employing any Nios processor or System on a Programmable Chip (SOPC) builder-based Central Processing Unit (CPU) core, which keeps the system efficient in terms of resource utilization and power consumption. The micro-object positioning status is captured with an embedded FPGA-based camera driver and communicated to the Image Processing, Decision Making and Command (IPDC) module. The IPDC is programmed in C++ and can run on a Personal Computer (PC) or any appropriate embedded system. The IPDC decisions are sent back to the FPGA, which pilots the smart surface accordingly. In this way, an automated closed-loop system conveys the micro-object towards a desired location. The devised system architecture and implementation principle are described, and its functionality is verified. The results confirm the proper functionality of the developed system and its superior performance compared with other solutions.
(This article belongs to the Special Issue Medical Microdevices and Micromachines)

18 pages, 684 KB  
Article
Parallel Computational Intelligence-Based Multi-Camera Surveillance System
by Sergio Orts-Escolano, Jose Garcia-Rodriguez, Vicente Morell, Miguel Cazorla, Jorge Azorin and Juan Manuel Garcia-Chamizo
J. Sens. Actuator Netw. 2014, 3(2), 95-112; https://doi.org/10.3390/jsan3020095 - 11 Apr 2014
Cited by 6 | Viewed by 10177
Abstract
In this work, we present a multi-camera surveillance system based on the use of self-organizing neural networks to represent events on video. The system processes several tasks in parallel using GPUs (graphics processor units). It addresses multiple vision tasks at various levels, such as segmentation, representation or characterization, and analysis and monitoring of movement. These features allow the construction of a robust representation of the environment and an interpretation of the behavior of mobile agents in the scene. It is also necessary to integrate the vision module into a global system that operates in a complex environment, receiving images from multiple acquisition devices at video frequency. To offer relevant information to higher-level systems and to monitor and make decisions in real time, it must meet a set of requirements, such as time constraints, high availability, robustness, high processing speed, and re-configurability. We have built a system able to represent and analyze the motion in video acquired by a multi-camera network and to process multi-source data in parallel on a multi-GPU architecture.
