Special Issue "Image Processing Using FPGAs"

A special issue of Journal of Imaging (ISSN 2313-433X).

Deadline for manuscript submissions: closed (30 November 2018)

A printed edition of this Special Issue is available.

Special Issue Editor

Guest Editor
Prof. Donald Bailey

School of Engineering and Advanced Technology, Massey University, Palmerston North 4442, New Zealand
Interests: machine vision; FPGA based design; digital image processing

Special Issue Information

Dear Colleagues,

Field Programmable Gate Arrays (FPGAs) are increasingly being used for the implementation of image processing applications. This is especially the case for real-time embedded applications, where latency and power are important considerations. An FPGA embedded in a smart camera is able to perform much of the image processing directly as the image is streamed from the sensor, providing a processed data stream rather than images. The parallelism of hardware is able to exploit the spatial and temporal parallelism implicit within many image processing tasks. Unfortunately, simply porting a software algorithm onto an FPGA often gives disappointing results, because many image processing algorithms have been optimised for a serial processor. It is usually necessary to transform the algorithm to efficiently exploit the parallelism and resources available on an FPGA. This can lead to novel algorithms and hardware computational architectures, both at the image processing operation level and at the application level.

The aim of this Special Issue is to present and highlight novel algorithms, architectures, techniques and applications of FPGAs for image processing.

Prof. Donald Bailey
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 350 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Hardware algorithms for imaging
  • Computational imaging architectures
  • Reconfigurable image processing systems
  • Parallel image processing
  • Hardware acceleration for imaging applications
  • FPGA based smart cameras

Published Papers (10 papers)

Editorial

Open Access Editorial
Image Processing Using FPGAs
J. Imaging 2019, 5(5), 53; https://doi.org/10.3390/jimaging5050053
Received: 6 May 2019 / Revised: 7 May 2019 / Accepted: 7 May 2019 / Published: 10 May 2019
Abstract
Nine articles have been published in this Special Issue on image processing using field programmable gate arrays (FPGAs). The papers address a diverse range of topics relating to the application of FPGA technology to accelerate image processing tasks. The range includes: custom processor design to reduce the programming burden; memory management for full frames, line buffers, and image border management; image segmentation through background modelling, online K-means clustering, and generalised Laplacian of Gaussian filtering; connected components analysis; and visually lossless image compression.
(This article belongs to the Special Issue Image Processing Using FPGAs) Printed Edition available

Research

Open Access Article
A JND-Based Pixel-Domain Algorithm and Hardware Architecture for Perceptual Image Coding
J. Imaging 2019, 5(5), 50; https://doi.org/10.3390/jimaging5050050
Received: 29 March 2019 / Accepted: 16 April 2019 / Published: 26 April 2019
Abstract
This paper presents a hardware efficient pixel-domain just-noticeable difference (JND) model and its hardware architecture implemented on an FPGA. This JND model architecture is further proposed to be part of a low complexity pixel-domain perceptual image coding architecture, which is based on downsampling and predictive coding. The downsampling is performed adaptively on the input image based on regions-of-interest (ROIs) identified by measuring the downsampling distortions against the visibility thresholds given by the JND model. The coding error at any pixel location can be guaranteed to be within the corresponding JND threshold in order to obtain excellent visual quality. Experimental results show the improved accuracy of the proposed JND model in estimating visual redundancies compared with classic JND models published earlier. Compression experiments demonstrate improved rate-distortion performance and visual quality over JPEG-LS as well as reduced compressed bit rates compared with other standard codecs such as JPEG 2000 at the same peak signal-to-perceptible-noise ratio (PSPNR). FPGA synthesis results targeting a mid-range device show very moderate hardware resource requirements and over 100 Megapixel/s throughput of both the JND model and the perceptual encoder.
(This article belongs to the Special Issue Image Processing Using FPGAs) Printed Edition available

Open Access Article
Zig-Zag Based Single-Pass Connected Components Analysis
J. Imaging 2019, 5(4), 45; https://doi.org/10.3390/jimaging5040045
Received: 2 February 2019 / Revised: 20 March 2019 / Accepted: 29 March 2019 / Published: 6 April 2019
Abstract
Single-pass connected components analysis (CCA) algorithms suffer from a time overhead to resolve labels at the end of each image row. This work demonstrates how this overhead can be eliminated by replacing the conventional raster scan with a zig-zag scan. This enables chains of labels to be correctly resolved while processing the next image row. The effect is faster processing in the worst case with no end-of-row overheads. CCA hardware architectures using the novel algorithm proposed in this paper are, therefore, able to process images at higher throughput than other state-of-the-art methods while reducing the hardware requirements. The latency introduced by the conversion from raster scan to zig-zag scan is compensated for by a new method of detecting object completion, which enables the feature vector for completed connected components to be output at the earliest possible opportunity.
(This article belongs to the Special Issue Image Processing Using FPGAs) Printed Edition available
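
As a rough illustration of the scan order mentioned in the abstract above, the following Python sketch generates pixel coordinates for a zig-zag scan, in which every second row is traversed in the opposite direction to a raster scan. It shows only the traversal order and not the connected components analysis itself. The function name `zigzag_scan` and the choice to traverse row 0 left to right are assumptions made for this example, not details taken from the paper.

```python
def zigzag_scan(width, height):
    """Yield (row, col) pairs, reversing direction on every other row."""
    for row in range(height):
        if row % 2 == 0:
            cols = range(width)                # even rows: left to right
        else:
            cols = range(width - 1, -1, -1)    # odd rows: right to left
        for col in cols:
            yield row, col


order = list(zigzag_scan(3, 2))
print(order)  # [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0)]
```

In this ordering the last pixel of one row and the first pixel of the next share a column, so consecutive pixels in the stream are always neighbours in the image.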

Open Access Article
High-Level Synthesis of Online K-Means Clustering Hardware for a Real-Time Image Processing Pipeline
J. Imaging 2019, 5(3), 38; https://doi.org/10.3390/jimaging5030038
Received: 29 November 2018 / Revised: 6 March 2019 / Accepted: 7 March 2019 / Published: 14 March 2019
Abstract
The growing need for smart surveillance solutions requires modern video capturing devices to be equipped with advanced features, such as object detection, scene characterization, and event detection. Image segmentation into various connected regions is a vital pre-processing step in these and other advanced computer vision algorithms. Thus, the inclusion of a hardware accelerator for this task in the conventional image processing pipeline inevitably reduces the workload for more advanced operations downstream. Moreover, design entry by using high-level synthesis tools is gaining popularity for the facilitation of system development under a rapid prototyping paradigm. To address these design requirements, we have developed a hardware accelerator for image segmentation, based on an online K-Means algorithm using a Simulink high-level synthesis tool. The developed hardware uses a standard pixel streaming protocol, and it can be readily inserted into any image processing pipeline as an Intellectual Property (IP) core on a Field Programmable Gate Array (FPGA). Furthermore, the proposed design reduces the hardware complexity of the conventional architectures by employing a weighted instead of a moving average to update the clusters. Experimental evidence has also been provided to demonstrate that the proposed weighted average-based approach yields better results than the conventional moving average on test video sequences. The synthesized hardware has been tested in a real-time environment to process Full HD video at 26.5 fps, while the estimated dynamic power consumption is less than 90 mW on the Xilinx Zynq-7000 SoC.
(This article belongs to the Special Issue Image Processing Using FPGAs) Printed Edition available
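
To make the idea of an online cluster update more concrete, here is a minimal Python sketch of one step of online K-means on streamed grey-level pixels, using a fixed-weight update of the nearest centroid. The abstract above does not give the exact form of the weighted average, so the update rule, the function names, and the `weight` parameter are all assumptions chosen for illustration and may differ from the published design.

```python
def nearest_cluster(centroids, pixel):
    """Return the index of the centroid closest to a grey-level pixel."""
    return min(range(len(centroids)), key=lambda k: abs(centroids[k] - pixel))


def online_kmeans_step(centroids, pixel, weight=0.125):
    """Assign one streamed pixel and move its centroid towards it.

    weight is a hypothetical fixed update weight (an assumption, not a
    value from the paper); a power of two such as 1/8 can be realised
    as a bit shift in hardware.
    """
    k = nearest_cluster(centroids, pixel)
    centroids[k] += weight * (pixel - centroids[k])
    return k


centroids = [0.0, 100.0]
label = online_kmeans_step(centroids, 80.0, weight=0.25)
print(label, centroids)  # 1 [0.0, 95.0]
```

Because each pixel is processed once as it arrives, a step like this needs no frame buffer, which is one reason online variants suit streaming pipelines.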

Open Access Article
High-Throughput Line Buffer Microarchitecture for Arbitrary Sized Streaming Image Processing
J. Imaging 2019, 5(3), 34; https://doi.org/10.3390/jimaging5030034
Received: 21 January 2019 / Revised: 25 February 2019 / Accepted: 25 February 2019 / Published: 6 March 2019
Abstract
Parallel hardware designed for image processing promotes vision-guided intelligent applications. With the advantages of high-throughput and low-latency, streaming architecture on FPGA is especially attractive to real-time image processing. Notably, many real-world applications, such as region of interest (ROI) detection, demand the ability to process images continuously at different sizes and resolutions in hardware without interruptions. FPGA is especially suitable for implementation of such flexible streaming architecture, but most existing solutions require run-time reconfiguration, and hence cannot achieve seamless image size-switching. In this paper, we propose a dynamically-programmable buffer architecture (D-SWIM) based on the Stream-Windowing Interleaved Memory (SWIM) architecture to realize image processing on FPGA for image streams at arbitrary sizes defined at run time. D-SWIM redefines the way that on-chip memory is organized and controlled, and the hardware adapts to arbitrary image size with sub-100 ns delay that ensures minimum interruptions to the image processing at a high frame rate. Compared to the prior SWIM buffer for high-throughput scenarios, D-SWIM achieved dynamic programmability with only a slight overhead on logic resource usage, but saved up to 56% of the BRAM resource. The D-SWIM buffer achieves a max operating frequency of 329.5 MHz and a reduction in power consumption of 45.7% compared with the SWIM scheme. Real-world image processing applications, such as 2D-Convolution and the Harris Corner Detector, have also been used to evaluate D-SWIM’s performance, where pixel throughputs of 4.5 Gigapixel/s and 4.2 Gigapixel/s were achieved, respectively. Compared to the implementation with prior streaming frameworks, the D-SWIM-based design not only realizes seamless image size-switching, but also improves hardware efficiency up to 30×.
(This article belongs to the Special Issue Image Processing Using FPGAs) Printed Edition available

Open Access Article
Efficient FPGA Implementation of Automatic Nuclei Detection in Histopathology Images
J. Imaging 2019, 5(1), 21; https://doi.org/10.3390/jimaging5010021
Received: 30 November 2018 / Revised: 27 December 2018 / Accepted: 11 January 2019 / Published: 17 January 2019
Abstract
Accurate and efficient detection of cell nuclei is an important step towards the development of a pathology-based Computer Aided Diagnosis. Generally, high-resolution histopathology images are very large, on the order of a billion pixels; nuclei detection is therefore a highly compute-intensive task, and software implementation requires a significant amount of processing time. To assist the doctors in real time, special hardware accelerators, which can reduce the processing time, are required. In this paper, we propose a Field Programmable Gate Array (FPGA) implementation of an automated nuclei detection algorithm using generalized Laplacian of Gaussian filters. The experimental results show that the implemented architecture has the potential to provide a significant improvement in processing time without losing detection accuracy.
(This article belongs to the Special Issue Image Processing Using FPGAs) Printed Edition available

Open Access Article
FPGA-Based Processor Acceleration for Image Processing Applications
J. Imaging 2019, 5(1), 16; https://doi.org/10.3390/jimaging5010016
Received: 27 November 2018 / Revised: 23 December 2018 / Accepted: 7 January 2019 / Published: 13 January 2019
Abstract
FPGA-based embedded image processing systems offer considerable computing resources but present programming challenges when compared to software systems. The paper describes an approach based on an FPGA-based soft processor called Image Processing Processor (IPPro), which can operate at up to 337 MHz on a high-end Xilinx FPGA family, and gives details of the dataflow-based programming environment. The approach is demonstrated for a k-means clustering operation and a traffic sign recognition application, both of which have been prototyped on an Avnet Zedboard that has a Xilinx Zynq-7000 system-on-chip (SoC). A number of parallel dataflow mapping options were explored, giving a speed-up of 8 times for the k-means clustering using 16 IPPro cores, and a speed-up of 9.6 times for the morphology filter operation of the traffic sign recognition using 16 IPPro cores, compared to their equivalent ARM-based software implementations. We show that for k-means clustering, the 16 IPPro cores implementation is 57, 28 and 1.7 times more power efficient (fps/W) than the ARM Cortex-A7 CPU, NVIDIA GeForce GTX980 GPU and ARM Mali-T628 embedded GPU, respectively.
(This article belongs to the Special Issue Image Processing Using FPGAs) Printed Edition available

Open Access Article
Optimized Memory Allocation and Power Minimization for FPGA-Based Image Processing
J. Imaging 2019, 5(1), 7; https://doi.org/10.3390/jimaging5010007
Received: 19 November 2018 / Revised: 24 December 2018 / Accepted: 27 December 2018 / Published: 1 January 2019
Abstract
Memory is the biggest limiting factor to the widespread use of FPGAs for high-level image processing, which requires complete frame(s) to be stored in situ. Since FPGAs have limited on-chip memory capabilities, efficient use of such resources is essential to meet performance, size and power constraints. In this paper, we investigate allocation of on-chip memory resources in order to minimize resource usage and power consumption, contributing to the realization of power-efficient high-level image processing fully contained on FPGAs. We propose methods for generating memory architectures, from both Hardware Description Languages and High Level Synthesis designs, which minimize memory usage and power consumption. Based on a formalization of on-chip memory configuration options and a power model, we demonstrate how our partitioning algorithms can outperform traditional strategies. Compared to commercial FPGA synthesis and High Level Synthesis tools, our results show that the proposed algorithms can result in up to 60% higher utilization efficiency, increasing the sizes and/or number of frames that can be accommodated, and reduce frame buffers’ dynamic power consumption by up to approximately 70%. In our experiments using Optical Flow and MeanShift Tracking, representative high-level algorithms, data show that partitioning algorithms can reduce total power by up to 25% and 30%, respectively, without impacting performance.
(This article belongs to the Special Issue Image Processing Using FPGAs) Printed Edition available

Open Access Article
Border Handling for 2D Transpose Filter Structures on an FPGA
J. Imaging 2018, 4(12), 138; https://doi.org/10.3390/jimaging4120138
Received: 31 October 2018 / Revised: 20 November 2018 / Accepted: 21 November 2018 / Published: 26 November 2018
Abstract
It is sometimes desirable to implement filters using a transpose-form filter structure. However, managing image borders is generally considered more complex than it is with the more commonly used direct-form structure. This paper explores border handling for transpose-form filters, and proposes two novel mechanisms: transformation coalescing, and combination chain modification. For linear filters, coefficient coalescing can effectively exploit the digital signal processing blocks, resulting in the smallest resource requirements. Combination chain modification requires similar resources to direct-form border handling. It is demonstrated that the combination chain multiplexing can be split into two stages, consisting of a combination network followed by the transpose-form combination chain. The resulting transpose-form border handling networks are of similar complexity to the direct-form networks, enabling the transpose-form filter structure to be used where required. The transpose form is also significantly faster, being automatically pipelined by the filter structure. Of the border extension methods, zero-extension requires the least resources.
(This article belongs to the Special Issue Image Processing Using FPGAs) Printed Edition available
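
For readers unfamiliar with border extension, the following Python sketch shows what a filter window sees at the edge of an image row under three extension methods. Only zero-extension is named in the abstract above; the "duplicate" and "mirror" modes are common alternatives included here as assumptions, and the function names are hypothetical. The sketch forms windows directly from a stored row, so it illustrates what the extension methods compute rather than the transpose-form mechanisms proposed in the paper.

```python
def extend_index(i, n, mode):
    """Map index i onto a row of n pixels; None means 'substitute zero'."""
    if 0 <= i < n:
        return i
    if mode == "zero":
        return None
    if mode == "duplicate":          # repeat the nearest edge pixel
        return min(max(i, 0), n - 1)
    if mode == "mirror":             # reflect about the edge pixel
        return -i if i < 0 else 2 * (n - 1) - i
    raise ValueError(f"unknown mode: {mode}")


def window(row, centre, size, mode):
    """Return the size-wide window centred on row[centre] (size is odd)."""
    half = size // 2
    out = []
    for i in range(centre - half, centre + half + 1):
        j = extend_index(i, len(row), mode)
        out.append(0 if j is None else row[j])
    return out


row = [10, 20, 30, 40]
print(window(row, 0, 3, "zero"))       # [0, 10, 20]
print(window(row, 0, 3, "duplicate"))  # [10, 10, 20]
print(window(row, 3, 3, "mirror"))     # [30, 40, 30]
```

Zero-extension needs no index remapping at all, only a constant in place of the missing pixel, which is consistent with it requiring the least resources.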

Open Access Article
Accelerating SuperBE with Hardware/Software Co-Design
J. Imaging 2018, 4(10), 122; https://doi.org/10.3390/jimaging4100122
Received: 11 September 2018 / Revised: 29 September 2018 / Accepted: 16 October 2018 / Published: 18 October 2018
Abstract
Background Estimation is a common computer vision task, used for segmenting moving objects in video streams. This can be useful as a pre-processing step, isolating regions of interest for more complicated algorithms performing detection, recognition, and identification tasks, in order to reduce overall computation time. This is especially important in the context of embedded systems like smart cameras, which may need to process images with constrained computational resources. This work focuses on accelerating SuperBE, a superpixel-based background estimation algorithm that was designed for simplicity and reduced computational complexity while maintaining state-of-the-art levels of accuracy. We explore both software and hardware acceleration opportunities, converting the original algorithm into a greyscale, integer-only version, and using Hardware/Software Co-design to develop hardware acceleration components on FPGA fabric that assist a software processor. We achieved a 4.4× speed improvement with the software optimisations alone, and a 2× speed improvement with the hardware optimisations alone. When combined, these led to a 9× speed improvement on a Cyclone V System-on-Chip, delivering almost 38 fps on 320 × 240 resolution images.
(This article belongs to the Special Issue Image Processing Using FPGAs) Printed Edition available

J. Imaging, EISSN 2313-433X, published by MDPI AG, Basel, Switzerland