Special Issue "Convolutional Neural Network Design and Hardware Implementation for Real-Time Vision Applications"

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 31 October 2019.

Special Issue Editor

Guest Editor
Prof. Dr. Dah-Jye Lee

Department of Electrical and Computer Engineering, Brigham Young University, Provo, UT 84602, USA
Interests: object recognition, hardware friendly computer vision algorithms, real-time robotic vision applications

Special Issue Information

Dear Colleagues,

Processing speed is critical for many visual computing tasks. Many computer vision algorithms generate accurate results but run too slowly to operate in real time; others process at camera frame rates with reduced accuracy, which is often the more useful combination for real-time applications. Meanwhile, FPGAs are increasing in capacity and decreasing in power consumption, making them more attractive for embedded applications, such as onboard vision and control for unmanned vehicles. Convolutional neural networks (CNNs) offer state-of-the-art accuracy for many computer vision tasks, and their capabilities generalize to many different real-world applications, which often demand real-time responsiveness from the vision system. This Special Issue focuses on CNNs and their application to real-time computer vision tasks.

General topics covered in this Special Issue include, but are not limited to:

  • FPGA-based hardware acceleration of vision algorithms;
  • GPU-based acceleration of vision algorithms;
  • Embedded vision sensors for applications that require real-time performance;
  • CNN architecture optimizations for real-time performance;
  • CNN acceleration through approximate computing;
  • GPU-based implementations for real-time CNN performance;
  • FPGA-based implementations for real-time CNN performance;
  • Real-time CNN performance on resource limited systems;
  • CNN applications that require real-time performance;
  • Tradeoff analysis between speed and accuracy in CNNs.

Prof. Dr. Dah-Jye Lee
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to that website; once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (8 papers)

Research

Jump to: Review

Open Access Article
Efficient Implementation of 2D and 3D Sparse Deconvolutional Neural Networks with a Uniform Architecture on FPGAs
Electronics 2019, 8(7), 803; https://doi.org/10.3390/electronics8070803
Received: 29 May 2019 / Revised: 6 July 2019 / Accepted: 9 July 2019 / Published: 18 July 2019
Abstract
Three-dimensional (3D) deconvolution is widely used in many computer vision applications. However, most previous work has focused on accelerating two-dimensional (2D) deconvolutional neural networks (DCNNs) on Field-Programmable Gate Arrays (FPGAs), while the acceleration of 3D DCNNs has not been studied in depth, as they have higher computational complexity and sparsity than 2D DCNNs. In this paper, we focus on the acceleration of both 2D and 3D sparse DCNNs on FPGAs by proposing efficient schemes for mapping 2D and 3D sparse DCNNs onto a uniform architecture. Firstly, a pruning method is used to remove unimportant network connections and increase the sparsity of the weights; pruning significantly reduces the number of DCNN parameters without accuracy loss. Secondly, the remaining non-zero weights are encoded in coordinate (COO) format, reducing the memory demands of the parameters. Finally, to demonstrate the effectiveness of our work, we implement our accelerator design on the Xilinx VC709 evaluation platform for four real-life 2D and 3D DCNNs. After the first two steps, the storage required by the DCNNs is reduced by up to 3.9×. Results show that our accelerator outperforms our prior work by 2.5× to 3.6× in latency.
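To make the coordinate (COO) encoding step concrete, here is a minimal sketch, independent of the authors' implementation, of encoding the non-zero weights of a pruned kernel as parallel coordinate and value arrays:

```python
import numpy as np

def coo_encode(weights, threshold=0.0):
    """Encode the non-zero entries of a pruned weight tensor in COO format.

    Only the non-zeros are stored, so memory scales with the number of
    surviving weights rather than with the full tensor size.
    """
    coords = np.argwhere(np.abs(weights) > threshold)  # (nnz, ndim) indices
    values = weights[tuple(coords.T)]                  # matching non-zero values
    return coords, values

# Example: a pruned 3x3 kernel with two surviving weights.
kernel = np.array([[0.0, 0.8, 0.0],
                   [0.0, 0.0, 0.0],
                   [-0.5, 0.0, 0.0]])
coords, values = coo_encode(kernel)
print(coords)  # [[0 1], [2 0]]
print(values)  # [ 0.8 -0.5]
```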

Open Access Feature Paper Article
Jet Features: Hardware-Friendly, Learned Convolutional Kernels for High-Speed Image Classification
Electronics 2019, 8(5), 588; https://doi.org/10.3390/electronics8050588
Received: 4 May 2019 / Revised: 21 May 2019 / Accepted: 22 May 2019 / Published: 27 May 2019
Cited by 1
Abstract
This paper explores a set of learned convolutional kernels which we call Jet Features. Jet Features are efficient to compute in software, easy to implement in hardware, and perform well on visual inspection tasks. Because Jet Features can be learned, they can be used in machine learning algorithms. Using Jet Features, we make significant improvements on our previous work, the Evolution Constructed Features (ECO Features) algorithm. Not only do we gain a 3.7× speedup in software without losing any accuracy on the CIFAR-10 and MNIST datasets, but Jet Features also allow us to implement the algorithm on an FPGA using only a fraction of its resources. We hope to apply the benefits of Jet Features to Convolutional Neural Networks in the future.
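As a rough illustration of what makes a convolutional kernel hardware-friendly, the sketch below composes a 2D kernel from separable [1, 1] smoothing and [1, -1] difference primitives, so the resulting convolution needs only additions and subtractions. This is a hypothetical construction for illustration only, not the paper's exact Jet Feature definition:

```python
import numpy as np

# Hypothetical illustration: build a 2D kernel from 1D add/difference
# primitives so convolution can be realized without multipliers.
SMOOTH = np.array([1.0, 1.0])   # [1, 1]: one addition per sample
DIFF = np.array([1.0, -1.0])    # [1, -1]: one subtraction per sample

def compose_kernel(x_filters, y_filters):
    """Convolve 1D primitives along x and y into one separable 2D kernel."""
    kx = np.array([1.0])
    for f in x_filters:
        kx = np.convolve(kx, f)
    ky = np.array([1.0])
    for f in y_filters:
        ky = np.convolve(ky, f)
    return np.outer(ky, kx)

# e.g., smooth twice in x, differentiate once in y:
kernel = compose_kernel([SMOOTH, SMOOTH], [DIFF])
print(kernel)  # [[1, 2, 1], [-1, -2, -1]]: small-integer taps, multiplier-free
```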

Open Access Article
An Efficient Streaming Accelerator for Low Bit-Width Convolutional Neural Networks
Electronics 2019, 8(4), 371; https://doi.org/10.3390/electronics8040371
Received: 25 February 2019 / Revised: 15 March 2019 / Accepted: 22 March 2019 / Published: 27 March 2019
Cited by 1
Abstract
Convolutional Neural Networks (CNNs) have been widely applied in various fields, such as image recognition, speech processing, and many big-data analysis tasks. However, their large size and intensive computation hinder their deployment in hardware, especially on embedded systems with stringent latency, power, and area requirements. To address this issue, low bit-width CNNs have been proposed as a highly competitive candidate. In this paper, we propose an efficient, scalable accelerator for low bit-width CNNs based on a parallel streaming architecture. With a novel coarse-grain task partitioning (CGTP) strategy, the proposed accelerator, with heterogeneous computing units supporting multi-pattern dataflows, can nearly double the throughput for various CNN models on average. In addition, a hardware-friendly algorithm is proposed to simplify the activation and quantization process, reducing the power dissipation and area overhead. Based on the optimized algorithm, an efficient reconfigurable three-stage activation-quantization-pooling (AQP) unit with a low-power staged blocking strategy is developed, which can process activation, quantization, and max-pooling operations simultaneously. Moreover, an interleaving memory scheduling scheme is proposed to support the streaming architecture. The accelerator is implemented in TSMC 40 nm technology with a core size of 0.17 mm². It achieves 7.03 TOPS/W energy efficiency and 4.14 TOPS/mm² area efficiency at 100.1 mW, which makes it a promising design for embedded devices.
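To make the fused activation-quantization-pooling idea concrete, the sketch below applies ReLU, uniform unsigned quantization, and 2×2 max-pooling in a single pass. The parameters (2-bit activations, 2×2 pooling, the scale factor) are assumptions for illustration, not the paper's hardware configuration:

```python
import numpy as np

def aqp(feature_map, scale, bits=2, pool=2):
    """Fused activation (ReLU), quantization, and max-pooling sketch.

    Because ReLU and uniform quantization are monotonic, max-pooling
    commutes with them; that property is what lets hardware process the
    three stages in one pass.
    """
    x = np.maximum(feature_map, 0.0)                 # activation (ReLU)
    levels = 2 ** bits - 1
    q = np.clip(np.round(x / scale), 0, levels)      # quantize to `bits` bits
    h, w = q.shape
    h, w = h - h % pool, w - w % pool                # drop any ragged border
    q = q[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return q.max(axis=(1, 3))                        # max-pooling

fm = np.random.randn(4, 4).astype(np.float32)
print(aqp(fm, scale=0.5))                            # 2x2 map of 2-bit codes
```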

Open Access Article
Optimized Compression for Implementing Convolutional Neural Networks on FPGA
Electronics 2019, 8(3), 295; https://doi.org/10.3390/electronics8030295
Received: 30 January 2019 / Revised: 24 February 2019 / Accepted: 1 March 2019 / Published: 6 March 2019
Cited by 4
Abstract
The field-programmable gate array (FPGA) is widely considered a promising platform for convolutional neural network (CNN) acceleration. However, the large number of parameters in CNNs imposes heavy computing and memory burdens on FPGA-based CNN implementations. To solve this problem, this paper proposes an optimized compression strategy and realizes an FPGA-based accelerator for CNNs. Firstly, a reversed-pruning strategy is proposed, which reduces the number of parameters of AlexNet by a factor of 13× without accuracy loss on the ImageNet dataset; peak-pruning is further introduced to achieve better compressibility, and quantization yields another 4× reduction with negligible loss of accuracy. Secondly, efficient storage techniques are presented that reduce the cache overhead of the convolutional and fully connected layers, respectively. Finally, the effectiveness of the proposed strategy is verified by an accelerator implemented on a Xilinx ZCU104 evaluation board. By improving existing pruning techniques and the storage format for sparse data, we significantly reduce the size of AlexNet by 28×, from 243 MB to 8.7 MB. In addition, our accelerator achieves an overall performance of 9.73 fps for the compressed AlexNet. Compared with central processing unit (CPU) and graphics processing unit (GPU) platforms, our implementation achieves 182.3× and 1.1× improvements in latency and throughput, respectively, on the convolutional (CONV) layers of AlexNet, with 822.0× and 15.8× improvements in energy efficiency, respectively. This compression strategy provides a reference for other neural network applications, including CNNs, long short-term memory (LSTM) networks, and recurrent neural networks (RNNs).
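The first two compression steps can be illustrated with a generic sketch of magnitude pruning followed by uniform quantization. The sparsity target and bit width below are assumptions for illustration; this is not the paper's reversed-pruning or peak-pruning procedure:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights until `sparsity` is reached."""
    k = int(weights.size * sparsity)
    threshold = np.sort(np.abs(weights), axis=None)[k]
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize(weights, bits=8):
    """Uniform symmetric quantization of the surviving weights."""
    scale = np.abs(weights).max() / (2 ** (bits - 1) - 1)
    return np.round(weights / scale).astype(np.int8), scale

w = np.random.randn(64, 64).astype(np.float32)
w_sparse = magnitude_prune(w)        # ~10x fewer non-zeros
w_q, scale = quantize(w_sparse)      # 32-bit floats -> 8-bit ints (4x smaller)
print((w_sparse != 0).mean(), w_q.dtype)
```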

Open Access Article
An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution
Electronics 2019, 8(3), 281; https://doi.org/10.3390/electronics8030281
Received: 29 December 2018 / Revised: 8 February 2019 / Accepted: 23 February 2019 / Published: 3 March 2019
Cited by 1
Abstract
The Convolutional Neural Network (CNN) has been used in many fields, such as image classification, face detection, and speech recognition, and has achieved remarkable results. Compared to GPU (graphics processing unit) and ASIC implementations, an FPGA (field-programmable gate array)-based CNN accelerator has great advantages due to its low power consumption and reconfigurability. However, an FPGA's extremely limited resources and a CNN's huge number of parameters and high computational complexity pose great design challenges. Based on the ZYNQ heterogeneous platform, and coordinating resource and bandwidth constraints with the roofline model, the CNN accelerator we designed can accelerate both standard convolution and depthwise separable convolution with high hardware resource utilization. The accelerator handles network layers of different scales through parameter configuration, and it maximizes bandwidth and achieves a fully pipelined design by using a data-stream interface and a ping-pong on-chip cache. Experimental results show that the accelerator designed in this paper achieves 17.11 GOPS for 32-bit floating point while also accelerating depthwise separable convolution, a clear advantage over other designs.
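For readers unfamiliar with the operation, a depthwise separable convolution factors a standard convolution into a per-channel (depthwise) filter followed by a 1×1 (pointwise) channel-mixing step, cutting the multiply count per output pixel from roughly C_out·C·k² to C·k² + C_out·C. A minimal sketch, independent of the paper's accelerator, is shown below:

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise separable convolution (valid padding, stride 1).

    x:          input feature map, shape (C, H, W)
    dw_kernels: one k x k filter per input channel, shape (C, k, k)
    pw_weights: 1x1 pointwise mixing weights, shape (C_out, C)
    """
    c, h, w = x.shape
    k = dw_kernels.shape[1]
    oh, ow = h - k + 1, w - k + 1
    dw = np.zeros((c, oh, ow))
    for ch in range(c):              # depthwise: each channel filtered alone
        for i in range(oh):
            for j in range(ow):
                dw[ch, i, j] = np.sum(x[ch, i:i+k, j:j+k] * dw_kernels[ch])
    # pointwise: a 1x1 convolution mixes the channels
    return np.tensordot(pw_weights, dw, axes=([1], [0]))  # (C_out, oh, ow)

x = np.random.randn(8, 16, 16)
out = depthwise_separable_conv(x, np.random.randn(8, 3, 3), np.random.randn(16, 8))
print(out.shape)                     # (16, 14, 14)
```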

Open Access Article
Energy-Efficient Gabor Kernels in Neural Networks with Genetic Algorithm Training Method
Electronics 2019, 8(1), 105; https://doi.org/10.3390/electronics8010105
Received: 21 December 2018 / Revised: 10 January 2019 / Accepted: 16 January 2019 / Published: 18 January 2019
Cited by 3
Abstract
Deep-learning convolutional neural networks (CNNs), with their multilayer structure, have proven successful in various cognitive applications. However, their high computational energy and time requirements hinder practical deployment; hence, the realization of a highly energy-efficient and fast-learning neural network has attracted interest. In this work, we address this resource problem by developing a deep model, termed the Gabor convolutional neural network (Gabor CNN), which incorporates highly expression-efficient Gabor kernels into CNNs. In order to effectively imitate the structural characteristics of traditional weight kernels, we improve upon traditional Gabor filters, giving them stronger frequency and orientation representations. In addition, we propose a procedure to train Gabor CNNs, termed the fast training method (FTM). In the FTM, we design a new training method based on the multipopulation genetic algorithm (MPGA) and an evaluation structure to optimize the improved Gabor kernels, while training the rest of the Gabor CNN parameters with back-propagation. Training the improved Gabor kernels with the MPGA is much more energy-efficient, requiring fewer samples and iterations. Simple tasks, like character recognition on the Mixed National Institute of Standards and Technology database (MNIST), traffic sign recognition on the German Traffic Sign Recognition Benchmark (GTSRB), and face detection on the Olivetti Research Laboratory database (ORL), are implemented using the LeNet architecture. Experimental results for the Gabor CNN and the MPGA training method show a 17–19% reduction in computational energy and time and an 18–21% reduction in storage requirements, with a less than 1% accuracy decrease. By incorporating highly expression-efficient Gabor kernels into CNNs, we eliminate a significant fraction of the computation-hungry components of the training process.
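For reference, a standard 2D Gabor kernel is a sinusoidal carrier windowed by a Gaussian envelope; the sketch below generates such a kernel (the paper's improved variant, with stronger frequency and orientation representations, is not reproduced here):

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lam, psi=0.0, gamma=0.5):
    """Standard 2D Gabor kernel: a cosine carrier under a Gaussian envelope.

    sigma: envelope width, theta: orientation, lam: carrier wavelength,
    psi: phase offset, gamma: spatial aspect ratio.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates by theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam + psi)
    return envelope * carrier

# A small bank of oriented kernels, as might seed a convolutional layer:
bank = [gabor_kernel(7, sigma=2.0, theta=t, lam=4.0)
        for t in np.linspace(0, np.pi, 4, endpoint=False)]
```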

Open Access Article
A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs
Electronics 2019, 8(1), 65; https://doi.org/10.3390/electronics8010065
Received: 3 December 2018 / Revised: 17 December 2018 / Accepted: 3 January 2019 / Published: 7 January 2019
Cited by 2
Abstract
Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complicated computer vision applications. Many customized FPGA-based accelerators have been proposed for 2D CNNs, but very few for 3D CNNs. 3D CNNs are far more computationally intensive, and the extra dimension further expands the design space, making 3D CNN acceleration on FPGAs a significant challenge. Motivated by the finding that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture design for accelerating both 2D and 3D CNNs in this paper. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module is developed to generate feature-matrix tilings without storing the entire enlarged feature matrix on-chip or off-chip, a splitting strategy is adopted to reconstruct a convolutional layer to fit the on-chip memory capacity, and a 2D multiply-and-accumulate (MAC) array is adopted to compute the matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test the accelerator on three typical CNN models: AlexNet, VGG16, and C3D. Experimental results show that the accelerator achieves state-of-the-art throughput on both 2D and 3D CNNs, with much better energy efficiency than the CPU and GPU.
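The core idea of mapping convolutions to matrix multiplications (commonly called im2col) can be sketched in a few lines. This generic version materializes the full feature matrix, whereas the paper's mapping module generates tilings on the fly precisely to avoid that storage cost:

```python
import numpy as np

def im2col_conv2d(x, w):
    """2D convolution via matrix multiplication (valid padding, stride 1).

    x: input, shape (C, H, W); w: kernels, shape (C_out, C, k, k).
    Each C x k x k receptive field becomes one column of the feature
    matrix, so the whole convolution reduces to a single GEMM.
    """
    c, h, wd = x.shape
    c_out, _, k, _ = w.shape
    oh, ow = h - k + 1, wd - k + 1
    cols = np.empty((c * k * k, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[:, i:i+k, j:j+k].ravel()
    out = w.reshape(c_out, -1) @ cols    # the GEMM
    return out.reshape(c_out, oh, ow)

x = np.random.randn(3, 8, 8)
w = np.random.randn(4, 3, 3, 3)
print(im2col_conv2d(x, w).shape)         # (4, 6, 6)
```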

Review

Jump to: Research

Open Access Feature Paper Review
A Review of Binarized Neural Networks
Electronics 2019, 8(6), 661; https://doi.org/10.3390/electronics8060661
Received: 14 May 2019 / Revised: 3 June 2019 / Accepted: 5 June 2019 / Published: 12 June 2019
Abstract
In this work, we review Binarized Neural Networks (BNNs). BNNs are deep neural networks that use binary values for activations and weights instead of full-precision values. With binary values, BNNs can execute computations using bitwise operations, which reduces execution time, and BNN model sizes are much smaller than their full-precision counterparts. While the accuracy of a BNN model is generally lower than that of a full-precision model, BNNs have been closing the accuracy gap and are becoming more accurate on larger datasets like ImageNet. BNNs are also good candidates for deep learning implementations on FPGAs and ASICs due to their bitwise efficiency. We give a tutorial on the general BNN methodology and review various contributions, implementations, and applications of BNNs.
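The bitwise trick at the heart of BNN inference is that a dot product of two {-1, +1} vectors reduces to an XNOR followed by a population count; a minimal pure-Python sketch of that identity is given below:

```python
def binary_dot(a_bits, w_bits, n):
    """Dot product of two {-1, +1} vectors packed as n-bit integers.

    Bit value 1 encodes +1, bit value 0 encodes -1. XNOR marks positions
    where the operands agree, so the dot product equals
    (#agreements) - (#disagreements) = 2 * popcount(xnor) - n.
    """
    xnor = ~(a_bits ^ w_bits) & ((1 << n) - 1)   # XNOR, masked to n bits
    return 2 * bin(xnor).count("1") - n

# (+1, -1, +1, +1) . (+1, +1, -1, +1) = 1 - 1 - 1 + 1 = 0
a = 0b1011   # +1, -1, +1, +1  (MSB first)
w = 0b1101   # +1, +1, -1, +1
print(binary_dot(a, w, 4))                       # 0
```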
