Open Access Article
Electronics 2019, 8(1), 65; https://doi.org/10.3390/electronics8010065

A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs

1 National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, China
2 Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada
* Author to whom correspondence should be addressed.
Received: 3 December 2018 / Revised: 17 December 2018 / Accepted: 3 January 2019 / Published: 7 January 2019

Abstract

Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complicated computer vision applications. Many customized FPGA-based accelerators have been proposed for 2D CNNs, but very few for 3D CNNs. 3D CNNs are far more computationally intensive, and the extra dimension further enlarges the design space, which makes accelerating 3D CNNs on FPGAs a major challenge. Motivated by the finding that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture design for accelerating both 2D and 3D CNNs. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module generates the feature matrix tilings without storing the entire enlarged feature matrix on-chip or off-chip; a splitting strategy reconstructs a convolutional layer so that it fits the on-chip memory capacity; and a 2D multiply-and-accumulate (MAC) array computes the matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test the accelerator on three typical CNN models: AlexNet, VGG16, and C3D. Experimental results show that the accelerator achieves state-of-the-art throughput on both 2D and 3D CNNs, with much better energy efficiency than the CPU and GPU.
Keywords: 2D CNN; 3D CNN; accelerator; uniform architecture; FPGA; HLS; matrix multiplication; 2D MAC array
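
The idea of mapping convolutions to matrix multiplications can be illustrated with a small software sketch. This is not the accelerator's implementation. It assumes stride 1 and no padding, and all function names (`conv3d_direct`, `feature_tiles`, `conv_as_matmul`) and the `tile_cols` parameter are hypothetical. It shows a 3D convolution computed as a weight matrix times a feature matrix whose columns are produced tile by tile, so the full enlarged feature matrix is never materialized. A 2D convolution is treated as the special case with depth 1.

```python
from itertools import product


def conv3d_direct(x, w):
    """Reference 3D convolution, stride 1, no padding (assumed for illustration).

    x: nested lists [C][D][H][W]; w: nested lists [M][C][Kd][Kh][Kw].
    """
    C, D, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    M, Kd, Kh, Kw = len(w), len(w[0][0]), len(w[0][0][0]), len(w[0][0][0][0])
    Do, Ho, Wo = D - Kd + 1, H - Kh + 1, W - Kw + 1
    out = [[[[0] * Wo for _ in range(Ho)] for _ in range(Do)] for _ in range(M)]
    for m, d, h, v in product(range(M), range(Do), range(Ho), range(Wo)):
        out[m][d][h][v] = sum(
            w[m][c][i][j][k] * x[c][d + i][h + j][v + k]
            for c, i, j, k in product(range(C), range(Kd), range(Kh), range(Kw))
        )
    return out


def feature_tiles(x, kshape, tile_cols):
    """Yield (start_col, tile) pairs; each tile holds up to tile_cols columns
    of the feature matrix (stored as a list of columns). Only one tile exists
    at a time, so the whole enlarged matrix is never stored."""
    Kd, Kh, Kw = kshape
    C, D, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    Do, Ho, Wo = D - Kd + 1, H - Kh + 1, W - Kw + 1
    positions = list(product(range(Do), range(Ho), range(Wo)))
    for start in range(0, len(positions), tile_cols):
        tile = [
            [x[c][d + i][h + j][v + k]
             for c, i, j, k in product(range(C), range(Kd), range(Kh), range(Kw))]
            for d, h, v in positions[start:start + tile_cols]
        ]
        yield start, tile


def conv_as_matmul(x, w, tile_cols=4):
    """Compute the same convolution as weight_matrix @ feature_matrix, tile by tile."""
    M, C = len(w), len(w[0])
    Kd, Kh, Kw = len(w[0][0]), len(w[0][0][0]), len(w[0][0][0][0])
    D, H, W = len(x[0]), len(x[0][0]), len(x[0][0][0])
    Do, Ho, Wo = D - Kd + 1, H - Kh + 1, W - Kw + 1
    # Weight matrix: M rows, C*Kd*Kh*Kw columns, same ordering as tile columns.
    w_mat = [
        [w[m][c][i][j][k]
         for c, i, j, k in product(range(C), range(Kd), range(Kh), range(Kw))]
        for m in range(M)
    ]
    flat = [[0] * (Do * Ho * Wo) for _ in range(M)]
    for start, tile in feature_tiles(x, (Kd, Kh, Kw), tile_cols):
        # Each (m, n) pair below is one dot product of the tile-level matmul.
        for m in range(M):
            for n, col in enumerate(tile):
                flat[m][start + n] = sum(a * b for a, b in zip(w_mat[m], col))
    # Reshape the M x (Do*Ho*Wo) result back to [M][Do][Ho][Wo].
    return [
        [[flat[m][(d * Ho + h) * Wo:(d * Ho + h) * Wo + Wo] for h in range(Ho)]
         for d in range(Do)]
        for m in range(M)
    ]
```

A 2D layer can be passed through the same functions by wrapping each channel as a single-depth volume (D = 1) and each kernel with Kd = 1, which is the sense in which one matrix-multiplication datapath can serve both cases in this sketch.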

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Liu, Z.; Chow, P.; Xu, J.; Jiang, J.; Dou, Y.; Zhou, J. A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs. Electronics 2019, 8, 65.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Electronics, EISSN 2079-9292, published by MDPI AG, Basel, Switzerland.