Open Access Article
Electronics 2019, 8(1), 78; https://doi.org/10.3390/electronics8010078

Accelerating Deep Neural Networks by Combining Block-Circulant Matrices and Low-Precision Weights

1 School of Electronic Science and Engineering, Nanjing University, Nanjing 210023, China
2 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
3 School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, 114 28 Stockholm, Sweden
* Author to whom correspondence should be addressed.
Received: 23 November 2018 / Revised: 16 December 2018 / Accepted: 5 January 2019 / Published: 10 January 2019
(This article belongs to the Section Computer Science & Engineering)
Abstract

As a key ingredient of deep neural networks (DNNs), fully-connected (FC) layers are widely used in various artificial intelligence applications. However, FC layers contain a large number of parameters, so their efficient processing is restricted by memory bandwidth. In this paper, we propose a compression approach combining block-circulant matrix-based weight representation and power-of-two quantization. Applying block-circulant matrices in FC layers reduces the storage complexity from O(k²) to O(k). By quantizing the weights into integer powers of two, the multiplications in inference can be replaced by shift and add operations. The memory usage of the models for MNIST, CIFAR-10 and ImageNet can be compressed by 171×, 2731× and 128×, respectively, with minimal accuracy loss. A configurable parallel hardware architecture is then proposed for processing the compressed FC layers efficiently. Without multipliers, a block matrix-vector multiplication module (B-MV) is used as the computing kernel. The architecture is flexible enough to support FC layers with various compression ratios at a small footprint. At the same time, memory accesses are significantly reduced by the configurable architecture. Measurement results show that the accelerator delivers a processing power of 409.6 GOPS and achieves an energy efficiency of 5.3 TOPS/W at 800 MHz.
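To make the two compression ideas above concrete, the following Python sketch illustrates the general block-circulant and power-of-two techniques in plain NumPy. It is only an illustration of the underlying concepts, not the authors' accelerator or software; all function names are hypothetical, and the B-MV hardware kernel itself is not modeled. Each k x k circulant block is stored by its first row alone (k values instead of k², hence the O(k²) to O(k) reduction), and because every stored weight is a signed power of two, each product reduces to a binary shift (ldexp) followed by an add.

import numpy as np

def quantize_pow2(w, eps=1e-12):
    # Round each weight to the nearest signed power of two: w ~ s * 2**e.
    s = np.sign(w)
    e = np.round(np.log2(np.abs(w) + eps)).astype(int)
    return s, e

def circulant_block_mv(sign, exp, x):
    # y = C @ x for a k x k circulant block C stored only by its first row:
    # C[i, j] = sign[(j - i) % k] * 2**exp[(j - i) % k].
    # With power-of-two weights, each product is a shift (ldexp) plus an add.
    k = x.shape[0]
    y = np.zeros(k)
    for i in range(k):
        for j in range(k):
            idx = (j - i) % k
            y[i] += sign[idx] * np.ldexp(x[j], int(exp[idx]))
    return y

def block_circulant_fc(first_rows, x, k):
    # Fully-connected layer y = W @ x, with W partitioned into k x k circulant
    # blocks; first_rows[p][q] holds the quantized first row of block (p, q).
    rows, cols = len(first_rows), len(first_rows[0])
    y = np.zeros(rows * k)
    for p in range(rows):
        for q in range(cols):
            s, e = first_rows[p][q]
            y[p*k:(p+1)*k] += circulant_block_mv(s, e, x[q*k:(q+1)*k])
    return y

# Tiny demo: a 4x8 weight matrix represented as a 1x2 grid of 4x4 circulant blocks.
k = 4
rng = np.random.default_rng(0)
first_rows = [[quantize_pow2(rng.standard_normal(k)) for _ in range(2)]]
x = rng.standard_normal(2 * k)
print(block_circulant_fc(first_rows, x, k))

The shift-and-add structure is what removes the need for multipliers in the hardware; the sketch keeps a naive O(k²) loop per block for clarity, whereas FFT-based or pipelined implementations can exploit the circulant structure further.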
Keywords: hardware acceleration; deep neural networks (DNNs); fully-connected layers; network compression; VLSI
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Qin, Z.; Zhu, D.; Zhu, X.; Chen, X.; Shi, Y.; Gao, Y.; Lu, Z.; Shen, Q.; Li, L.; Pan, H. Accelerating Deep Neural Networks by Combining Block-Circulant Matrices and Low-Precision Weights. Electronics 2019, 8, 78.
