Article

PermLSTM: A High Energy-Efficiency LSTM Accelerator Architecture

by Yong Zheng 1,2, Haigang Yang 1,2,3,*, Yiping Jia 1,2,3 and Zhihong Huang 1
1 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
2 School of Microelectronics, University of Chinese Academy of Sciences, Beijing 100049, China
3 Shandong Industrial Institute of Integrated Circuits Technology Ltd., Jinan 250001, China
* Author to whom correspondence should be addressed.
Academic Editor: Chiman Kwan
Electronics 2021, 10(8), 882; https://doi.org/10.3390/electronics10080882
Received: 3 March 2021 / Revised: 26 March 2021 / Accepted: 3 April 2021 / Published: 8 April 2021
(This article belongs to the Section Circuit and Signal Processing)
Pruning and quantization are two commonly used approaches to accelerate the LSTM (Long Short-Term Memory) model. However, traditional linear quantization usually suffers from the problem of gradient vanishing, and existing pruning methods either produce undesired irregular sparsity or incur a large indexing overhead. To alleviate the vanishing-gradient problem, this work proposes a normalized linear quantization approach, which first normalizes operands regionally and then quantizes them within a local min-max range. To avoid irregular sparsity and large indexing overhead, this work adopts permuted block diagonal mask matrices to generate the sparse model. Because the resulting sparse model is highly regular, the positions of the non-zero weights can be obtained by a simple calculation, thus avoiding the large indexing overhead. Based on the sparse LSTM model generated from the permuted block diagonal mask matrices, this paper also proposes a high energy-efficiency accelerator, PermLSTM, that comprehensively exploits the sparsity of weights, activations, and products in the matrix–vector multiplications, resulting in a 55.1% reduction in power consumption. The accelerator has been realized on Arria-10 FPGAs running at 150 MHz and achieves 2.19×–24.4× higher energy efficiency than other FPGA-based LSTM accelerators previously reported.
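The abstract only names the two techniques, so the following Python sketch illustrates roughly what they could look like in simulation. It is not the authors' code: the block size, the choice of a single random block-column permutation, the 8-bit width, and the per-block interpretation of "regional" normalization are all illustrative assumptions.

import numpy as np

def perm_block_diag_mask(rows, cols, block, perm_seed=0):
    """Keep one non-zero block per block-row, with the column blocks permuted.
    The non-zero position is computable from the permutation, so no index
    storage is needed (the regularity the abstract refers to)."""
    assert rows % block == 0 and cols % block == 0
    rb, cb = rows // block, cols // block
    perm = np.random.default_rng(perm_seed).permutation(cb)
    mask = np.zeros((rows, cols), dtype=bool)
    for i in range(rb):
        j = perm[i % cb]                      # location of the kept block
        mask[i*block:(i+1)*block, j*block:(j+1)*block] = True
    return mask

def normalized_linear_quant(x, bits=8):
    """Normalize to the local min-max range of x, then quantize linearly."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2**bits - 1) if hi > lo else 1.0
    q = np.round((x - lo) / scale)            # integer codes in [0, 2^bits - 1]
    return q * scale + lo                     # dequantized values for simulation

# Toy usage: prune a weight matrix with the regular mask, then quantize each
# retained block in its own min-max range (one guess at "regional" scope).
W = np.random.randn(16, 16).astype(np.float32)
mask = perm_block_diag_mask(16, 16, block=4)
W_sparse = W * mask
W_q = W_sparse.copy()
for i in range(0, 16, 4):
    for j in range(0, 16, 4):
        if mask[i, j]:                        # only kept blocks carry values
            W_q[i:i+4, j:j+4] = normalized_linear_quant(W_sparse[i:i+4, j:j+4])
print("kept", np.count_nonzero(W_sparse), "of", W.size, "weights")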
Keywords: LSTM; pruning; quantization; sparse matrix–vector multiplication
MDPI and ACS Style

Zheng, Y.; Yang, H.; Jia, Y.; Huang, Z. PermLSTM: A High Energy-Efficiency LSTM Accelerator Architecture. Electronics 2021, 10, 882. https://doi.org/10.3390/electronics10080882

