Article

DSCU: Accelerating CNN Inference in FPGAs with Dual Sizes of Compute Unit †

Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
* Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in MCSoC, Z. Bao, J. Guo, X. Li and W. Zhang, “MSCU: Accelerating CNN Inference with Multiple Sizes of Compute Unit on FPGAs.” In Proceedings of the 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), Singapore, 2021; pp. 106–113, doi:10.1109/MCSoC51149.2021.00023.
Academic Editor: Stefania Perri
J. Low Power Electron. Appl. 2022, 12(1), 11; https://doi.org/10.3390/jlpea12010011
Received: 30 December 2021 / Revised: 28 January 2022 / Accepted: 10 February 2022 / Published: 13 February 2022
(This article belongs to the Special Issue Low Power AI)
FPGA-based accelerators have shown great potential in improving the performance of CNN inference. However, existing FPGA-based approaches suffer from low compute unit (CU) efficiency due to a large number of redundant computations, leading to significant performance degradation. In this paper, we show that no single CU performs best across all convolutional layers (CONV-layers). To this end, we propose the use of dual sizes of compute unit (DSCU), an approach that aims to accelerate CNN inference in FPGAs. The key idea of DSCU is to select the best combination of CUs via dynamic programming scheduling for each CONV-layer and then assemble each CONV-layer combination into a computing solution for the given CNN to deploy in FPGAs. The experimental results show that DSCU can achieve a performance density of 3.36 × 10³ GOPs/slice on a Xilinx Zynq ZU3EG, which is 4.29 times higher than that achieved by other approaches.
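The core idea above — that a single CU size wastes cycles on layers whose dimensions do not divide it evenly, so pairing two CU sizes and letting each layer use the better one reduces redundant computation — can be illustrated with a toy sketch. The latency model, candidate sizes, and resource budget below are illustrative assumptions, not the paper's actual cost model, and the exhaustive pair search stands in for the paper's dynamic-programming scheduler:

```python
def layer_latency(ops, cu_size):
    """Toy latency model: passes needed to cover `ops` work items with a CU
    of width `cu_size`; a partially filled final pass still costs one pass
    (this is the redundant computation a mismatched CU size incurs)."""
    full, rem = divmod(ops, cu_size)
    return full + (1 if rem else 0)

def pick_dual_sizes(layers, candidate_sizes, budget):
    """Choose two CU sizes (instantiated together on-chip, so their combined
    resource cost must fit `budget`); each CONV-layer then runs on whichever
    of the two sizes gives it lower latency."""
    best = None
    sizes = sorted(candidate_sizes)
    for i, small in enumerate(sizes):
        for large in sizes[i:]:
            if small + large > budget:  # toy resource model: cost == width
                continue
            total = sum(min(layer_latency(ops, small),
                            layer_latency(ops, large))
                        for ops in layers)
            if best is None or total < best[0]:
                best = (total, small, large)
    return best  # (total latency, small CU size, large CU size)

# Three hypothetical CONV-layers with different work volumes: no single size
# is best for all of them, so the chosen pair covers both regimes.
print(pick_dual_sizes([100, 37, 64], [8, 16, 32], budget=48))
```

With these toy numbers the search settles on the pair (8, 32): the large CU serves the bulky layers efficiently while the small one mops up the layer whose size would leave a 32-wide CU mostly idle, mirroring the motivation for dual sizes over any single size.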
Keywords: FPGA; redundant computation; dynamic programming
MDPI and ACS Style

Bao, Z.; Guo, J.; Zhang, W.; Dang, H. DSCU: Accelerating CNN Inference in FPGAs with Dual Sizes of Compute Unit. J. Low Power Electron. Appl. 2022, 12, 11. https://doi.org/10.3390/jlpea12010011

