
A TensorFlow Extension Framework for Optimized Generation of Hardware CNN Inference Engines

School of Electrical & Computer Engineering, National Technical University of Athens, Athens 15780, Greece
* Author to whom correspondence should be addressed.
Technologies 2020, 8(1), 6; https://doi.org/10.3390/technologies8010006
Received: 16 December 2019 / Revised: 7 January 2020 / Accepted: 9 January 2020 / Published: 13 January 2020
(This article belongs to the Special Issue MOCAST 2019: Modern Circuits and Systems Technologies on Electronics)
The workloads of Convolutional Neural Networks (CNNs) exhibit a streaming nature that makes them attractive for reconfigurable architectures such as Field-Programmable Gate Arrays (FPGAs), while their increasing demand for low power and speed has established Application-Specific Integrated Circuit (ASIC)-based accelerators as an alternative efficient solution. Over the last five years, the development of Hardware Description Language (HDL)-based CNN accelerators, for either FPGA or ASIC, has attracted considerable academic interest due to their high performance and room for optimization. In this direction, we propose a library-based framework that extends TensorFlow, the well-established machine learning framework, and automatically generates high-throughput CNN inference engines for FPGAs and ASICs. The framework allows software developers to exploit the benefits of FPGA/ASIC acceleration without requiring any expertise in HDL development and low-level design. Moreover, it provides a set of optimization knobs concerning the model architecture and the inference engine generation, allowing the developer to tune the accelerator to the requirements of the respective use case. We evaluate our framework by optimizing the LeNet CNN model on the MNIST dataset and implementing FPGA- and ASIC-based accelerators using the generated inference engine. The optimal FPGA-based accelerator on Zynq-7000 delivers a 93% smaller memory footprint, 54% lower Look-Up Table (LUT) utilization, and up to 10× speedup in inference execution compared with various Graphics Processing Unit (GPU) and Central Processing Unit (CPU) implementations of the same model, in exchange for a negligible accuracy loss of 0.89%. For the same accuracy drop, the 45 nm standard-cell-based ASIC accelerator operates at 520 MHz, occupies an area of 0.059 mm², and consumes ∼7.5 mW.
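The abstract does not reproduce the framework's API, so the following is only a minimal sketch of the workflow it describes, under assumptions: a LeNet-style model is trained on MNIST with standard TensorFlow/Keras calls, and a commented-out generator call indicates where knobs such as weight-quantization bit-width and target technology would plug in. The hdl_gen.generate name and its parameters are illustrative placeholders, not the framework's actual interface.

import tensorflow as tf

# LeNet-style CNN for 28x28 grayscale MNIST digits.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(6, 5, activation='tanh', input_shape=(28, 28, 1)),
    tf.keras.layers.AveragePooling2D(2),
    tf.keras.layers.Conv2D(16, 5, activation='tanh'),
    tf.keras.layers.AveragePooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation='tanh'),
    tf.keras.layers.Dense(84, activation='tanh'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype('float32') / 255.0

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1, batch_size=128)

# Hypothetical hand-off to the inference-engine generator; the knobs below
# (quantization bit-width, target technology, dataflow style) mirror the
# kinds of options the abstract describes, not a documented API:
# engine = hdl_gen.generate(model, target='fpga',
#                           weight_bits=8, dataflow='streaming')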
Keywords: machine learning; convolutional neural networks; model optimizations; weight quantization; inference engine optimizations; dataflow optimizations; tensorflow; FPGA; ASIC

MDPI and ACS Style

Leon, V.; Mouselinos, S.; Koliogeorgi, K.; Xydis, S.; Soudris, D.; Pekmestzi, K. A TensorFlow Extension Framework for Optimized Generation of Hardware CNN Inference Engines. Technologies 2020, 8, 6.

