In the implementation process of a convolution neural network (CNN)-based object detection system, the primary issues are power dissipation and limited throughput. Even though we utilize ultra-low power dissipation devices, the dynamic power dissipation issue will be difficult to resolve. During the operation of the CNN algorithm, there are several factors such as the heating problem generated from the massive computational complexity, the bottleneck generated in data transformation and by the limited bandwidth, and the power dissipation generated from redundant data access. This article proposes the low-power techniques, applies them to the CNN accelerator on the FPGA and ASIC design flow, and evaluates them on the Xilinx ZCU-102 FPGA SoC hardware platform and 45 nm technology for ASIC, respectively. Our proposed low-power techniques are applied at the register-transfer-level (RT-level), targeting FPGA and ASIC. In this article, we achieve up to a 53.21% power reduction in the ASIC implementation and saved 32.72% of the dynamic power dissipation in the FPGA implementation. This shows that our RTL low-power schemes have a powerful possibility of dynamic power reduction when applied to the FPGA design flow and ASIC design flow for the implementation of the CNN-based object detection system.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited