Object detection is an essential component of many systems, for example advanced driver assistance systems (ADAS) and advanced video surveillance systems (AVSS). Currently, the highest detection accuracy is achieved by solutions based on deep convolutional neural networks (DCNNs). Unfortunately, this comes at the cost of high computational complexity; hence, work on accelerating these algorithms is important and timely. In this work, we compare three DCNN hardware accelerator implementation methods: coarse-grained (a custom accelerator called LittleNet), fine-grained (FINN) and sequential (Vitis AI). We evaluate the approaches in terms of object detection accuracy, throughput and energy usage on the VOT datasets. We also discuss the limitations of each of the methods considered. We describe the whole DNN implementation process, including architecture design, training, quantisation and hardware implementation. We used two custom DNN architectures to obtain higher accuracy, higher throughput and lower energy consumption. The first was implemented in SystemVerilog and the second with the FINN tool from AMD Xilinx. Both approaches were then compared with the Vitis AI tool from AMD Xilinx. The final implementations were tested on the Avnet Ultra96-V2 development board with the Zynq UltraScale+ MPSoC ZCU3EG device. For the two DNN architectures, we achieved a throughput of 196 fps with our custom accelerator and 111 fps with FINN. The same networks implemented with Vitis AI achieved 123.3 fps and 53.3 fps, respectively.
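To illustrate the quantisation step mentioned above, the following is a minimal sketch of symmetric uniform fixed-point quantisation, the general kind of transformation applied when preparing floating-point DNN weights for FPGA deployment. The function name, the 8-bit width and the per-tensor scaling are illustrative assumptions, not the specific scheme used in the paper or by the FINN or Vitis AI toolchains.

```python
def quantize_uniform(weights, n_bits=8):
    """Quantise a list of real-valued weights to signed n-bit integers.

    Returns the integer codes and the scale factor needed to reconstruct
    approximate real values: w ≈ code * scale. The rounding error per
    weight is bounded by scale / 2.
    """
    qmax = 2 ** (n_bits - 1) - 1                 # e.g. 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax  # map largest |w| to qmax
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

# Illustrative weights (not from the paper): quantise, then reconstruct
# and check the worst-case rounding error.
weights = [0.42, -1.3, 0.07, 0.9, -0.55]
codes, scale = quantize_uniform(weights, n_bits=8)
recovered = [c * scale for c in codes]
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
```

Lowering `n_bits` shrinks the hardware multipliers and on-chip weight storage, at the cost of a larger reconstruction error; in practice such accuracy loss is typically recovered with quantisation-aware training rather than post-training rounding alone.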
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.