Design of a Hardware-Optimized High-Performance CNN Accelerator for Real-Time Object Detection Using YOLOv3 with Darknet-19 Architecture
Abstract
1. Introduction
1.1. Background
1.2. Motivation
1.3. Literature Review
1.4. Contribution
2. Problem Formulation
2.1. Problem Statement
2.2. Objectives
3. Convolutional Neural Network Model and Darknet-19 Structure
3.1. Convolution Layer
3.2. Pooling Layer
3.3. Activation Layer
3.4. Fully Connected Layer
3.5. Padding Layer
3.6. Darknet-19 Structure
3.7. Batch Normalization
3.8. YOLOv3 with Darknet-19
4. Proposed Hardware Architecture
5. Tools and Libraries Used
5.1. Xilinx Vivado
5.2. Synopsys Design Compiler
5.3. Cadence Innovus
5.4. MATLAB
5.5. Nangate Open Cell Library
6. ASIC Implementation
6.1. Functional Simulation
6.2. RTL Schematic
6.3. Gate-Level Synthesis
6.3.1. Area Optimization
6.3.2. Power Optimization
6.3.3. Timing Analysis
6.4. Physical Implementation
7. Experimental Results and Performance Analysis
7.1. Experimental Setup
7.2. Generation of Padded Image
7.3. Verifying Functionality of CNN Accelerator
7.4. System Performance Metrics
7.5. Power and Area of Processing Element
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Trinchero, R.; Manfredi, P.; Stievano, I.S.; Canavero, F.G. Machine learning for the performance assessment of high-speed links. IEEE Trans. Electromagn. Compat. 2018, 60, 1627–1634.
- Gonzalez, R.C.; Woods, R.E. Digital Image Processing, 4th ed.; Pearson Education: Hong Kong, China, 2018; ISBN 978-1-292-22304-9.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Zhao, M.; Li, X.; Zhu, S.; Zhou, L. A Method for Accelerating Convolutional Neural Networks Based on FPGA. In Proceedings of the 2019 4th International Conference on Communication and Information Systems (ICCIS), Wuhan, China, 19–21 December 2019; pp. 241–246.
- Veena, M.B.; Deodurg, R.; Shrinidhi, V.; Soundarya, S. Design of Optimized CNN for Image Processing using Verilog. In Proceedings of the 2023 4th IEEE Global Conference for Advancement in Technology (GCAT), Bangalore, India, 6–8 October 2023; pp. 1–6.
- Karapurkar, S.S.; Bramhane, L.K.; Rahulkar, A.D.; Veerakumar, T. Energy-Efficient Implementation of Processing Elements for CNN Hardware Accelerator. In Proceedings of the 2023 11th International Conference on Emerging Trends in Engineering & Technology-Signal and Information Processing (ICETET-SIP), Goa, India, 28–29 April 2023; pp. 1–6.
- Crumley, D.; Hossain, M.; Martin, K.; Ivey, F.; Yarnell, R.; DeMara, R.F.; Bai, Y. Rehosting YOLOv2 Framework for Reconfigurable Fabric-Based Acceleration. In Proceedings of the 2022 IEEE SoutheastCon, Mobile, AL, USA, 26 March–3 April 2022; pp. 445–446.
- Lee, S.-Y.; Ku, M.-Y.; Pan, S.-Y.; Lin, C.-C. Reconfigurable and Scalable Artificial Intelligence Acceleration Hardware Architecture With RISC-V CNN Coprocessor for Real-Time Seizure Detection. IEEE Access 2025, 13, 31057–31068.
- Song, Y.-S.; Lee, K.-Y. A Design of Lightweight Convolutional Neural Network Accelerator for IoT Devices. In Proceedings of the 2023 14th International Conference on Ubiquitous and Future Networks (ICUFN), Paris, France, 4–7 July 2023; pp. 474–477.
- Shen, Y. Accelerating CNN on FPGA: An Implementation of MobileNet on FPGA. Master’s Thesis, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden, 2019.
- Kwon, H. Designing CNN Accelerators–Day 1; Lecture Notes; Synergy Lab, Georgia Institute of Technology: Atlanta, GA, USA, 2017; Available online: http://synergy.ece.gatech.edu (accessed on 8 January 2026).
- Xiong, Q.; Liao, C.; Yang, Z.; Gao, W. A method for accelerating YOLO by hybrid computing based on ARM and FPGA. In Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence, Sanya, China, 22–24 December 2021; pp. 1–7.
- Kim, M.; Oh, K.; Cho, Y.; Seo, H.; Nguyen, X.T.; Lee, H.J. A low-latency FPGA accelerator for YOLOv3-tiny with flexible layerwise mapping and dataflow. IEEE Trans. Circuits Syst. I 2023, 71, 1158–1171.
- Tsai, T.H.; Tung, N.C.; Chen, C.Y. An FPGA-based reconfigurable convolutional neural network accelerator for tiny YOLO-V3. Circuits Syst. Signal Process. 2025, 44, 3388–3409.
- Ahmad, A.; Muhammad, A.P.; Ghulam, J.R. Accelerating tiny YOLOv3 using FPGA-based hardware/software co-design. In Proceedings of the 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Virtual, 10–21 October 2020; IEEE: New York, NY, USA, 2020; pp. 1–5.
- Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Amidie, M.A.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53.
- Wiranata, A.; Wibowo, S.A.; Patmasari, R.; Rahmania, R.; Mayasari, R. Investigation of Padding Schemes for Faster R-CNN on Vehicle Detection. In Proceedings of the 2018 International Conference on Control, Electronics, Renewable Energy and Communications (ICCEREC), Bandung, Indonesia, 5–7 December 2018; pp. 208–212.
- Xie, J.; Long, Z.; Song, Q.; Liu, Z.; Du, Y.; Wang, T. Visible-Light Insulator Defect Detection Based on Improved YOLOv3. In Proceedings of the 2023 3rd International Conference on Electrical Engineering and Mechatronics Technology (ICEEMT), Kunming, China, 21–23 July 2023; pp. 287–292.
- Varma, P. Hands-on Guide to Implement Batch Normalization in Deep Learning Models. Analytics India Magazine, 25 July 2020.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Chakure, A. All You Need to Know About YOLO v3 (You Only Look Once). DEV, 1 March 2021. Available online: https://dev.to/afrozchakure/all-you-need-to-know-about-yolo-v3-you-only-look-once-e4m (accessed on 24 January 2026).
- Qiu, J.; Wang, J.; Yao, S.; Guo, K.; Li, B.; Zhou, E.; Yu, J.; Tang, T.; Xu, N.; Song, S.; et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network. In Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 21–23 February 2016; pp. 26–35.
- Zhang, C.; Li, P.; Sun, G.; Guan, Y.; Xiao, B.; Cong, J. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. In Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 22–24 February 2015; pp. 161–170.
- Sulaiman, N.; Saad, N.; Yusof, R. High Speed Booth Encoder Multiplier Design for FPGA Implementation. In Proceedings of the 2010 International Conference on Computer and Communication Engineering (ICCCE), Kuala Lumpur, Malaysia, 11–12 May 2010; pp. 1–6.
- Guem, D.-H.; Kim, S. Variable Precision Multiplier for CNN Accelerators Based on Booth Algorithm. Int. J. Adv. Sci. Eng. Inf. Technol. (IJASEIT) 2023, 13, 1025–1030.
- Kyriakos, A.; Papatheofanous, E.A.; Bezaitis, C.; Reisis, D. Resources and Power Efficient FPGA Accelerators for Real-Time Image Classification. J. Imaging 2022, 8, 114.
- Lahari, P.L.; Poola, R.G.; Yellampalli, S.S. High Speed and Area Efficient FPGA Implementation of CNN Accelerator for Biomedical Applications. Res. Sq. 2023; preprint.










| Design | Hardware Level | Area (μm²) | Power | PEs | FPS | Clock Frequency | GMACs | GOPS/W |
|---|---|---|---|---|---|---|---|---|
| CNN accelerator—Cyclic array [27] | FPGA | 29,176 | 18.52 | 8 | 3660 | 367 | 2.94 | 326 |
| Vector-wise CNN accelerator [28] | FPGA | 46,172 | 15.98 | 16 | 4300 | 215 | 3.45 | 431 |
| Energy efficient CNN accelerator [6] | FPGA | - | 9.43 | 8 | 4510 | 452 | 3.62 | 767 |
| Proposed CNN accelerator | 45 nm ASIC | 20,375 | 8.69 | 16 | 4982 | 250 | 4 | 920 |
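
The throughput and efficiency columns in the table above can be cross-checked against the PE count, clock, and power columns. The sketch below is a consistency check only, not the authors' evaluation method. It rests on assumptions the table does not state: power is taken to be in mW, clock frequency in MHz, each PE is taken to complete one multiply-accumulate (MAC) per cycle, and one MAC is counted as two operations. The function names are hypothetical.

```python
# Rows copied from the comparison table:
# (design, power, PEs, clock, reported GMACs, reported GOPS/W).
# ASSUMPTIONS (not stated in the table): power is in mW, clock is in MHz,
# each PE retires one MAC per cycle, and one MAC counts as two operations.
ROWS = [
    ("Cyclic array [27]", 18.52, 8, 367, 2.94, 326),
    ("Vector-wise [28]", 15.98, 16, 215, 3.45, 431),
    ("Energy efficient [6]", 9.43, 8, 452, 3.62, 767),
    ("Proposed", 8.69, 16, 250, 4.0, 920),
]


def gmacs_from_clock(pes, clock_mhz):
    """Peak throughput in GMAC/s if every PE retires one MAC per clock cycle."""
    return pes * clock_mhz / 1000.0


def gops_per_watt(gmacs, power_mw):
    """Energy efficiency in GOPS/W, counting each MAC as two operations."""
    return 2.0 * gmacs / (power_mw / 1000.0)


def check_rows(rows=ROWS):
    """Return (design, derived GMACs, reported GMACs, derived GOPS/W, reported GOPS/W)."""
    results = []
    for name, power_mw, pes, clock_mhz, gmacs, efficiency in rows:
        results.append(
            (
                name,
                gmacs_from_clock(pes, clock_mhz),
                gmacs,
                gops_per_watt(gmacs, power_mw),
                efficiency,
            )
        )
    return results


if __name__ == "__main__":
    for name, d_gmacs, r_gmacs, d_eff, r_eff in check_rows():
        print(f"{name}: GMACs {d_gmacs:.2f} vs {r_gmacs}, GOPS/W {d_eff:.0f} vs {r_eff}")
```

Under these assumptions, the derived throughput agrees with the GMACs column to within rounding for every row, and the derived efficiency agrees with the GOPS/W column for three of the four rows. The cyclic-array row [27] comes out near 317 rather than the listed 326, which may reflect rounding or a different operation count in the cited work.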

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wu, S.; Kunapareddy, M.; Wang, N. Design of a Hardware-Optimized High-Performance CNN Accelerator for Real-Time Object Detection Using YOLOv3 with Darknet-19 Architecture. Electronics 2026, 15, 1264. https://doi.org/10.3390/electronics15061264

