Article

Architecture Design of a Convolutional Neural Network Accelerator for Heterogeneous Computing Based on a Fused Systolic Array

School of Information Science and Engineering, Shenyang University of Technology, Shenyang 110870, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Sensors 2026, 26(2), 628; https://doi.org/10.3390/s26020628 (registering DOI)
Submission received: 20 December 2025 / Revised: 13 January 2026 / Accepted: 15 January 2026 / Published: 16 January 2026
(This article belongs to the Section Intelligent Sensors)

Abstract

Convolutional Neural Networks (CNNs) generally suffer from excessive computational overhead, high resource consumption, and complex network structures, which severely restrict their deployment on microprocessor chips. Existing accelerators achieve an energy efficiency ratio of only 2.32–6.5925 GOPs/W, which makes it difficult to meet the low-power requirements of embedded applications. To address these issues, this paper proposes a low-power, energy-efficient CNN accelerator built on a heterogeneous computing architecture that combines a central processing unit (CPU) with an Application-Specific Integrated Circuit (ASIC). The design adopts an operator-fused systolic array algorithm and uses the YOLOv5n object detection network as the application benchmark. It integrates a 2D systolic array with Conv-BN fusion to achieve deep operator fusion of convolution, batch normalization, and activation functions; optimizes the RISC-V core to reduce resource usage; and adopts a locking mechanism and a prefetching strategy for the asynchronous platform to ensure operational stability. Experiments on the Nexys Video development board show that the architecture achieves 20.6 GFLOPs of computational performance, 1.96 W of power consumption, and an energy efficiency ratio of 10.46 GOPs/W, which is 58–350% higher than that of existing mainstream accelerators, demonstrating strong potential for embedded deployment.
Keywords: convolutional neural network; hardware accelerator; systolic array; heterogeneous computing; operator fusion
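
The Conv-BN fusion mentioned in the abstract is commonly realized by folding the batch-normalization parameters into the convolution weights and biases ahead of inference, so that a single multiply-accumulate pass followed by the activation replaces three separate operators. The sketch below illustrates that general folding technique in plain Python on a 1D signal; it is not the authors' hardware implementation. The function names, the 1D layout, the sample values, and the use of ReLU as the activation are all assumptions made for illustration.

```python
import math


def fold_bn_into_conv(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and biases.

    weights[c] holds the kernel taps of output channel c (hypothetical layout).
    """
    fused_w, fused_b = [], []
    for w_c, b_c, g, b, m, v in zip(weights, biases, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)           # per-channel BN scale
        fused_w.append([scale * w for w in w_c])  # scaled kernel taps
        fused_b.append(scale * (b_c - m) + b)     # folded bias
    return fused_w, fused_b


def conv1d(x, w_c, b_c):
    """Valid 1D cross-correlation of x with a single kernel, plus bias."""
    k = len(w_c)
    return [sum(w_c[j] * x[i + j] for j in range(k)) + b_c
            for i in range(len(x) - k + 1)]


def batch_norm(y, g, b, m, v, eps=1e-5):
    """Inference-mode batch normalization with fixed statistics."""
    return [g * (t - m) / math.sqrt(v + eps) + b for t in y]


def relu(y):
    """ReLU used here only as a stand-in activation (an assumption)."""
    return [max(0.0, t) for t in y]


def max_fusion_error():
    """Largest absolute gap between the unfused and fused pipelines."""
    x = [0.5, -1.0, 2.0, 3.0, -0.5]               # illustrative input
    weights = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]  # two output channels
    biases = [0.1, -0.2]
    gamma, beta = [1.5, 0.8], [0.0, 0.3]
    mean, var = [0.2, -0.1], [4.0, 0.25]

    fused_w, fused_b = fold_bn_into_conv(weights, biases, gamma, beta, mean, var)
    worst = 0.0
    for c in range(len(weights)):
        unfused = relu(batch_norm(conv1d(x, weights[c], biases[c]),
                                  gamma[c], beta[c], mean[c], var[c]))
        fused = relu(conv1d(x, fused_w[c], fused_b[c]))
        worst = max(worst, max(abs(u - f) for u, f in zip(unfused, fused)))
    return worst


if __name__ == "__main__":
    print(max_fusion_error() < 1e-9)
```

Because the folded weights and biases are computed once offline, the batch-normalization step adds no work at inference time, which is the property that makes this kind of fusion attractive for a fixed-function array.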

Share and Cite

MDPI and ACS Style

Zong, Y.; Ma, Z.; Ren, J.; Cao, Y.; Li, M.; Liu, B. Architecture Design of a Convolutional Neural Network Accelerator for Heterogeneous Computing Based on a Fused Systolic Array. Sensors 2026, 26, 628. https://doi.org/10.3390/s26020628

