# RISC-V Virtual Platform-Based Convolutional Neural Network Accelerator Implemented in SystemC

^{1}

^{2}

^{*}

## Abstract

**:**

## 1. Introduction

## 2. Related Work

## 3. RISC-V VP Based CNN DLA

#### 3.1. SystemC-Based RISC-V VP

#### 3.2. CNN DLA Overview

#### 3.3. GFSR Register Set in DLA

#### 3.4. Data Loader Module and Buffer

#### 3.5. CPIPE and APIPE Module

#### 3.6. DNN Applications on the RISC-V DLA System

#### 3.7. Extention Issues

## 4. Verification and Analysis with Experiments

#### 4.1. Darknet Running on DLA with RISC-V VP

#### 4.2. Buffer Effects in DLA System

#### 4.3. Architecture Parallelism

#### 4.4. Quantization Effect

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Abbreviations

CNN | Convolutional Neural Network |

ESL | Electronic System Level |

DLA | Deep Learning Accelerator |

DNN | Deep Neural Network |

AI | Artificial Intelligence |

IoT | Internet of Thing |

ISA | Instruction Set Architecture |

VP | Virtual Platform |

RTL | Register Translation Level |

VHDL | VHSIC Hardware Description Language |

RNN | Recurrent Neural Network |

FGPA | Field-Programmable Gate Array |

GP | General purpose Processor |

YOLO | You only look once |

TLM | Transaction-Level Modeling |

GFSR | Global Funcgion Set Register |

CPIPE | Convolution PIPE |

APIPE | Activation-Pooling PIPE |

PE | Processing Element |

## References

- Yang, Q.; Luo, X.; Li, P.; Miyazaki, T.; Wang, X. Computation offloading for fast CNN inference in edge computing. In Proceedings of the ACM Conference on Research in Adaptive and Convergent Systems (RACS’19), Chongqing, China, 24–27 September 2019; pp. 101–106. [Google Scholar]
- Véstias, M.P. A Survey of Convolutional Neural Networks on Edge with Reconfigurable Computing. Algorithms
**2019**, 12, 154. [Google Scholar] [CrossRef] [Green Version] - Zhang, Q.; Zhang, M.; Chen, T.; Sun, Z.; Ma, Y.; Yu, B. Recent advances in convolutional neural network acceleration. Neurocomputing
**2019**, 323, 37–51. [Google Scholar] [CrossRef] [Green Version] - Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv
**2017**, arXiv:1704.04861. [Google Scholar] - Redmon, J.; Farhadi, A. Yolo9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Darknet. Available online: https://pjreddie.com/darknet/ (accessed on 1 July 2020).
- Marchisio, A.; Hanif, M.A.; Khalid, F.; Plastiras, G.; Kyrkou, C.; Theocharides, T.; Shafique, M. Deep Learning for Edge Computing: Current Trends, Cross-Layer Optimizations, and Open Research Challenges. In Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Miami, FL, USA, 15–17 July 2019; pp. 553–559. [Google Scholar]
- Chen, Y.; Xie, Y.; Song, L.; Chen, F.; Tang, T. A Survey of Accelerator Architectures for Deep Neural Networks. Engineering
**2020**, 6, 264–274. [Google Scholar] [CrossRef] - Migacz, S. 8-bit Inference with TensorRT. In Proceedings of the NVIDIA GPU Technology Conference, Silicon Valley, CA, USA, 8–11 May 2017. [Google Scholar]
- Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2704–2713. [Google Scholar]
- Jain, S.; Venkataramani, S.; Srinivasan, V.; Choi, J.; Chuang, P.; Chang, L. Compensated-DNN: Energy efficient low-precision deep neural networks by compensating quantization errors. In Proceedings of the 55th ACM/ESDA/IEEE Design Automation Conference, San Francisco, CA, USA, 24–28 June 2018. [Google Scholar]
- Shawahna, A.; Sait, S.M.; El-Maleh, A. FPGA-Based Accelerators of Deep Learning Networks for Learning and Classification: A Review. IEEE Access
**2019**, 7, 7823–7859. [Google Scholar] [CrossRef] - Chen, Y.; Yang, T.; Emer, J.; Sze, V. Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices. IEEE J. Emerg. Sel. Top. Circuits Syst.
**2019**, 9, 292–308. [Google Scholar] [CrossRef] [Green Version] - NVIDA. NVIDIA Deep Learning Accelerator. Available online: https://nvdla.org (accessed on 1 July 2020).
- Waterman, A.; Asanović, K. The RISC-V Instruction Set Manual; Volume I: User-Level ISA. CS Division, EECS Department, University of California: Berkeley, CA, USA, 2017. [Google Scholar]
- Waterman, A.; Asanović, K. The RISC-V Instruction Set Manual; Volume II: Privileged Architecture; CS Division, EECS Department, University of California: Berkeley, CA, USA, 2017. [Google Scholar]
- Herdt, V.; Große, D.; Le, H.M.; Drechsler, R. Extensible and Configurable RISC-V Based Virtual Prototype. In Proceedings of the 2018 Forum on Specification and Design Languages (FDL), Garching, Germany, 10–12 September 2018; pp. 5–16. [Google Scholar]
- Redmon, J.; Farhadi, A. Yolov3: An Incremental Improvement. arXiv
**2018**, arXiv:1804.02767. [Google Scholar] - Aledo, D.; Schafer, B.C.; Moreno, F. VHDL vs. SystemC: Design of Highly Parameterizable Artificial Neural Networks. IEICE Trans. Inf. Syst.
**2019**, E102.D, 512–521. [Google Scholar] [CrossRef] [Green Version] - Abdelouahab, K.; Pelcat, M.; Sérot, J.; Berry, F. Accelerating CNN inference on FPGAs: A Survey. arXiv
**2018**, arXiv:1806.01683. [Google Scholar] - Shin, D.; Lee, J.; Lee, J.; Yoo, H. 14.2 DNPU An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks. In Proceedings of the 2017 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 5–9 February 2017; pp. 240–241. [Google Scholar]
- Flex Logic Technologies, Inc. Flex Logic Improves Deep Learning Performance by 10X with New EFLX4K AI eFPGA Core; Flex Logix Technologies, Inc.: Mountain View, CA, USA, 2018. [Google Scholar]
- Fujii, T.; Toi, T.; Tanaka, T.; Togawa, K.; Kitaoka, T.; Nishino, K.; Nakamura, N.; Nakahara, H.; Motomura, M. New Generation Dynamically Reconfigurable Processor Technology for Accelerating Embedded AI Applications. In Proceedings of the 2018 IEEE Symposium on VLSI Circuits, Honolulu, HI, USA, 18–22 June 2018; pp. 41–42. [Google Scholar]
- Wang, Y.; Xu, J.; Han, Y.; Li, H.; Li, X. DeepBurning: Automatic generation of FPGA-based learning accelerators for the Neural Network family. In Proceedings of the 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC), Austin, TX, USA, 5–9 June 2016. [Google Scholar]
- Gokhale, V.; Jin, J.; Dundar, A.; Martini, B.; Culurciello, E. A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June 2014; pp. 696–701. [Google Scholar]
- Parashar, A.; Raina, P.; Shao, Y.S.; Chen, Y.-H.; Ying, V.A.; Mukkara, A.; Venkatesan, R.; Khailany, B.; Keckler, S.W.; Emer, J. Timeloop: A systematic approach to dnn accelerator evaluation. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Madison, WI, USA, 24–26 March 2019; pp. 304–315. [Google Scholar]
- Samajdar, A.; Joseph, J.M.; Zhu, Y.; Whatmough, P.; Mattina, M.; Krishna, T. A Systematic Methodology for Characterizing Scalability of DNN Accelerators using SCALE-Sim. In Proceedings of the 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Boston, MA, USA, 23–25 August 2020; pp. 58–68. [Google Scholar]
- Kwon, H.; Chatarasi, P.; Sarkar, V.; Krishna, T.; Pellauer, M.; Parashar, A. MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings. IEEE Micro
**2020**, 40, 20–29. [Google Scholar] [CrossRef] - Russo, E.; Palesi, M.; Monteleone, S.; Patti, D.; Ascia, G.; Catania, V. LAMBDA: An Open Framework for Deep Neural Network Accelerators Simulation. In Proceedings of the 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), Kassel, Germany, 22–26 March 2021; pp. 161–166. [Google Scholar]
- Wu, N.; Jiang, T.; Zhang, L.; Zhou, F.; Ge, F. A Reconfigurable Convolutional Neural Network-Accelerated Coprocessor Based on RISC-V Instruction Set. Electronics
**2020**, 9, 1005. [Google Scholar] [CrossRef] - Li, Z.; Hu, W.; Chen, S. Design and Implementation of CNN Custom Processor Based on RISC-V Architecture. In Proceedings of the 2019 IEEE 21st International Conference on High Performance Computing and Communications, IEEE 17th International Conference on Smart City, IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), Zhangjiajie, China, 10–12 August 2019; pp. 1945–1950. [Google Scholar]
- Porter, R.; Morgan, S.; Biglari-Abhari, M. Extending a Soft-Core RISC-V Processor to Accelerate CNN Inference. In Proceedings of the 2019 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 5–7 December 2019; pp. 694–697. [Google Scholar]
- Zhang, G.; Zhao, K.; Wu, B.; Sun, Y.; Sun, L.; Liang, F. A RISC-V based hardware accelerator designed for Yolo object detection system. In Proceedings of the 2019 IEEE International Conference of Intelligent Applied Systems on Engineering (ICIASE), Fuzhou, China, 26–29 April 2019. [Google Scholar]
- Venkatesan, R.; Shao, Y.S.; Zimmer, B.; Clemons, J.; Fojtik, M.; Jiang, N.; Keller, B.; Klinefelter, A.; Pinckney, N.; Raina, P.; et al. A 0.11 PJ/OP, 0.32-128 Tops, Scalable Multi-Chip-Module-Based Deep Neural Network Accelerator Designed with A High-Productivity vlsi Methodology. In Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), Cupertino, CA, USA, 18-20 August 2019; IEEE Computer Society: Washington, DC, USA, 2019. [Google Scholar]
- Feng, S.; Wu, J.; Zhou, S.; Li, R. The Implementation of LeNet-5 with NVDLA on RISC-V SoC. In Proceedings of the 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 18–20 October 2019. [Google Scholar]
- Giri, D.; Chiu, K.-L.; Eichler, G.; Mantovani, P.; Chandramoorthy, N.; Carloni, L.P. Ariane+ NVDLA: Seamless third-party IP integration with ESP. In Proceedings of the Workshop on Computer Architecture Research with RISC-V (CARRV), Valencia, Spain, 29 May 2020. [Google Scholar]
- Bailey, B.; Martin, G.; Piziali, A. ESL Design and Verification: A Prescription for Electronic System Level Methodology; Morgan Kaufmann/Elsevier: Amsterdam, The Netherlands, 2007. [Google Scholar]
- Lee, Y.; Hsu, T.; Chen, C.; Liou, J.; Lu, J. NNSim: A Fast and Accurate SystemC/TLM Simulator for Deep Convolutional Neural Network Accelerators. In Proceedings of the 2019 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), Hsinchu, Taiwan, 22–25 April 2019; pp. 1–4. [Google Scholar]
- Kim, S.; Wang, J.; Seo, Y.; Lee, S.; Park, Y.; Park, S.; Park, C.S. Transaction-level Model Simulator for Communication-Limited Accelerators. arXiv
**2020**, arXiv:2007.14897. [Google Scholar] - Vece, G.B.; Conti, M. Power estimation in embedded systems within a systemc-based design context: The pktool environment. In Proceedings of the 2009 Seventh Workshop on Intelligent solutions in Embedded Systems, Ancona, Italy, 25–26 June 2009; pp. 179–184. [Google Scholar]
- Greaves, D.; Yasin, M. TLM POWER3: Power estimation methodology for SystemC TLM 2.0. Proceeding of the 2012 Forum on Specification and Design Languages, Vienna, Austria, 18–20 September 2012; pp. 106–111. [Google Scholar]
- Nabavinejad, S.M.; Baharloo, M.; Chen, K.-C.; Palesi, M.; Kogel, T.; Ebrahimi, M. An Overview of Efficient Interconnection Networks for Deep Neural Network Accelerators. IEEE J. Emerg. Sel. Top. Circuits Syst.
**2020**, 10, 268–282. [Google Scholar] [CrossRef] - Cosine Similarity. Available online: https://en.wikipedia.org/wiki/Cosine_similarity (accessed on 1 December 2020).

**Figure 3.**The memory-mapped DLA Registers and its access from software application in RISC-V VP platform.

**Figure 6.**Experimental results of running on RISC-V VP DLA system and running of Original Darknet library for layer 0, 1 and 13 of Darknet Yolo tiny v3 Nueral Network Model, which are convolution of 416 × 416 image with 16 filters (

**a**,

**b**), max pooling of 16 416 × 416 images (

**c**,

**d**), and convolution of 13 × 13 image with 1024 filters (

**e**,

**f**), respectively.

**Figure 7.**Comparison of each pixel values between RISC-V VP DLA system and Original Darknet Model running for layer 0, 1 and 13 of Darknet Yolo tiny v3 Nueral Network Model, which are convolution of 416 × 416 image with 16 filters (

**a**), max pooling of 16 416 × 416 images (

**b**), and convolution of 13 × 13 image with 1024 filters (

**c**), respectively. Blue points represent pixels for original Darknet and orange points represent pixels for RISC-V VP DLA system.

**Figure 8.**It plots access frequency of each buffer in developed DLA according to the change in buffer size from 1 MB to 8 MB for each CNN configuration.

**Figure 9.**It plots the amount of data transferred between memory and each buffer according to the the buffer size from 1 MB to 8 MB for each CNN configuration.

**Figure 10.**It plots simulated execution times of each submodule in DLA for CNN running with inputs 256, 512, 1024 and outputs 32, 128, 245 for 13 × 13 images and 1 × 1 filter sizes.

**Figure 11.**It shows the simulated excution time in each sub-module by performing convolution and maxpooling operations of 3 NN operations according to the precision level with 1 M and 8 M buffers, respectively; (

**a**) input3/output16, (

**b**) input9/output8, (

**c**) input15/output4.

**Figure 12.**It plots the ratio of actual data transfer to the amount of data required for NN operations according to the precision level with 1 M and 8 M buffers, respectively; (

**a**) input3/output (4, 8, 16) and (

**b**) input9/output (4, 8, 16).

Approach | List | Features |
---|---|---|

RTL with FPGA | Y. Chen [13] D. Shin [21] Flex [22] T. Fujii [23] | course-grain reconfigurable both CNN and RNN reconfigurable IP dynamic reconfigurable |

Coprocessor with GP or RISC-V | V. Gokhale [25] N. Wu [30] Z. Li [31] R. Porter [32] G Zhang [33] | interface with general processor and external memory core-interconnected accelerator support custom instruction for convolution extension for in-pipeline hardware customized for Yolo |

Framework | Timeloop [26] ScaleSim [27] MAESTRO [28] LAMBDA [29] | inference performance and energy cycle-accurate and analytical model for scaling various data form for generating statistics support modeling for communication and memory sub-system |

ESL with SystemC | Y. Lee [38] S. Kim [39] S. Lim (This Paper) | raised abstraction level cycle-accurate-based support RISC-V VP and software interface |

(unit: ns) | Loader Module | ||||
---|---|---|---|---|---|

Memory Delay per Byte | Router | Data | Requester | ||

5 | 11 | 43 | 10 | ||

CPIPE | APIPE | ||||

con2CPIPE | 4 PEs | CPIPEDone | con2APIPE | 4 PEs | APIPEDone |

555 | 1114 | 67 | 208 | 399 | 100 |

Wall-Clock Simulation Time for Various NN Workloads | |||||||||
---|---|---|---|---|---|---|---|---|---|

(# of input, # of output) | (3, 4) | (3, 8) | (3, 16) | (9, 4) | (9, 8) | (9, 16) | (15, 4) | (15, 8) | (15, 16) |

wall-clock time (s) | 7.62 | 14.08 | 27.03 | 18.22 | 34.7 | 65.98 | 29.2 | 55.6 | 107.91 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lim, S.-H.; Suh, W.W.; Kim, J.-Y.; Cho, S.-Y.
RISC-V Virtual Platform-Based Convolutional Neural Network Accelerator Implemented in SystemC. *Electronics* **2021**, *10*, 1514.
https://doi.org/10.3390/electronics10131514

**AMA Style**

Lim S-H, Suh WW, Kim J-Y, Cho S-Y.
RISC-V Virtual Platform-Based Convolutional Neural Network Accelerator Implemented in SystemC. *Electronics*. 2021; 10(13):1514.
https://doi.org/10.3390/electronics10131514

**Chicago/Turabian Style**

Lim, Seung-Ho, WoonSik William Suh, Jin-Young Kim, and Sang-Young Cho.
2021. "RISC-V Virtual Platform-Based Convolutional Neural Network Accelerator Implemented in SystemC" *Electronics* 10, no. 13: 1514.
https://doi.org/10.3390/electronics10131514