Research and Implementation of Performance Optimization Methods for RISC-V Level-5 Processors
Abstract
1. Introduction
- Optimized the design of an open-source embedded RISC-V processor, raising its score from 2.4 to 2.92 CoreMark/MHz in the performance tests, which supports the effectiveness of the optimization methodology.
- Quantitatively evaluated each optimization technique with the CoreMark benchmark, showing how much each measure contributes to the overall gain.
2. Related Work
3. Processor Microarchitecture and Division Optimization
3.1. Hardware-Based Five-Stage Pipeline RISC-V Processor
3.2. Introduction to the SHriscv Processor
3.3. Processor Performance Optimization
3.3.1. Dynamic Branch Predictor Optimization
- (1) Dynamic Branch Prediction Based on 2-Bit Saturated Counters
- (2) Jump Address Prediction
- (3) Branch Prediction Correction Module (Set Module)
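
As a point of reference for item (1), the following is a minimal sketch of a generic 2-bit saturating counter, not the processor's RTL. The class name `TwoBitCounter`, the helper `count_hits`, and the initial state (weakly not taken) are assumptions made for illustration.

```python
class TwoBitCounter:
    """Generic 2-bit saturating counter (illustrative, not the paper's RTL).

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    """

    def __init__(self, state: int = 1) -> None:
        # Assumed initial state: 1 (weakly not taken).
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_hits(outcomes, state: int = 1) -> int:
    """Count correct predictions over a sequence of branch outcomes."""
    counter = TwoBitCounter(state)
    hits = 0
    for taken in outcomes:
        if counter.predict() == taken:
            hits += 1
        counter.update(taken)
    return hits


# A loop branch taken nine times and then falling through, executed twice.
loop_outcomes = ([True] * 9 + [False]) * 2
print(count_hits(loop_outcomes), "of", len(loop_outcomes), "predicted correctly")
```

Because the counter needs two consecutive mispredictions to change direction, a single loop exit does not flip the prediction for the next run of the loop.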
3.3.2. Division Optimization Based on Data Correlation
- (1) Divider Module
| Address | Instruction | Operands |
|---|---|---|
| 0012: | addi | x1, x2, 12 |
| 0016: | addi | x3, x1, 12 |
| 0020: | div | x7, x8, x9 |
| 0024: | add | x4, x3, x1 |
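
The data correlation in the instruction sequence above can be checked with a small sketch. It only tests read-after-write dependences on the destination register; the `Instr` type, the helper `depends_on`, and the policy that an instruction waits for the divider only when it reads the division result are assumptions for illustration, not the authors' implementation.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    addr: int                # instruction address as listed in the table
    op: str                  # mnemonic
    rd: str                  # destination register
    srcs: Tuple[str, ...]    # source registers


# The four instructions from the table above.
PROGRAM = [
    Instr(12, "addi", "x1", ("x2",)),
    Instr(16, "addi", "x3", ("x1",)),
    Instr(20, "div", "x7", ("x8", "x9")),
    Instr(24, "add", "x4", ("x3", "x1")),
]


def depends_on(consumer: Instr, producer: Instr) -> bool:
    """True if consumer reads the register written by producer (RAW)."""
    return producer.rd != "x0" and producer.rd in consumer.srcs


div_instr = PROGRAM[2]
add_instr = PROGRAM[3]

# The add reads x3 and x1, not x7, so it has no dependence on the division.
print("add depends on div:", depends_on(add_instr, div_instr))
# The second addi reads x1, which the first addi writes.
print("addi@16 depends on addi@12:", depends_on(PROGRAM[1], PROGRAM[0]))
```

Under the assumed policy, the `add` at address 24 would not need to wait for the multi-cycle `div` to finish, whereas a hypothetical instruction reading `x7` would.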
3.3.3. Storage Structure Optimization
4. Processor Verification and Analysis
4.1. Hardware and Software Platform
4.2. Functional Testing
4.3. Performance Testing
4.4. Analysis of Experimental Results
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Hennessy, J.L.; Patterson, D.A. Computer Architecture: A Quantitative Approach, 6th ed.; China Machine Press: Beijing, China, 2017; pp. 45–50.
- Waterman, A.; Asanović, K. RISC-V Instruction Set Manual, Volume 1: User-Level ISA; RISC-V Foundation: Zürich, Switzerland, 2019; pp. 12–18.
- Lai, J.; Wei, H.; Sun, K.; Wang, Y. A Neural Network Acceleration System Based on PYNQ. Electron. Des. Eng. 2024, 32, 16–21.
- Di, L.; Rui, C.; Ying, L. Research on Application Porting Process Platform from X86 Architecture to ARM Architecture. Electron. Des. Eng. 2022, 30, 176–179+184.
- Asanović, K.; Avizienis, R.; Bachrach, J.; Beamer, S.; Biancolin, D.; Celio, C.; Cook, H.; Dabbelt, D.; Hauser, J.; Izraelevitz, A.; et al. Rocket Chip Generator; Berkeley Technical Report; University of California: Berkeley, CA, USA, 2016.
- Celio, C.; Patterson, D.A.; Asanović, K. BOOM Processor: A Multicore Research Platform; Berkeley Technical Report; University of California: Berkeley, CA, USA, 2015.
- Taheri, F.; Bayat-Sarmadi, S.; Hadayeghparast, S. RISC-HD: Lightweight RISC-V processor for efficient hyperdimensional computing inference. IEEE Internet Things J. 2022, 9, 24030–24037.
- Li, T.; Cui, E.; Wu, Y.; Wei, Q.; Gao, Y. TeleVM: A lightweight virtual machine for RISC-V architecture. IEEE Comput. Archit. Lett. 2024, 23, 121–124.
- Sun, Y. RISC-V Advances Rapidly, Promising Future Prospects. Commun. World 2024, 17, 5.
- Liu, X.; Lin, H.; Liu, P. Overview of the RISC-V Instruction Set Architecture and Its Applications. China Integr. Circuits 2025, 34, 16–20+49.
- Zhang, X. Academician Ni Guangnan of the Chinese Academy of Engineering: Contributing Chinese Wisdom to the Prosperity of the RISC-V Ecosystem. China Electronics News, 29 November 2024; pp. 2–9.
- Patterson, D.; Waterman, A. The Art of RISC-V Open Architecture Design; Electronics Industry Press: Beijing, China, 2024; pp. 230–245.
- Deng, Y. Leading the Chip Customization Revolution—SiFive 2018 Shanghai Technical Seminar Successfully Held. China Integr. Circuits 2018, 27, 17–18+26.
- Silei, L.; Zheng, X. Research on Open-source SoC Freedom E310 Debugging Method. Microcontroll. Embed. Syst. 2017, 17, 59–62.
- Xu, L. “One Life, One Chip” Initiative: Tackling the Bottleneck in Chip Talent Development. Guangming Daily, 28 November 2021; p. 6.
- Wu, Y.; Qiao, J.; Lei, G.; Su, Y. Research and Design of RISC-V Processors for Edge Nodes. Electron. Devices 2024, 47, 1451–1456.
- Lin, O. Design of a RISC-V Processor for Power Supply Controllers. Master’s Thesis, University of Chinese Academy of Sciences (Institute of Modern Physics, Chinese Academy of Sciences), Beijing, China, 2024; pp. 1–5.
- Zhang, X.; Liang, Q.; Li, T. Design and Implementation of an RV32I Control Unit. Microelectron. Comput. 2018, 35, 74–78+82.
- Li, Y.; Jiao, J.; Liu, Y.; Hao, Z. Research and Design of an Embedded RISC-V Out-of-Order Execution Processor. Comput. Eng. 2021, 47, 261–267+284.
- Jie, G. Research on Embedded RISC-V Microprocessor Architecture. Master’s Thesis, Central South University, Changsha, China, 2023; pp. 1–6.
- Li, D.; Cao, K.; Qu, M.; Wang, F. Hardware/Software Co-Emulation Verification of a Five-Stage Pipeline RISC-V Processor. J. Jilin Univ. (Inf. Sci. Ed.) 2017, 35, 612–616.
- Li, Q. Design and Analysis of a RISC-V Microcontroller for IoT End Devices. Microcontroll. Embed. Syst. Appl. 2018, 18, 64–66.
- Patterson, D.A.; Hennessy, J.L. Computer Organization and Design, RISC-V Edition: The Hardware/Software Interface; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2017.
- Mith, A. Cache Memory. ACM Comput. Surv. 2018, 50, 1–35.
- Yeh, T.-Y.; Patt, Y.N. Two-Level Adaptive Training Branch Prediction. J. Comput. Archit. 2019, 26, 45–52.
- Li, R.; Chen, J.; Liu, W. Architecture Optimization Based on the Xinlai Hummingbird E203 Processor. Electron. Des. Eng. 2025, 33, 6–11+16.
- Zhou, X.; Cai, G.; Huang, Z. Design and Implementation of RISC-V Extended Instruction Set Supporting FPGA Dynamic Reconfiguration. Comput. Eng. 2025, 51, 229–238.
- Xie, H.; Xiao, Q.; Zhu, Z.; Liu, Y.; Liu, Y. Development and Application of Vector Instruction Set and Communication Extension Instruction Set Based on RISC-V Architecture in 5G Redcap Baseband Processor. China Informatiz. 2024, 1, 89–90.
- Simin, J. Research on RISC-V Processor Core Design Optimization and Extended Instruction Set Implementation. Master’s Thesis, Shandong University, Jinan, China, 2023; pp. 19–26.
- Wen, G. Analysis and Optimization of the Branch Prediction Unit in SweRV EH1. Master’s Thesis, Xiamen University, Xiamen, China, 2022; pp. 10–15.
- Kim, H.K.; Kim, H.S.; Eun, C.M.; Cho, H.H.; Jeong, O.H. A high-performance branch predictor design considering memory capacity limitations. In Proceedings of the 2017 International Conference on Circuits, System and Simulation (ICCSS), London, UK, 14–17 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 49–53.
- Zhang, Y. Design of a High-Performance Embedded Processor Supporting the RV64IMCB Instruction Set. Master’s Thesis, Xiamen University, Xiamen, China, 2022; pp. 80–90.
- Wei, Y.; Yang, Z.; Tie, J.; Shi, W.; Zhou, L.; Wang, Y.; Wang, L.; Wu, X. A Multistage Dynamic Branch Predictor Based on Hummingbird E203. Comput. Eng. Sci. 2024, 46, 785–793.
- Martin, T. The Designer’s Guide to the Cortex-M Processor Family; Newnes: Boston, MA, USA, 2022.

| Module Name | Description |
|---|---|
| PC | Calculates the program counter value. |
| If | Fetches instructions. |
| Id | Decodes instructions. |
| Ex | Performs data operations and instruction execution. |
| Mem | Accesses memory. |
| Wb | Write-back module. |
| Ctrl | Controls pipeline stalls and resumption. |
| Div | Performs division operations. |
| Jtag | Used for circuit testing and debugging. |
| Bht | Predicts the jump direction of branch instructions. |
| Btb | Predicts the jump target address of branch instructions. |
| Set | Dynamic branch prediction correction unit; corrects prediction errors. |
| Icache | Instruction cache module. |
| Dcache | Data cache module. |
| Hazzare_detect | Hazard (conflict) detection module. |
| Clint | Arbitrates interrupt signals. |
| Regs | Stores general-purpose register data. |
| Bus | Connects the CPU to peripheral modules. |
| Rom | Stores the instruction information corresponding to the software. |
| Ram | Stores the data information corresponding to the software. |
| Timer | Performs timing and generates timer interrupts. |
| Gpio | Provides basic input/output functionality. |
| Spi | Connects SPI interface devices. |
| Uart | Executes serial data transmission and reception. |
| Index Name | Corresponding Value | Gselect Algorithm PHT Index Value | TXOR-Gselect Algorithm PHT Index Value |
|---|---|---|---|
| Example 1: PC | 1010001111001111 | 1010001110101010 | 0000000000001110 |
| Example 1: GHR | 0011110010101010 | | |
| Example 2: PC | 1010001100000001 | 1010001110101010 | 0000000000000000 |
| Example 2: GHR | 1010111110101010 | | |
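
The Gselect column in the table above is consistent with concatenating the upper eight bits of the 16-bit PC value with the lower eight bits of the GHR. The sketch below reproduces that column under this assumption, which is inferred from the table values rather than taken from the authors' RTL. The function name `gselect_index` and its parameters are hypothetical, and the TXOR-Gselect hashing is not reproduced because its definition is not given here.

```python
def gselect_index(pc: int, ghr: int, pc_bits: int = 8,
                  ghr_bits: int = 8, width: int = 16) -> int:
    """Concatenate the upper pc_bits of pc with the lower ghr_bits of ghr.

    The bit split (8 + 8) is an assumption inferred from the table values.
    """
    pc_part = (pc >> (width - pc_bits)) & ((1 << pc_bits) - 1)
    ghr_part = ghr & ((1 << ghr_bits) - 1)
    return (pc_part << ghr_bits) | ghr_part


# Example 1 and Example 2 from the table.
ex1 = gselect_index(0b1010001111001111, 0b0011110010101010)
ex2 = gselect_index(0b1010001100000001, 0b1010111110101010)

print(format(ex1, "016b"))
print(format(ex2, "016b"))
# Both examples map to the same PHT entry, i.e. they alias under Gselect.
print("aliasing:", ex1 == ex2)
```

The two examples differ in both PC and GHR yet share one Gselect index, while the TXOR-Gselect column assigns them different indices.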
| Signal Name | Input/Output | Description |
|---|---|---|
| predict_jump_flag_i | Input | Predicted branch jump flag, 1 indicates a jump, 0 indicates no jump. |
| predict_jump_addr_i | Input | Predicted branch jump target address. |
| act_jump_flag_i | Input | Actual branch jump flag, 1 indicates a jump, 0 indicates no jump. |
| act_jump_addr_i | Input | Actual branch jump target address. |
| jump_flag_o | Output | Branch prediction correction signal. |
| jump_addr_o | Output | Correct jump target address. |
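
One plausible reading of the interface above is sketched below as a behavioral model. It assumes that a correction is raised when the predicted direction is wrong, or when both directions are "jump" but the target addresses differ, and that `act_jump_addr_i` carries the correct next address even when the branch is not taken. Both points are assumptions and are not confirmed by the table.

```python
def set_module(predict_jump_flag_i: int, predict_jump_addr_i: int,
               act_jump_flag_i: int, act_jump_addr_i: int):
    """Behavioral sketch of a branch prediction correction unit.

    Returns (jump_flag_o, jump_addr_o). Assumption: act_jump_addr_i holds
    the correct next address in every case, including a not-taken branch.
    """
    direction_wrong = predict_jump_flag_i != act_jump_flag_i
    target_wrong = bool(predict_jump_flag_i and act_jump_flag_i
                        and predict_jump_addr_i != act_jump_addr_i)
    jump_flag_o = int(direction_wrong or target_wrong)
    # Redirect to the actual address only when a correction is needed.
    jump_addr_o = act_jump_addr_i if jump_flag_o else 0
    return jump_flag_o, jump_addr_o


# Correct prediction: no correction signal.
print(set_module(1, 0x100, 1, 0x100))
# Predicted not taken, actually taken: redirect to the actual target.
print(set_module(0, 0x000, 1, 0x200))
# Direction right, target wrong: redirect to the actual target.
print(set_module(1, 0x100, 1, 0x180))
```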
| Signal Name | Input/Output | Description |
|---|---|---|
| reg_raddr1_i | Input | Register address of source operand 1 currently being read. |
| ex_reg_waddr_i | Input | Destination register address in the execution stage. |
| div_waddr_reg | Input | The instructions following the division instruction. |
| div_start_i | Input | Division operation start flag. |
| div_end_i | Input | Division operation end flag. |
| ex_reg_wdata_i | Input | Data to be written to the register in the execution stage. |
| mem_reg_wdata_i | Input | Data to be written to the register in the memory access stage. |
| wb_reg_wdata_i | Input | Data to be written to the register in the write-back stage. |
| reg_rdata1_i | Input | Original data read directly from the RS1 read port of the register file. |
| div_data_i | Input | Division operation result. |
| ex_ins_lw_i | Input | Flag indicating a memory access instruction (load or store). |
| ex_reg_we_i | Input | Register write enable signal in the execution stage. |
| mem_reg_waddr_i | Input | Destination register address in the memory access stage. |
| wb_reg_waddr_i | Input | Destination register address in the write-back stage. |
| hold_flag1_o | Output | Pipeline stall flag. |
| flush_flag1_o | Output | Pipeline flush flag. |
| reg_rdata1_o | Output | Data forwarded for source operand RS1. |
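
The signals above suggest a conventional forwarding path for source operand 1. The sketch below models one such selection, with the newest matching result taking priority (execution, then memory access, then write-back) and a stall when the execution stage holds a memory access instruction whose data is not yet available. The priority order, the stall rule, and the `mem_reg_we` and `wb_reg_we` enables (which the table does not list) are assumptions; division handling and `flush_flag1_o` are left out.

```python
def forward_rs1(*, reg_raddr1_i, reg_rdata1_i,
                ex_reg_we_i, ex_reg_waddr_i, ex_reg_wdata_i, ex_ins_lw_i,
                mem_reg_waddr_i, mem_reg_wdata_i,
                wb_reg_waddr_i, wb_reg_wdata_i,
                mem_reg_we=1, wb_reg_we=1):
    """Return (hold_flag1_o, reg_rdata1_o) for source operand 1.

    mem_reg_we and wb_reg_we are hypothetical enables, not in the table.
    """
    if reg_raddr1_i == 0:
        return 0, reg_rdata1_i              # x0 is never forwarded
    if ex_reg_we_i and ex_reg_waddr_i == reg_raddr1_i:
        if ex_ins_lw_i:
            return 1, reg_rdata1_i          # data not ready yet: stall
        return 0, ex_reg_wdata_i            # newest result wins
    if mem_reg_we and mem_reg_waddr_i == reg_raddr1_i:
        return 0, mem_reg_wdata_i
    if wb_reg_we and wb_reg_waddr_i == reg_raddr1_i:
        return 0, wb_reg_wdata_i
    return 0, reg_rdata1_i                  # no hazard: register file value


# Illustrative pipeline state: x1 is being written in EX and in WB.
state = dict(reg_raddr1_i=1, reg_rdata1_i=10,
             ex_reg_we_i=1, ex_reg_waddr_i=1, ex_reg_wdata_i=42, ex_ins_lw_i=0,
             mem_reg_waddr_i=3, mem_reg_wdata_i=7,
             wb_reg_waddr_i=1, wb_reg_wdata_i=5)
print(forward_rs1(**state))
```

With this illustrative state, the execution-stage value is selected over the older write-back value for the same register.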
| Icache (Y/N) | Dcache (Y/N) | Division Optimization (Y/N) | Dynamic Branch Prediction (Y/N) | CoreMark/MHz |
|---|---|---|---|---|
| × | × | × | × | 2.4 |
| × | √ | × | × | 2.6 |
| √ | √ | × | × | 2.81 |
| √ | √ | √ | × | 2.81 |
| √ | √ | √ | √ | 2.92 |
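
The contribution of each step can be read off the table with simple arithmetic. The sketch below computes the relative gain of each configuration over the previous row and the overall gain over the baseline; the row labels and the function name `step_gains` are chosen here for illustration.

```python
# CoreMark/MHz per configuration, in the order of the table rows.
SCORES = [
    ("baseline", 2.4),
    ("+ Dcache", 2.6),
    ("+ Icache", 2.81),
    ("+ division optimization", 2.81),
    ("+ dynamic branch prediction", 2.92),
]


def step_gains(scores):
    """Percentage gain of each row relative to the row before it."""
    gains = []
    for (_, prev), (label, cur) in zip(scores, scores[1:]):
        gains.append((label, (cur - prev) / prev * 100.0))
    return gains


def overall_gain(scores):
    """Percentage gain of the last row relative to the first row."""
    first, last = scores[0][1], scores[-1][1]
    return (last - first) / first * 100.0


for label, gain in step_gains(SCORES):
    print(f"{label}: {gain:.2f}%")
print(f"overall: {overall_gain(SCORES):.2f}%")
```

On these figures the two caches account for most of the improvement, the division optimization leaves the CoreMark score unchanged, and the overall gain is a little under 22%.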
| Processor | Bit Width | Architecture | Process/nm | Power Consumption/mW | Number of Logic Gates | CoreMark/MHz |
|---|---|---|---|---|---|---|
| E203 [33] | 32 | RISC-V | SMIC 110 | 0.145 | 18.5 k | 2.14 |
| Cortex-M0 [34] | 32 | ARMv6-M | SMIC 110 | 0.17 | 14.7 k | 2.33 |
| Cortex-M3 [34] | 32 | ARMv7-M | SMIC 110 | 0.75 | 48 k | 3.32 |
| This processor | 32 | RISC-V | SMIC 110 | 0.43 | 32.8 k | 2.92 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jin, Z.; Hu, T.; Jie, Z.; Wang, P. Research and Implementation of Performance Optimization Methods for RISC-V Level-5 Processors. Appl. Sci. 2025, 15, 11634. https://doi.org/10.3390/app152111634
