Electronics
  • Article
  • Open Access

11 January 2026

Design and Implementation of a Prefetcher in a Key Performance Subsystems of RISC-V Processors

1 School of Electronic Science and Engineering, Nanjing University, Nanjing 210008, China
2 Jiangsu Huachuang Microsystem Co., Ltd., Nanjing 210032, China
* Author to whom correspondence should be addressed.
Electronics 2026, 15(2), 319; https://doi.org/10.3390/electronics15020319
(registering DOI)
This article belongs to the Section Computer Science & Engineering

Abstract

The prefetcher is one of the key performance subsystems of a RISC-V processor: a well-designed prefetcher improves memory access efficiency, reduces latency, and raises overall processor performance. This paper studies prefetcher design methods for RISC-V processors and proposes a practical implementation scheme that balances performance and usability. The scheme uses hybrid prefetching: in addition to the two classic modes, automatic hardware prefetching and software-prefetch instructions, it introduces a software template prefetcher, whose implementation logic is described in detail. For the hardware prefetcher, the paper further proposes a hierarchical prefetching strategy that follows the cache hierarchy and specifies the prefetcher design for each cache level. The design balances prediction accuracy, performance, power consumption, and design complexity, and it employs different prefetching strategies and algorithms to achieve efficient memory access. Both the processor and the prefetcher are written in Verilog HDL; implementation and verification were completed on an FPGA prototype verification platform, and a 12 nm processor chip was designed and implemented. The resulting processor core occupies an area of 5.128 mm². On the FPGA platform, the processor equipped with this prefetcher performs 25% to 35.8% better than the Xuantie C908 and Xuantie C910. In addition, enabling the prefetcher improves the performance of the processor by 25.67% to 61% relative to the same processor with the prefetcher disabled.
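
To illustrate the enabled-versus-disabled comparison in general terms, the following is a minimal Python sketch of automatic hardware prefetching. It is not the design described in this paper: the abstract does not name the prediction algorithm, so a simple stride detector is assumed here, and `ToyCache`, `StridePrefetcher`, `count_misses`, the 64-byte line size, the 16-line cache, and the prefetch degree of 2 are all hypothetical choices made for illustration.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache-line size (not taken from the paper)


class ToyCache:
    """Tiny fully associative LRU cache that tracks line addresses only."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()

    def contains(self, line):
        return line in self.lines

    def touch(self, line):
        """Insert or refresh a line, evicting the least recently used one."""
        if line in self.lines:
            self.lines.move_to_end(line)
        else:
            if len(self.lines) >= self.num_lines:
                self.lines.popitem(last=False)
            self.lines[line] = True


class StridePrefetcher:
    """Hypothetical stride detector: predicts once the same stride repeats."""

    def __init__(self, degree=2):
        self.degree = degree      # how many lines ahead to prefetch
        self.last_line = None
        self.last_stride = None

    def observe(self, line):
        """Return the line addresses to prefetch after this access."""
        targets = []
        if self.last_line is not None:
            stride = line - self.last_line
            if stride != 0 and stride == self.last_stride:
                targets = [line + stride * k for k in range(1, self.degree + 1)]
            self.last_stride = stride
        self.last_line = line
        return targets


def count_misses(addresses, prefetch_enabled, cache_lines=16):
    """Replay a byte-address trace and count demand misses."""
    cache = ToyCache(cache_lines)
    prefetcher = StridePrefetcher()
    misses = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        if not cache.contains(line):
            misses += 1
        cache.touch(line)
        if prefetch_enabled:
            for target in prefetcher.observe(line):
                cache.touch(target)  # prefetched line arrives before its use
    return misses


# Sequential walk over 256 cache lines, one access per line.
trace = [i * LINE_BYTES for i in range(256)]
misses_off = count_misses(trace, prefetch_enabled=False)
misses_on = count_misses(trace, prefetch_enabled=True)
print(misses_off, misses_on)
```

On this idealized sequential trace every access misses with the prefetcher disabled, while with it enabled only the first three accesses miss, after which the detector has locked onto the stride. The model ignores prefetch latency, bandwidth, and cache pollution, so it says nothing about the performance figures reported above.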
