Applied Sciences
  • Article
  • Open Access

22 August 2022

Rapid and Accurate PPA Prediction for the Template-Based Processor Design Methods

College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.

Abstract

The template-based chip design method aims to build chips rapidly. However, it still needs synthesis and simulation flows to obtain the performance, power, and area (PPA) reports and to find a proper parameter set during design space exploration, which takes a long time. Therefore, a rapid and accurate PPA prediction method is proposed. First, a PPA Prediction Model based on Multivariate Linear regression (ML-PM) is proposed to fit the influence of multiple parameters on the PPA from the effect of each single parameter. Moreover, a Multivariate NonLinear regression Prediction Model (MNL-PM) based on Amdahl’s law is introduced to improve the accuracy of the PPA estimation. The empirical evaluation shows that the PPA prediction for template-based chip design methods can reach 98.60%, 99.19%, and 98.53% accuracy on performance, power, and area, respectively, when compared with the PPA generated via the synthesis and simulation flows.

1. Introduction

The Dennard scaling law [1] and Moore’s law [2] are gradually coming to an end, and the agile development of hardware has brought new opportunities [3]. Meanwhile, the prosperity of the Artificial Intelligence and Internet of Things (AIoT) brings a large demand for chips for AIoT devices [4]. Unlike other devices, such as personal computers and smartphones, AIoT devices in different applications require totally different chips that meet their latency, power, and area needs: some devices are more sensitive to power than to latency and area, while others are more sensitive to area. In other words, the demand for chips from AIoT devices varies and is fragmented.
Therefore, the rapid construction of scene-specific systems-on-chips (SoCs) in the emerging AIoT field is promising [5]. With the development of hardware description languages (HDLs), such as Chisel [6] on Scala, ClaSH [7] on Haskell, and PyRTL [8] on Python, engineers can design hardware in a parameterized and reusable way, which can accelerate the construction of custom SoCs. Moreover, there are a few template-based chip design methods that aim to rapidly construct chips by adjusting the circuit modules and the parameters. FabScalar [9] was developed by Choudhary et al. for automatically composing synthesizable register-transfer-level (RTL) designs of arbitrary cores within a canonical superscalar template. The template defines canonical pipeline stages and the interfaces among them. A canonical pipeline stage library (CPSL) provides many implementations of each canonical pipeline stage that differ in their superscalar width and depth of subpipelining. An RTL generation tool uses the template and the CPSL to automatically generate the overall core of a desired configuration. Rocket Chip [10], proposed by Asanović et al., is an open-source SoC design generator that emits synthesizable RTL. It leverages Chisel to compose a library of sophisticated generators for cores, caches, and interconnects into an integrated SoC. Rocket Chip generates general-purpose processor cores that use the open RISC-V ISA and provides both an in-order core generator (Rocket) and an out-of-order core generator (BOOM). For SoC designers interested in utilizing heterogeneous specialization for added efficiency gains, Rocket Chip supports the integration of custom accelerators in the form of instruction set extensions, coprocessors, or fully independent novel cores. Moreover, Rocket Chip has been taped out (manufactured) eleven times and yielded functional silicon prototypes capable of booting Linux. Zhang et al. presented a framework called composable modular design (CMD) [11] to facilitate the design of out-of-order processors. In CMD, the interface methods of modules provide instantaneous access and perform atomic updates to the state elements inside the module. Modules are composed together by atomic rules that call interface methods of different modules. The atomicity properties of interfaces in CMD ensure composability when modules are selectively refined.
The designs produced by the methods mentioned above are compiled into RTL, which can be run on FPGAs or synthesized using standard ASIC design flows. However, existing template-based chip design methods still use synthesis and simulation to obtain the quality of results (QoR) regarding performance, power, and area (PPA) for the design space exploration (DSE) of the parameter set, as shown in Figure 1, and iterating the parameter set DSE via synthesis and simulation flows is time-consuming. Based on this, this study proposes a rapid and accurate PPA prediction method for template-based chip design methods. First, a preprocess for template-based methods obtains the influence of each parameter of the templates on the PPA. Then, a PPA prediction model based on multivariate linear regression (ML-PM) is proposed to fit the multiparameter influence on the PPA via the single-parameter effects. However, ML-PM does not consider the laws governing processor performance. Therefore, a multivariate nonlinear regression prediction model (MNL-PM) based on Amdahl’s law is introduced to improve the accuracy of the PPA estimation.
Figure 1. The parameter design process.
As shown in Figure 2, the proposed method replaces the synthesis and simulation process to reduce the time cost during the parameter design process. An empirical evaluation of the method shows that the PPA prediction for template-based chip design methods can reach 98.60%, 99.19%, and 98.53% accuracy on performance, power, and area, respectively, when compared with a PPA generated via synthesis and simulation flows. To the best of our knowledge, there is no previous work on fast and accurate PPA prediction on HDL design from template-based processor design methods. We believe that the proposed method is a novel and interesting design point in the space of solutions to the agile chip design problem. The main contributions of this work can be summarized as follows:
Figure 2. The parameter design process with proposed PPA prediction method.
  • The PPA prediction model ML-PM is proposed to predict the PPA from the parameter set; it adapts the standard multivariate linear regression model.
  • The PPA prediction model MNL-PM, based on Amdahl’s law, is introduced to improve the accuracy of the PPA estimation.
  • The iteration process of the parameter set DSE is improved with the proposed PPA prediction models, which saves time.

3. PPA Prediction Method

The proposed method aims to rapidly estimate accurate PPA based on the parameter set of a certain chip design in the early stage of design, which can accelerate the process of proper parameter set design. The method contains two parts: one is the preprocess for a template-based method, and the other is a prediction model for the PPA estimation of the parameter set. In this study, a template-based method, Rocket Chip, is chosen as an example.

3.1. Parameter and Data Preprocessing

As the proposed method aims to predict the PPA of the parameter set for template-based methods, it is critical to determine the correlation among parameters and the design space of the parameter set. In Rocket Chip, for instance, a single core is mainly dependent on the parameters of modules, including the Core, Cache, BTB, and BTB’s submodule, BHT. First, only one of the parameters with correlations is selected as their representative. In this way, 27 parameters are selected as the parameter set. Second, to determine the design space of the parameter set, the value range of each parameter is designed, as shown partly in Table 1, as follows.
  • For a bool-type parameter, there is only one optional value besides the original value, namely the negation of the default value.
  • For an int-type parameter, there are two situations. In the first situation, there are two optional values besides the original value, and these three values form a geometric sequence with a common ratio of 2, because in the real design process, parameters such as the size of memory are changed by a factor of 2. The default value is the smallest, the middle, or the largest of these three values, depending on the analysis of the existing designs in Rocket Chip. In the other situation, the default value of the parameter is 0 and remains 0 in the existing designs in Rocket Chip. Then, 1 is set as the other optional value of the parameter, similar to the rule for a bool-type parameter.
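The value-range rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `value_range` and the `position` argument are hypothetical, and the example values are not taken from Rocket Chip.

```python
from typing import List, Union

Value = Union[bool, int]


def value_range(default: Value, position: str = "middle") -> List[Value]:
    """Return the candidate values of one parameter, default included.

    `position` says where the default sits in the ratio-2 sequence
    (hypothetical argument; the paper decides this per parameter).
    """
    # bool must be tested before int because bool is a subclass of int
    if isinstance(default, bool):
        return [default, not default]
    # int-type parameter whose default is 0: only 1 is added
    if default == 0:
        return [0, 1]
    # int-type parameter: three values with a common ratio of 2
    if position == "smallest":
        return [default, default * 2, default * 4]
    if position == "middle":
        return [default // 2, default, default * 2]
    if position == "largest":
        return [default // 4, default // 2, default]
    raise ValueError(f"unknown position: {position}")


print(value_range(True))             # [True, False]
print(value_range(0))                # [0, 1]
print(value_range(64, "middle"))     # [32, 64, 128]
```

The integer divisions assume that the default is a power of two large enough to be halved, which is typical for sizes such as cache sets but is an assumption here.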
Moreover, the proposed method requires the individual influence of each parameter on the PPA to calculate the PPA prediction. Therefore, the bold values in Table 2 are chosen as the base parameter set, and the parameters are changed one at a time to form the changed parameter sets, from which the individual influence of each parameter on the PPA is obtained. Specifically, the base values are set as follows: for int-type parameters with three optional values, the middle value is chosen; for int-type parameters whose value range is 0 and 1 and for bool-type parameters, the value follows the existing designs in Rocket Chip. Table 3 shows part of one changed set. In this way, only 42 parameter set samples with measured PPA are needed, and the PPA of any parameter set in a design space of $3.918 \times 10^{10}$ points becomes predictable.
Table 2. The portion of parameters set with value range.
Table 3. An instance of the same portion of one changed set.
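As a rough check of the numbers above, the following sketch builds one-at-a-time changed sets for a toy template and then reproduces the sample count and design space size. The parameter names are placeholders, and the split into 14 three-valued and 13 two-valued parameters is not stated in the text; it is inferred here from the 27 parameters and 42 samples (1 + 2a + b = 42 with a + b = 27).

```python
from math import prod


def changed_sets(base, ranges):
    """One changed set per non-base value of each parameter."""
    sets = []
    for name, values in ranges.items():
        for value in values:
            if value != base[name]:
                changed = dict(base)    # copy the base set
                changed[name] = value   # change exactly one parameter
                sets.append(changed)
    return sets


def design_space_size(ranges):
    """Number of distinct parameter sets in the design space."""
    return prod(len(values) for values in ranges.values())


# Toy template with placeholder parameter names (not Rocket Chip names)
ranges = {"a": [32, 64, 128], "b": [False, True], "c": [0, 1]}
base = {"a": 64, "b": True, "c": 0}
samples = [base] + changed_sets(base, ranges)
print(len(samples))               # 5 samples: base + 2 + 1 + 1
print(design_space_size(ranges))  # 12 = 3 * 2 * 2

# Inferred split for the 27-parameter case (an assumption, see above)
three_valued, two_valued = 14, 13
print(1 + 2 * three_valued + two_valued)    # 42
print(3 ** three_valued * 2 ** two_valued)  # 39182082048, about 3.918e10
```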

3.2. Prediction Model Derivation

Having the effect of each parameter of the parameter set on PPA is not enough to predict the PPA of a certain parameter set design. Moreover, the relationship between the multifactor influence and a single-factor effect on the same system is a black box. Therefore, multivariate regression is utilized to fit the multiparameter influence on the PPA via a single parameter effect. The PPA estimate of the parameter set can be calculated via this method and the PPA samples of each parameter. In this study, the fit model is built separately based on a weight model and Amdahl’s law.

3.2.1. Multivariate Linear Regression Prediction Model

To build the ML-PM, we assume that each parameter is independent and has three individual weights for performance, power, and area. In detail, for each parameter $p_i$, $i \in \{1, \ldots, n\}$, this paper assumes that there are weights $w_i$, $w'_i$, and $w''_i$ for performance, power, and area, respectively, with $w_i, w'_i, w''_i > 0$. Taking performance ($P_1$) as an example, $P_1$ is the performance value of the base parameter set. When one parameter $p_m$, $m \in \{1, \ldots, n\}$, is modified to another value in its value range, $P_1$ turns into $P_{1,m}$ and the following equations hold.
$$S = \frac{w_m (1 + \beta_m) + \sum_{i=1,\, i \neq m}^{n} w_i}{\sum_{i=1}^{n} w_i} \tag{1}$$
$$P_{1,m} = P_1 S \tag{2}$$
In Equation (1), $\beta_m$ is the influence rate of the modified parameter $p_m$ on $P_1$. When multiple parameters $J = \{p_j\}$, $J \subseteq \{p_1, p_2, \ldots, p_n\}$, are modified, the change rate $S$ can be represented as Equation (3).
$$S = \frac{\sum_{j,\, p_j \in J}^{n} w_j (1 + \beta_j) + \sum_{i,\, p_i \notin J}^{n} w_i}{\sum_{i=1}^{n} w_i} \tag{3}$$
Although neither the $\beta_i$ nor the $w_i$ are known, the values of $P_1$ and $P_{1,j}$ are already known from the preprocess. Therefore, $w_j$ and $\beta_j$ can be replaced via $\Delta P_{1,j}$, as in Equation (4) below.
$$\Delta P_{1,j} = P_{1,j} - P_1 = \frac{w_j \beta_j}{\sum_{i=1}^{n} w_i} P_1 \tag{4}$$
Based on Equations (3) and (4), the change rate $S$ can be calculated from available information, as in Equation (5) below.
$$S = 1 + \sum_{j,\, p_j \in J}^{n} \frac{\Delta P_{1,j}}{P_1} \tag{5}$$
When multiple parameters $J = \{p_j\}$, $J \subseteq \{p_1, p_2, \ldots, p_n\}$, are modified, $P_{1,J}$ can be calculated as Equation (6).
$$P_{1,J} = P_1 S = P_1 + \sum_{j,\, p_j \in J}^{n} \Delta P_{1,j} \tag{6}$$
As Equations (7) and (8) show, the power ($P_2$) and the area ($A_1$) can be calculated in the same way.
$$P_{2,J} = P_2 + \sum_{j,\, p_j \in J}^{n} \Delta P_{2,j} \tag{7}$$
$$A_{1,J} = A_1 + \sum_{j,\, p_j \in J}^{n} \Delta A_{1,j} \tag{8}$$
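A minimal sketch of the ML-PM prediction in Equation (6) is given below, assuming the base value and the single-parameter samples from the preprocess are available as plain numbers. The function name `ml_pm_predict`, the dictionary layout, and the numbers are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, Iterable


def ml_pm_predict(base: float,
                  single: Dict[str, float],
                  modified: Iterable[str]) -> float:
    """Linear prediction: base value plus the sum of single-parameter deltas.

    base     -- metric of the base parameter set (P_1, P_2, or A_1)
    single   -- metric measured when only that one parameter is changed
    modified -- names of the parameters changed in the set to predict
    """
    return base + sum(single[name] - base for name in modified)


# Hypothetical numbers in arbitrary units
base_perf = 100.0
single_perf = {"a": 110.0, "b": 95.0}
print(ml_pm_predict(base_perf, single_perf, ["a", "b"]))  # 105.0
```

The same function applies unchanged to power and area, since Equations (7) and (8) have the same form.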

3.2.2. Multivariate Nonlinear Regression Prediction Model

Inspired by Amdahl’s law [19], the MNL-PM is proposed to estimate the PPA. Amdahl’s law gives the maximum theoretical speedup achievable by a system, and it reflects the influence of each part of a system on the whole system. In detail, for each parameter $p_i$, $i \in \{1, \ldots, n\}$, the proposed method assumes that there are rates $w_i$, $w'_i$, and $w''_i$ representing its proportion in performance, power, and area, respectively, with $w_i, w'_i, w''_i > 0$ and $\sum_{i=1}^{n} w_i,\ \sum_{i=1}^{n} w'_i,\ \sum_{i=1}^{n} w''_i < 1$. Taking performance ($P_1$) as an example, when one parameter $p_m$, $m \in \{1, \ldots, n\}$, is modified to another value in its value range, the baseline performance $P_1$ turns into $P_{1,m}$ and the following equations hold.
$$S_{1,m} = \frac{1}{1 - w_m + w_m / (1 + \beta_m)} \tag{9}$$
$$P_{1,m} = P_1 S_{1,m} \tag{10}$$
In Equation (9), $S_{1,m}$ is the change rate of the whole system and $\beta_m$ is the change rate contributed by the modified parameter $p_m$. When multiple parameters $J = \{p_j\}$, $J \subseteq \{p_1, p_2, \ldots, p_n\}$, are modified, the change rate $S_{1,J}$ can be represented as Equation (11).
$$S_{1,J} = \frac{1}{1 - \sum_{j,\, p_j \in J}^{n} w_j + \sum_{j,\, p_j \in J}^{n} \frac{w_j}{1 + \beta_j}} \tag{11}$$
Although neither the $\beta_i$ nor the $w_i$ are known, the values of $P_1$ and $P_{1,j}$ are already known from the preprocess, and $S_{1,j}$ can be expressed directly via $P_1$ and $P_{1,j}$, as in Equation (12) below.
$$S_{1,j} = P_{1,j} / P_1 \tag{12}$$
Based on Equations (11) and (12), the change rate $S_{1,J}$ can be calculated from available information, as in Equation (13) below, where $k$ is the number of modified parameters in $J$.
$$S_{1,J} = \frac{1}{\sum_{j,\, p_j \in J}^{n} \frac{P_1}{P_{1,j}} - k + 1} \tag{13}$$
When multiple parameters $J = \{p_j\}$, $J \subseteq \{p_1, p_2, \ldots, p_n\}$, are modified, $P_{1,J}$ can be calculated as Equation (14).
$$P_{1,J} = P_1 S_{1,J} = P_1 \cdot \frac{1}{\sum_{j \in J} \frac{P_1}{P_{1,j}} - k + 1} \tag{14}$$
As Equations (15) and (16) show, the power ($P_2$) and the area ($A_1$) can be calculated in the same way.
$$P_{2,J} = P_2 \cdot \frac{1}{\sum_{j \in J} \frac{P_2}{P_{2,j}} - k + 1} \tag{15}$$
$$A_{1,J} = A_1 \cdot \frac{1}{\sum_{j \in J} \frac{A_1}{A_{1,j}} - k + 1} \tag{16}$$
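The MNL-PM prediction of Equation (14) can be sketched in the same style. Here k is taken to be the number of modified parameters, which is what the derivation from Equation (9) implies; the function name `mnl_pm_predict`, the guard against a non-positive denominator, and the numbers are illustrative assumptions.

```python
from typing import Dict, Sequence


def mnl_pm_predict(base: float,
                   single: Dict[str, float],
                   modified: Sequence[str]) -> float:
    """Amdahl-style prediction from single-parameter samples.

    base     -- metric of the base parameter set (P_1, P_2, or A_1)
    single   -- metric measured when only that one parameter is changed
    modified -- names of the parameters changed in the set to predict
    """
    k = len(modified)  # number of modified parameters
    denominator = sum(base / single[name] for name in modified) - k + 1
    if denominator <= 0:
        # Outside the range where the model is meaningful (added guard)
        raise ValueError("combined change is too large for this model")
    return base / denominator


# Hypothetical numbers in arbitrary units
base_perf = 100.0
single_perf = {"a": 110.0, "b": 95.0}
# Close to 104, slightly below the linear estimate of 105
print(mnl_pm_predict(base_perf, single_perf, ["a", "b"]))
```

With a single modified parameter, the formula returns the measured single-parameter value, which is a quick sanity check on any implementation.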

4. Experimental Results

4.1. Implementation Details

In this part, experiments are designed to test the performance of the proposed PPA prediction method for template-based processor design methods. Parameter set samples are designed for testing the PPA prediction method. The QoRs of these samples obtained via the EDA tools are regarded as the standard values, while the QoRs of these samples obtained via the proposed method are regarded as the predicted values.
The template-based method Rocket Chip Generator is chosen as a case study. To obtain the standard PPA values of the samples, synthesis is performed with an EDA tool using a 28 nm technology, with the other constraints defined in a Tcl file, and simulation is performed with a benchmark program, specifically the Dhrystone benchmark provided with Rocket Chip. In this way, the standard PPA values of the 42 samples used in the preprocess are obtained.

4.2. PPA Prediction for the Parameter Set

First, the samples are evenly sampled from the design space of the parameter set. Specifically, the number of parameters that are chosen to change ranges from 2 to 27, and for each case, there are 10 random samples. Therefore, there are 26 × 10 = 260 samples in total. Then, the ML-PM and MNL-PM are both used to estimate the PPA of these 260 samples. The standard PPA values are also generated via synthesis and simulation flows, and Table 4 shows the PPA prediction performance of both the ML-PM and MNL-PM for the parameter set. The accuracy is calculated via Equation (17), in which the prediction is the PPA value obtained via the proposed prediction method and the standard is the PPA value obtained via the EDA tools.
$$\mathrm{Accuracy}(\%) = \left(1 - \frac{|\mathrm{prediction} - \mathrm{standard}|}{\mathrm{standard}}\right) \times 100\% \tag{17}$$
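The accuracy metric of Equation (17) is straightforward to compute. The sketch below also averages it over several samples; the helper name `mean_accuracy` and the numbers are made up for illustration and are not results from the paper.

```python
from statistics import mean, pstdev
from typing import Sequence


def accuracy(prediction: float, standard: float) -> float:
    """Accuracy in percent of one prediction against its standard value."""
    return (1 - abs(prediction - standard) / standard) * 100


def mean_accuracy(predictions: Sequence[float],
                  standards: Sequence[float]) -> float:
    """Average accuracy in percent over paired samples."""
    return mean(accuracy(p, s) for p, s in zip(predictions, standards))


# Made-up values in arbitrary units
predictions = [98.0, 101.0, 100.0]
standards = [100.0, 100.0, 100.0]
per_sample = [accuracy(p, s) for p, s in zip(predictions, standards)]
print(per_sample)                           # about 98, 99, and 100 percent
print(mean_accuracy(predictions, standards))
print(pstdev(per_sample))                   # spread of the accuracy
```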
Table 4. The accuracy of the PPA prediction method.
The results show that MNL-PM performs better on the PPA estimation than ML-PM, with higher accuracy and a smaller standard deviation. Furthermore, the performance of ML-PM and MNL-PM on the PPA prediction with different numbers of changed parameters is shown in Figure 3. The results show that both proposed models work better when the number of changed parameters is less than 11, and that they are most stable on the power prediction. The results of Equations (1) and (11) are similar when $\beta_m$ is small, so the main reason the two models behave similarly is the small value range of each parameter in the parameter set design space, which keeps the influence of each parameter small.
Figure 3. The performance on the PPA prediction with the different number of changed parameters ((a) for performance, (b) for power, and (c) for area).
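The similarity of the two models for small changes can be made explicit with a first-order expansion. The symbol $\delta_j$ is introduced here only for this sketch; it denotes the relative single-parameter change, and $k$ is the number of modified parameters.

```latex
% Relative change measured for parameter p_j alone (notation for this sketch)
\delta_j = \frac{P_{1,j} - P_1}{P_1}
\quad\Longrightarrow\quad
\frac{P_1}{P_{1,j}} = \frac{1}{1 + \delta_j} = 1 - \delta_j + O(\delta_j^2)

% Denominator of the MNL-PM change rate
\sum_{j \in J} \frac{P_1}{P_{1,j}} - k + 1
  = 1 - \sum_{j \in J} \delta_j + O(\delta^2)

% MNL-PM change rate to first order, which matches the ML-PM change rate
S_{1,J} = \frac{1}{1 - \sum_{j \in J} \delta_j + O(\delta^2)}
        = 1 + \sum_{j \in J} \delta_j + O(\delta^2)
```

To first order, the MNL-PM change rate therefore equals the ML-PM change rate $1 + \sum_{j} \Delta P_{1,j} / P_1$, and the two predictions differ only in second-order terms of the relative changes.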

PPA Prediction for Parameters in the Same Module

In a real processor design process, the changes made to an existing design in one iteration usually appear in the same module. The samples of the parameter set design space in the experiment above cannot represent this behavior well. Therefore, samples in which the changed parameters belong to the same module are drawn in this section. In this experiment, there are 11, 5, 7, and 4 parameters in the Core, ICache, DCache, and BTB modules, respectively. Taking the Core module as an example, the number of parameters that are chosen to change ranges from 2 to 11, and for each case, there are 4 random samples. The samples for the other three modules are generated in the same way, giving 92 samples in total. The standard PPA values of these 92 samples are generated via synthesis and simulation flows. Then, the ML-PM and MNL-PM are both used to estimate the PPA, and a scatter plot of the PPA prediction results for parameters in the same module is shown in Figure 4.
Figure 4. ML-PM on the same number of changed parameters in the same modules and randomly distributed.
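The sample counts of both experiments (260 and 92) follow from the same sampling scheme, sketched below. The function name and the parameter labels are placeholders, and the sketch only picks which parameters change; choosing the new value of each changed parameter is left out.

```python
import random
from typing import List


def sample_changed_subsets(params: List[str], per_size: int,
                           rng: random.Random) -> List[List[str]]:
    """For every subset size from 2 to len(params), draw `per_size`
    random subsets of parameters to change."""
    subsets = []
    for size in range(2, len(params) + 1):
        for _ in range(per_size):
            subsets.append(rng.sample(params, size))
    return subsets


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Whole parameter set: 27 parameters, 10 samples per size
all_params = [f"p{i}" for i in range(27)]
print(len(sample_changed_subsets(all_params, 10, rng)))  # 260

# Per-module sampling: 4 samples per size within each module
module_sizes = {"Core": 11, "ICache": 5, "DCache": 7, "BTB": 4}
total = 0
for module, count in module_sizes.items():
    module_params = [f"{module}_{i}" for i in range(count)]
    total += len(sample_changed_subsets(module_params, 4, rng))
print(total)  # 92 = 40 + 16 + 24 + 12
```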
As shown in Figure 5, MNL-PM performs better when the changed parameters are in the same module than when they are randomly distributed. In contrast, the results show that the influence on the ML-PM is irregular.
Figure 5. MNL-PM on the same number of changed parameters in the same modules and randomly distributed.

5. Discussion

To meet the demand for processors with different PPA requirements for various scenarios in the AIoT, this paper presents a rapid and accurate PPA prediction method for template-based processor design methods. The two prediction models, the ML-PM and MNL-PM, are built to estimate the PPA for certain parameter designs. The experimental results show that the MNL-PM is better than the ML-PM and that both prediction models work well on the single-core chip template in the Rocket Chip Generator. However, because the value range of the parameters is small and the PPA range of the single-core chip template is small, the advantage of the MNL-PM is not obvious, as shown in Table 4. Therefore, in further study, we will work on improving the chip template and enlarging the value range of the parameters in the template.

6. Conclusions

To meet the demand for processors with different PPA requirements for various scenarios in the AIoT, this paper presents a rapid and accurate PPA prediction method for template-based processor design methods. The method first applies a preprocess to the template-based method. Then, two prediction models, the ML-PM and MNL-PM, are built to estimate the PPA for certain parameter designs. An empirical evaluation of the method shows that the PPA prediction for the template-based chip design methods can reach 98.60%, 99.19%, and 98.53% accuracy on performance, power, and area, respectively, when compared with the PPA generated via synthesis and simulation flows. In future studies, on the one hand, the value range of the parameters in the parameter set will be expanded, and on the other hand, an exploration algorithm will be introduced to replace the manual parameter design in the parameter design process, as shown in Figure 2.

Author Contributions

Conceptualization, M.T., L.H. and W.C.; methodology, M.T., L.H. and W.C.; software, M.T.; validation, M.T., L.H. and W.C.; formal analysis, M.T., L.H. and W.C.; investigation, M.T., L.H. and W.C.; resources, M.T., L.H. and W.C.; data curation, M.T., L.H. and W.C.; writing—original draft preparation, M.T.; writing—review and editing, M.T., L.H. and W.C.; visualization, W.C.; supervision, M.T., L.H. and W.C.; project administration, M.T., L.H. and W.C.; funding acquisition, L.H. and W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Independent and open subject fund (grant no. 202101-10) from State Key Laboratory of High Performance Computing, National Natural Science Foundation of China (NSFC) under Grant No. 62090023 and No. 618722374, and the National Key Research and Development Program of China (No. 2018YFB0204301).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This work was supported by the Independent and open subject fund (grant no. 202101-10) from State Key Laboratory of High Performance Computing, National Natural Science Foundation of China (NSFC) under Grant No. 62090023 and No. 618722374, and the National Key Research and Development Program of China (No. 2018YFB0204301).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dennard, R.H.; Gaensslen, F.H.; Rideout, V.L.; Bassous, E.; LeBlanc, A.R. Design of ion-implanted MOSFET’s with very small physical dimensions. IEEE J. Solid-State Circ. 1974, 9, 256–268.
  2. Moore, G.E. Cramming more components onto integrated circuits, Reprinted from Electronics, volume 38, number 8, April 19, 1965, pp. 114 ff. IEEE Solid-State Circ. Soc. Newsl. 2006, 11, 33–35.
  3. Hennessy, J.L.; Patterson, D.A. A new golden age for computer architecture. Commun. ACM 2019, 62, 48–60.
  4. Ghosh, A.; Chakraborty, D.; Law, A. Artificial intelligence in Internet of things. CAAI Trans. Intell. Technol. 2018, 3, 208–218.
  5. Bao, Y.; Chang, Y.; Han, Y.; Huang, L.; Li, H.; Liang, Y.; Luo, G.; Shang, L.; Tang, D.; Wang, Y.; et al. Agile Design of Processor Chips: Issues and Challenges. J. Comput. Res. Dev. 2021, 58, 1131.
  6. Bachrach, J.; Vo, H.; Richards, B.; Lee, Y.; Waterman, A.; Avižienis, R.; Wawrzynek, J.; Asanović, K. Chisel: Constructing hardware in a Scala embedded language. In Proceedings of the 49th Annual Design Automation Conference, San Francisco, CA, USA, 3–7 June 2012; pp. 1216–1225.
  7. Kooijman, M. Haskell as a Higher Order Structural Hardware Description Language. December 2009. Available online: http://essay.utwente.nl/59381/ (accessed on 17 August 2022).
  8. Clow, J.; Tzimpragos, G.; Dangwal, D.; Guo, S.; McMahan, J.; Sherwood, T. A pythonic approach for rapid hardware prototyping and instrumentation. In Proceedings of the 2017 27th International Conference on Field Programmable Logic and Applications (FPL), Ghent, Belgium, 4–8 September 2017; pp. 1–7.
  9. Choudhary, N.K.; Wadhavkar, S.V.; Shah, T.A.; Mayukh, H.; Gandhi, J.; Dwiel, B.H.; Navada, S.; Najaf-abadi, H.H.; Rotenberg, E. FabScalar: Composing synthesizable RTL designs of arbitrary cores within a canonical superscalar template. ACM SIGARCH Comput. Archit. News 2011, 39, 11–22.
  10. Asanović, K.; Avizienis, R.; Bachrach, J.; Beamer, S.; Biancolin, D.; Celio, C.; Cook, H.; Dabbelt, D.; Hauser, J.; Izraelevitz, A.; et al. The Rocket Chip Generator; Tech. Rep. UCB/EECS-2016-17; EECS Department, University of California: Berkeley, CA, USA, 2016; p. 4.
  11. Zhang, S.; Wright, A.; Bourgeat, T.; Arvind. Composable building blocks to open up processor design. In Proceedings of the 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Fukuoka, Japan, 20–24 October 2018; pp. 68–81.
  12. Decaluwe, J. MyHDL: A Python-based hardware description language. Linux J. 2004, 2004, 5.
  13. Lockhart, D.; Zibrat, G.; Batten, C. PyMTL: A unified framework for vertically integrated computer architecture research. In Proceedings of the 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, Cambridge, UK, 13–17 December 2014; pp. 280–292.
  14. Lee, Y.; Waterman, A.; Cook, H.; Zimmer, B.; Keller, B.; Puggelli, A.; Kwak, J.; Jevtic, R.; Bailey, S.; Chiu, P.; et al. An agile approach to building RISC-V microprocessors. IEEE Micro 2016, 36, 8–20.
  15. Mahapatra, A.; Schafer, B.C. Machine-learning based simulated annealer method for high level synthesis design space exploration. In Proceedings of the 2014 Electronic System Level Synthesis Conference (ESLsyn), San Francisco, CA, USA, 31 May–1 June 2014; pp. 1–6.
  16. Lin, Z.; Zhao, J.; Sinha, S.; Zhang, W. HL-Pow: A learning-based power modeling framework for high-level synthesis. In Proceedings of the 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC), Beijing, China, 13–16 January 2020; pp. 574–580.
  17. Davis, W.R.; Franzon, P.D.; Francisco, L.; Huggins, B.; Jain, R. Fast and Accurate PPA Modeling with Transfer Learning. In Proceedings of the 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), Munich, Germany, 1–4 November 2021; pp. 1–8.
  18. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  19. Gustafson, J.L. Reevaluating Amdahl’s law. Commun. ACM 1988, 31, 532–533.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
