Energy and Performance Trade-Off Optimization in Heterogeneous Computing via Reinforcement Learning
Abstract
1. Introduction
2. Related Work
3. Methodology
3.1. System Model
3.2. Power Measurement Utility (PMU)
3.3. Simulation Environment
3.4. Reinforcement Learning Algorithm
3.5. Training Algorithm
Algorithm 1 Proposed PMU-RL algorithm
1: Initialise the Q-values of table Q(s, a) to all-zero matrices.
2: Observe the current state s.
Require:
3: States S = (S1, S2, ..., S32)
4: Actions A = (A1, A2) (PL clock start and stop)
5: Reward function: S × A → R
6: Set ε = 0.1, α ∈ [0, 1] (typically α = 0.1), γ ∈ [0, 1] (typically γ = 0.9)
7: Update the Q-value with the reward.
8: Find the maximum reward over the next n steps.
9: Bellman equation: $Q(s, a) \leftarrow Q(s, a) + \alpha\,[R + \gamma \max_{a'} Q(s', a') - Q(s, a)]$
10: Model[S][A].R = R;
11: Model[S][A].S_ = S_;
12: unsigned S2, A2, S_2; float R2
13: for each of the n steps do
14: $Q(s_2, a_2) \leftarrow Q(s_2, a_2) + \alpha\,[R_2 + \gamma \max_{a_2'} Q(s_2', a_2') - Q(s_2, a_2)]$;
15: S2 = S_2; A2 = GET_ACTION_SATE(S_2);
16: A = GET_ACTION_SATE(S_);
17: S = S_;
18: end
19: return Action;
20: Set the state to the new state and repeat until S is terminal.
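The tabular update at the core of Algorithm 1 can be illustrated with a short Python sketch. It is not the authors' implementation: the function names, the dictionary used as the model, and the example transition are all hypothetical. Only the table size (32 states, 2 actions) and the values α = 0.1 and γ = 0.9 come from the algorithm listing.

```python
N_STATES = 32    # S1..S32, as listed in Algorithm 1
N_ACTIONS = 2    # A1 and A2: PL clock start and stop
ALPHA = 0.1      # learning rate
GAMMA = 0.9      # discount rate


def make_q_table():
    """Step 1: build a Q-table filled with zeros."""
    return [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def q_update(q, s, a, reward, s_next):
    """Step 9: apply one Bellman update in place and return the new Q(s, a)."""
    best_next = max(q[s_next])
    q[s][a] += ALPHA * (reward + GAMMA * best_next - q[s][a])
    return q[s][a]


q = make_q_table()
model = {}  # hypothetical stand-in for Model[S][A] in steps 10-11

# One made-up transition: state 0, action 1, reward 1.0, next state 5.
s, a, reward, s_next = 0, 1, 1.0, 5
q_update(q, s, a, reward, s_next)
model[(s, a)] = (reward, s_next)  # remember the reward and next state
print(q[s][a])
```

Storing each observed transition in `model` is what would let the planning loop in steps 13 to 18 replay experience without touching the hardware. The replay loop itself is left out of the sketch because the listing does not fully specify its bounds.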
- Up—<Power rises>
- Down—<Power falls>
- High-smooth—<Power consumption rises and then stabilises>
- Low-smooth—<Power consumption drops and then stabilises>
- Fully-operational—<Full load for hardware>
- Idle—<Waiting for FPGA hardware acceleration calling>
- OFF—<FPGA clock stop>
4. Implementation and Evaluation
4.1. Test Setup
4.2. Reinforcement Learning Agent Evaluation
5. Discussion
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
Appendix A. Hardware Platform
References
- Ou, Z.; Pang, B.; Deng, Y.; Nurminen, J.K.; Yla-Jaaski, A.; Hui, P. Energy- and cost-efficiency analysis of arm-based clusters. In Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), Ottawa, ON, Canada, 13–16 May 2012; pp. 115–123.
- Höppner, S.; Yan, Y.; Vogginger, B.; Dixius, A.; Partzsch, J.; Neumärker, F.; Hartmann, S.; Schiefer, S.; Scholze, S.; Ellguth, G.; et al. Dynamic voltage and frequency scaling for neuromorphic many-core systems. In Proceedings of the 2017 IEEE International Symposium on Circuits and Systems (ISCAS), Baltimore, MD, USA, 28–31 May 2017; pp. 1–4.
- Beldachi, A.F.; Nunez-Yanez, J.L. Accurate Power control and monitoring in ZYNQ boards. In Proceedings of the 2014 24th International Conference on Field Programmable Logic and Applications (FPL), Munich, Germany, 2–4 September 2014; pp. 1–4.
- Bauer, W.; Holzinger, P.; Reichenbach, M.; Vaas, S.; Hartke, P.; Fey, D. Programmable HSA Accelerators for ZYNQ UltraScale+ MPSoC Systems. In Proceedings of the European Conference on Parallel Processing, Turin, Italy, 27–31 August 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 733–744.
- Crockett, L.; Northcote, D.; Ramsay, C.; Robinson, F.; Stewart, R. Exploring ZYNQ MPSoC: With PYNQ and Machine Learning Applications; University of Strathclyde: Glasgow, UK, 2019.
- Han, J.J.; Wu, X.; Zhu, D.; Jin, H.; Yang, L.T.; Gaudiot, J.L. Synchronization-aware energy management for VFI-based multicore real-time systems. IEEE Trans. Comput. 2012, 61, 1682–1696.
- Sapuppo, F.; Schembri, F.; Fortuna, L.; Bucolo, M. Microfluidic circuits and systems. IEEE Circuits Syst. Mag. 2009, 9, 6–19.
- Seifoori, Z.; Ebrahimi, Z.; Khaleghi, B.; Asadi, H. Introduction to emerging sram-based fpga architectures in dark silicon era. In Advances in Computers; Elsevier: Amsterdam, The Netherlands, 2018; Volume 110, pp. 259–294.
- Walls, C. Embedded Software: The Works; Elsevier: Amsterdam, The Netherlands, 2012.
- Oliveira, B.G.; Lobo, J. Interactive Demonstration of an Energy Efficient YOLOv3 Implementation in Reconfigurable Logic. In Proceedings of the 2019 5th Experiment International Conference, Funchal, Portugal, 12–14 June 2019; pp. 235–236.
- Chen, Y.L.; Chang, M.F.; Yu, C.W.; Chen, X.Z.; Liang, W.Y. Learning-Directed Dynamic Voltage and Frequency Scaling Scheme with Adjustable Performance for Single-Core and Multi-Core Embedded and Mobile Systems. Sensors 2018, 18, 3068.
- Keller, B.A.; Nikolic, B.; Asanović, K.; Callaway, D. Energy-Efficient System Design through Adaptive Voltage Scaling; eScholarship; University of California: Berkeley, CA, USA, 2017.
- Baruah, T.; Sun, Y.; Dong, S.; Kaeli, D.; Rubin, N. Airavat: Improving energy efficiency of heterogeneous applications. In Proceedings of the 2018 Design, Automation Test in Europe Conference Exhibition (DATE), Dresden, Germany, 19–23 March 2018; pp. 731–736.
- Pagani, S.; Sai Manoj, P.D.; Jantsch, A.; Henkel, J. Machine learning for power, energy, and thermal management on multi-core processors: A survey. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2018, 39, 101–116.
- Liu, W.; Tan, Y.; Qiu, Q. Enhanced Q-learning algorithm for dynamic power management with performance constraint. In Proceedings of the 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010), Dresden, Germany, 8–12 March 2010; pp. 602–605.
- Bai, Y.; Lee, V.W.; Ipek, E. Voltage regulator efficiency aware power management. ACM Sigops Oper. Syst. Rev. 2017, 51, 825–838.
- Da Silva, L.M.; Torquato, M.F.; Fernandes, M.A. Parallel implementation of reinforcement learning q-learning technique for fpga. IEEE Access 2018, 7, 2782–2798.
- Singh, A.K.; Leech, C.; Reddy, B.K.; Al-Hashimi, B.M.; Merrett, G.V. Learning-based run-time power and energy management of multi/many-core systems: Current and future trends. J. Low Power Electron. 2017, 13, 310–325.
- Dhiman, G.; Rosing, T.S. Dynamic voltage frequency scaling for multi-tasking systems using online learning. In Proceedings of the 2007 international symposium on Low power electronics and design, Portland, OR, USA, 27–29 August 2007; pp. 207–212.
- Shen, H.; Lu, J.; Qiu, Q. Learning based DVFS for simultaneous temperature, performance and energy management. In Proceedings of the Thirteenth International Symposium on Quality Electronic Design (ISQED), Santa Clara, CA, USA, 19–21 March 2012; pp. 747–754.
- Fettes, Q.; Clark, M.; Bunescu, R.; Karanth, A.; Louri, A. Dynamic Voltage and Frequency Scaling in NoCs with Supervised and Reinforcement Learning Techniques. IEEE Trans. Comput. 2018, 68, 375–389.
- Sadek, A.; Muddukrishna, A.; Kalms, L.; Djupdal, A.; Podlubne, A.; Paolillo, A.; Goehringer, D.; Jahre, M. Supporting Utilities for Heterogeneous Embedded Image Processing Platforms (STHEM): An Overview; ARC Springer: Cham, Switzerland, 2018; pp. 737–749.
- Kalb, T.; Kalms, L.; Göhringer, D.; Pons, C.; Marty, F.; Muddukrishna, A.; Jahre, M.; Kjeldsberg, P.G.; Ruf, B.; Schuchert, T.; et al. TULIPP: Towards ubiquitous low-power image processing platforms. In Proceedings of the 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), Agios Konstantinos, Greece, 17–21 July 2016; pp. 306–311.
- Duhem, T.F.; Christensen, S.F.; Paolillo, H.A.; Kalms, R.L.; Peterson, S.M.; Schuchert, I.T.; Jahre, N.M.; Muddukrishna, N.A.; Rodriguez, H.B. Towards Ubiquitous Low-Power Image Processing Platforms. 2020. Available online: http://tulipp.eu/wp-content/uploads/2020/06/Towards-Ubiquitous-Low-power-Image-Processing.pdf (accessed on 1 September 2020).
- Muddukrishna, A.; Djupdal, A.; Jahre, M. Power Profiling of Embedded Vision Applications in the Tulipp Project. ACM MCC17. 2017. Available online: http://tulipp.eu/wp-content/uploads/2018/06/MCC17_paper_22.pdf (accessed on 1 September 2020).
- Kalb, T.; Kalms, L.; Göhringer, D.; Pons, C.; Muddukrishna, A.; Jahre, M.; Ruf, B.; Schuchert, T.; Tchouchenkov, I.; Ehrenstråhle, C.; et al. Developing Low-Power Image Processing Applications with the TULIPP Reference Platform Instance. In Hardware Accelerators in Data Centers; Springer: Berlin/Heidelberg, Germany, 2019; pp. 181–197.
- Restuccia, F.; Biondi, A.; Marinoni, M.; Cicero, G.; Buttazzo, G. AXI HyperConnect: A Predictable, Hypervisor-Level Interconnect for Hardware Accelerators in FPGA SoC. In Proceedings of the 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 20–24 July 2020.
- Reiche Myrgård, M. Acceleration of Deep Convolutional Neural Networks on Multiprocessor System-on-Chip. Available online: https://www.diva-portal.org/smash/get/diva2:1326323/FULLTEXT01.pdf (accessed on 29 September 2020).
- Yao, S.; Guo, K. Deep Processing Unit (Dpu) for Implementing an Artificial Neural Network (Ann). US20180046903A1, 15 February 2018.
- Zhu, J.; Wang, L.; Liu, H.; Tian, S.; Deng, Q.; Li, J. An Efficient Task Assignment Framework to Accelerate DPU-Based Convolutional Neural Network Inference on FPGAs. IEEE Access 2020, 8, 83224–83237.
- Madden, M.G.; Howley, T. Experiments with reinforcement learning in environments with progressive difficulty. In Proceedings of the 14th Irish Conference on Artificial Intelligence and Cognitive Science, Trinity College Dublin, Dublin, Ireland, 17–19 September 2003.
- Woolf, B.P. Building Intelligent Interactive Tutors: Student-Centered Strategies for Revolutionizing E-Learning; Morgan Kaufmann: Burlington, MA, USA, 2010.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
- Jiang, W.; Yu, H.; Zhang, J.; Wu, J.; Luo, S.; Ha, Y. Optimizing energy efficiency of CNN-based object detection with dynamic voltage and frequency scaling. J. Semicond. 2020, 41, 022406.
- Kechiche, L.; Touil, L.; Ouni, B. Toward the Implementation of an ASIC-Like System on FPGA for Real-Time Video Processing with Power Reduction. Int. J. Reconfig. Comput. 2018, 2018, 2843582.
Sensor | Measurement Voltage | Supplied Hardware |
---|---|---|
1 | 3.3 V | VCS-1 bottom board |
2 | 5 V | VCS-1 bottom board |
3 | 5 V | VCS-1 top board |
4 | 3.3 V | VCS-1 top board |
5 | 12 V | VCS-1 top board |
6 | 12 V | VCS-1 bottom board |
7 | Empty | Empty |
Hyper-Parameter | Value |
---|---|
1. Train Start Size | 1000 ms n min + Random (0–1024) |
2. Minibatch Size (Granularity) | 64 ms |
3. α (Learning Rate) | 0.1 |
4. γ (Discount Rate) | 0.9 |
5. ε (Epsilon Greedy) | 0.9 |
6. Reward Growth Rate | 0.6 |
7. Learning Stop | Difference < 0.9% |
8. FPGA Clock Stop Power | 0.1 W |
9. FPGA Low Power | 1 W |
10. FPGA Fully Operational Power | 6 W |
11. ARM Idle Power | 1 W |
12. ARM Step Power | 2 W |
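To show how the FPGA power levels in the table above could translate into energy figures in a simulation, here is a minimal Python sketch. It rests on assumptions that the table does not state: that the 64 ms granularity is the length of one simulation step, and that power is constant within a step. The names `FPGA_POWER_W`, `STEP_MS`, and `energy_mwh`, and the two example state sequences, are hypothetical.

```python
# Power levels in watts, copied from the hyper-parameter table.
FPGA_POWER_W = {
    "clock_stop": 0.1,  # FPGA clock stopped
    "low": 1.0,         # FPGA low power
    "full": 6.0,        # FPGA fully operational
}
STEP_MS = 64  # table granularity; using it as the step length is an assumption


def energy_mwh(fpga_states, step_ms=STEP_MS):
    """Return the energy of a sequence of FPGA power states, in mWh.

    Assumes constant power within each step; 1 mWh equals 3.6 J.
    """
    step_s = step_ms / 1000.0
    joules = sum(FPGA_POWER_W[state] * step_s for state in fpga_states)
    return joules / 3.6


# Ten steps always on, versus four steps on and six with the clock stopped.
always_on = ["full"] * 10
gated = ["full"] * 4 + ["clock_stop"] * 6
print(energy_mwh(always_on), energy_mwh(gated))
```

Under these assumptions, the gap between the two printed values is the kind of saving that stopping the PL clock during idle periods is meant to capture.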
Work | 1 s | 10 s | 30 s | 1 min | 2 min |
---|---|---|---|---|---|
Normal Work Mode | 0.98 mWh | 95.40 mWh | 878.46 mWh | 3517.43 mWh | 14,000.34 mWh |
RL for Hardware Control | 0.79 mWh | 79.54 mWh | 741.84 mWh | 2976.09 mWh | 11,822.75 mWh |
Energy Efficiency Improvement | 18.47% | 16.62% | 15.55% | 15.39% | 15.55% |
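The improvement row can be checked against the two energy rows with a few lines of Python. The sketch assumes the improvement is the relative saving, (normal − RL) / normal, which the table does not state explicitly. The function name `improvement_pct` is hypothetical.

```python
def improvement_pct(normal_mwh, rl_mwh):
    """Relative energy saving of the RL-controlled run, in percent."""
    return 100.0 * (normal_mwh - rl_mwh) / normal_mwh


# (normal work mode, RL for hardware control) energy in mWh, from the table.
runs = {
    "1 s": (0.98, 0.79),
    "10 s": (95.40, 79.54),
    "30 s": (878.46, 741.84),
    "1 min": (3517.43, 2976.09),
    "2 min": (14000.34, 11822.75),
}

for label, (normal, rl) in runs.items():
    print(f"{label}: {improvement_pct(normal, rl):.2f}%")
```

With this formula the 10 s, 30 s, 1 min, and 2 min columns reproduce the tabulated 16.62%, 15.55%, 15.39%, and 15.55%. The 1 s column comes out at 19.39% rather than 18.47%, presumably because the tabulated percentage was computed from measurements with more digits than the two-decimal energies shown.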
Project | Hardware | Method | Power Efficiency Improvement |
---|---|---|---|
Our Work | VCS-1 board based on the Xilinx ZYNQ UltraScale+ MPSoC chip, with a Lynsyn board | RL on the MCU to control and stop the clocks when no data are being exchanged via the I/Os | Up to 18% power reduction compared with the original model |
Jiang et al. [34] | Xilinx ZYNQ UltraScale+ MPSoC ZCU104 | Fine-grained DVFS using the mixed-mode clock manager (MMCM) as the clock generator | About 15% power reduction at 680 mV/285 MHz |
Kechiche et al. [35] | Xilinx ZYNQ ZC702 | Adaptive frequency scaling (AFS): software configuration of the MMCM for frequency scaling and of the power management bus (PMBus) for voltage scaling | Up to 12% power savings with AFS |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Yu, Z.; Machado, P.; Zahid, A.; Abdulghani, A.M.; Dashtipour, K.; Heidari, H.; Imran, M.A.; Abbasi, Q.H. Energy and Performance Trade-Off Optimization in Heterogeneous Computing via Reinforcement Learning. Electronics 2020, 9, 1812. https://doi.org/10.3390/electronics9111812