Energy and Performance Trade-Off Optimization in Heterogeneous Computing via Reinforcement Learning
Abstract
1. Introduction
2. Related Work
3. Methodology
3.1. System Model
3.2. Power Measurement Utility (PMU)
3.3. Simulation Environment
3.4. Reinforcement Learning Algorithm
3.5. Training Algorithm
Algorithm 1. Proposed PMU-RL algorithm

1: Initialise the Q-values of table Q(s, a) to all-zero matrices.
2: Observe the current state s.
Require:
3: States S = (S1, S2, ..., S32)
4: Actions A = (A1, A2) (PL clock start and stop)
5: Reward function: S × A → R
6: Set ε = 0.1, α ∈ [0, 1] (typically α = 0.1), γ ∈ [0, 1] (typically γ = 0.9)
7: Update the Q-value with the reward.
8: Find the maximum reward over the states of the next n steps.
9: Bellman equation: Q(s, a) ← Q(s, a) + α[R + γ max_{a'} Q(s', a') − Q(s, a)]
10: Model[S][A].R = R;
11: Model[S][A].S_ = S;
12: unsigned S2, A2, S_2; float R2
13: for n = do
14:     Q(s2, a2) ← Q(s2, a2) + α[R2 + γ max_{a2'} Q(s2', a2') − Q(s2, a2)];
15:     S2 = S_2; A2 = GET_ACTION_SATE(S_2);
16:     A = GET_ACTION_SATE(S_);
17:     S = S_;
18: end
19: return Action;
20: Set the state to the new state; repeat until S is terminal.
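
The update rule and model-based replay in Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy environment `toy_env_step`, its reward values, the uniform random replay of stored transitions (standard Dyna-Q style, swapped in for the loop in steps 13 to 18), the exploration probability of 0.1, and the episode and horizon sizes are all assumptions made for illustration.

```python
import random

N_STATES = 32                        # S1..S32, as in Algorithm 1
ACTIONS = (0, 1)                     # 0 = PL clock start, 1 = PL clock stop
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # EPSILON as exploration rate is an assumption


def get_action(q, state, rng):
    """Epsilon-greedy choice: explore with probability EPSILON, else act greedily."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[state][a])


def q_update(q, s, a, r, s_next):
    """One Bellman backup: Q(s,a) += alpha * (r + gamma * max Q(s',.) - Q(s,a))."""
    target = r + GAMMA * max(q[s_next])
    q[s][a] += ALPHA * (target - q[s][a])


def toy_env_step(state, action):
    """Hypothetical environment: states 0..15 are idle (stopping the clock pays),
    states 16..31 are busy (keeping the clock running pays)."""
    idle = state < 16
    reward = 1.0 if (idle and action == 1) or (not idle and action == 0) else 0.0
    return reward, (state + 1) % N_STATES


def train(env_step, episodes=200, horizon=50, planning_steps=5, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]  # zero-initialised table
    model = {}                                             # (s, a) -> (r, s_next)
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(horizon):
            a = get_action(q, s, rng)
            r, s_next = env_step(s, a)
            q_update(q, s, a, r, s_next)       # learn from the real transition
            model[(s, a)] = (r, s_next)        # remember it in the model
            for _ in range(planning_steps):    # replay remembered transitions
                (s2, a2), (r2, s2_next) = rng.choice(list(model.items()))
                q_update(q, s2, a2, r2, s2_next)
            s = s_next
    return q


q_table = train(toy_env_step)
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES)]
```

In this toy setting the learned policy stops the clock in the idle states and keeps it running in the busy ones, which is the behaviour the reward was designed to encourage.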
- Up: power rises
- Down: power falls
- High-smooth: power consumption rises and then stabilises
- Low-smooth: power consumption drops and then stabilises
- Fully-operational: hardware at full load
- Idle: waiting for an FPGA hardware-acceleration call
- OFF: FPGA clock stopped
4. Implementation and Evaluation
4.1. Test Setup
4.2. Reinforcement Learning Agent Evaluation
5. Discussion
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
Appendix A. Hardware Platform
References
- Ou, Z.; Pang, B.; Deng, Y.; Nurminen, J.K.; Yla-Jaaski, A.; Hui, P. Energy- and cost-efficiency analysis of ARM-based clusters. In Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), Ottawa, ON, Canada, 13–16 May 2012; pp. 115–123.
- Höppner, S.; Yan, Y.; Vogginger, B.; Dixius, A.; Partzsch, J.; Neumärker, F.; Hartmann, S.; Schiefer, S.; Scholze, S.; Ellguth, G.; et al. Dynamic voltage and frequency scaling for neuromorphic many-core systems. In Proceedings of the 2017 IEEE International Symposium on Circuits and Systems (ISCAS), Baltimore, MD, USA, 28–31 May 2017; pp. 1–4.
- Beldachi, A.F.; Nunez-Yanez, J.L. Accurate Power control and monitoring in ZYNQ boards. In Proceedings of the 2014 24th International Conference on Field Programmable Logic and Applications (FPL), Munich, Germany, 2–4 September 2014; pp. 1–4.
- Bauer, W.; Holzinger, P.; Reichenbach, M.; Vaas, S.; Hartke, P.; Fey, D. Programmable HSA Accelerators for ZYNQ UltraScale+ MPSoC Systems. In Proceedings of the European Conference on Parallel Processing, Turin, Italy, 27–31 August 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 733–744.
- Crockett, L.; Northcote, D.; Ramsay, C.; Robinson, F.; Stewart, R. Exploring ZYNQ MPSoC: With PYNQ and Machine Learning Applications; University of Strathclyde: Glasgow, UK, 2019.
- Han, J.J.; Wu, X.; Zhu, D.; Jin, H.; Yang, L.T.; Gaudiot, J.L. Synchronization-aware energy management for VFI-based multicore real-time systems. IEEE Trans. Comput. 2012, 61, 1682–1696.
- Sapuppo, F.; Schembri, F.; Fortuna, L.; Bucolo, M. Microfluidic circuits and systems. IEEE Circuits Syst. Mag. 2009, 9, 6–19.
- Seifoori, Z.; Ebrahimi, Z.; Khaleghi, B.; Asadi, H. Introduction to emerging sram-based fpga architectures in dark silicon era. In Advances in Computers; Elsevier: Amsterdam, The Netherlands, 2018; Volume 110, pp. 259–294.
- Walls, C. Embedded Software: The Works; Elsevier: Amsterdam, The Netherlands, 2012.
- Oliveira, B.G.; Lobo, J. Interactive Demonstration of an Energy Efficient YOLOv3 Implementation in Reconfigurable Logic. In Proceedings of the 2019 5th Experiment International Conference, Funchal, Portugal, 12–14 June 2019; pp. 235–236.
- Chen, Y.L.; Chang, M.F.; Yu, C.W.; Chen, X.Z.; Liang, W.Y. Learning-Directed Dynamic Voltage and Frequency Scaling Scheme with Adjustable Performance for Single-Core and Multi-Core Embedded and Mobile Systems. Sensors 2018, 18, 3068.
- Keller, B.A.; Nikolic, B.; Asanović, K.; Callaway, D. Energy-Efficient System Design through Adaptive Voltage Scaling; eScholarship; University of California: Berkeley, CA, USA, 2017.
- Baruah, T.; Sun, Y.; Dong, S.; Kaeli, D.; Rubin, N. Airavat: Improving energy efficiency of heterogeneous applications. In Proceedings of the 2018 Design, Automation Test in Europe Conference Exhibition (DATE), Dresden, Germany, 19–23 March 2018; pp. 731–736.
- Pagani, S.; Sai Manoj, P.D.; Jantsch, A.; Henkel, J. Machine learning for power, energy, and thermal management on multi-core processors: A survey. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2018, 39, 101–116.
- Liu, W.; Tan, Y.; Qiu, Q. Enhanced Q-learning algorithm for dynamic power management with performance constraint. In Proceedings of the 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010), Dresden, Germany, 8–12 March 2010; pp. 602–605.
- Bai, Y.; Lee, V.W.; Ipek, E. Voltage regulator efficiency aware power management. ACM Sigops Oper. Syst. Rev. 2017, 51, 825–838.
- Da Silva, L.M.; Torquato, M.F.; Fernandes, M.A. Parallel implementation of reinforcement learning q-learning technique for fpga. IEEE Access 2018, 7, 2782–2798.
- Singh, A.K.; Leech, C.; Reddy, B.K.; Al-Hashimi, B.M.; Merrett, G.V. Learning-based run-time power and energy management of multi/many-core systems: Current and future trends. J. Low Power Electron. 2017, 13, 310–325.
- Dhiman, G.; Rosing, T.S. Dynamic voltage frequency scaling for multi-tasking systems using online learning. In Proceedings of the 2007 international symposium on Low power electronics and design, Portland, OR, USA, 27–29 August 2007; pp. 207–212.
- Shen, H.; Lu, J.; Qiu, Q. Learning based DVFS for simultaneous temperature, performance and energy management. In Proceedings of the Thirteenth International Symposium on Quality Electronic Design (ISQED), Santa Clara, CA, USA, 19–21 March 2012; pp. 747–754.
- Fettes, Q.; Clark, M.; Bunescu, R.; Karanth, A.; Louri, A. Dynamic Voltage and Frequency Scaling in NoCs with Supervised and Reinforcement Learning Techniques. IEEE Trans. Comput. 2018, 68, 375–389.
- Sadek, A.; Muddukrishna, A.; Kalms, L.; Djupdal, A.; Podlubne, A.; Paolillo, A.; Goehringer, D.; Jahre, M. Supporting Utilities for Heterogeneous Embedded Image Processing Platforms (STHEM): An Overview; ARC Springer: Cham, Switzerland, 2018; pp. 737–749.
- Kalb, T.; Kalms, L.; Göhringer, D.; Pons, C.; Marty, F.; Muddukrishna, A.; Jahre, M.; Kjeldsberg, P.G.; Ruf, B.; Schuchert, T.; et al. TULIPP: Towards ubiquitous low-power image processing platforms. In Proceedings of the 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), Agios Konstantinos, Greece, 17–21 July 2016; pp. 306–311.
- Duhem, T.F.; Christensen, S.F.; Paolillo, H.A.; Kalms, R.L.; Peterson, S.M.; Schuchert, I.T.; Jahre, N.M.; Muddukrishna, N.A.; Rodriguez, H.B. Towards Ubiquitous Low-Power Image Processing Platforms. 2020. Available online: http://tulipp.eu/wp-content/uploads/2020/06/Towards-Ubiquitous-Low-power-Image-Processing.pdf (accessed on 1 September 2020).
- Muddukrishna, A.; Djupdal, A.; Jahre, M. Power Profiling of Embedded Vision Applications in the Tulipp Project. ACM MCC17. 2017. Available online: http://tulipp.eu/wp-content/uploads/2018/06/MCC17_paper_22.pdf (accessed on 1 September 2020).
- Kalb, T.; Kalms, L.; Göhringer, D.; Pons, C.; Muddukrishna, A.; Jahre, M.; Ruf, B.; Schuchert, T.; Tchouchenkov, I.; Ehrenstråhle, C.; et al. Developing Low-Power Image Processing Applications with the TULIPP Reference Platform Instance. In Hardware Accelerators in Data Centers; Springer: Berlin/Heidelberg, Germany, 2019; pp. 181–197.
- Restuccia, F.; Biondi, A.; Marinoni, M.; Cicero, G.; Buttazzo, G. AXI HyperConnect: A Predictable, Hypervisor-Level Interconnect for Hardware Accelerators in FPGA SoC. In Proceedings of the 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 20–24 July 2020.
- Reiche Myrgård, M. Acceleration of Deep Convolutional Neural Networks on Multiprocessor System-on-Chip. Available online: https://www.diva-portal.org/smash/get/diva2:1326323/FULLTEXT01.pdf (accessed on 29 September 2020).
- Yao, S.; Guo, K. Deep Processing Unit (Dpu) for Implementing an Artificial Neural Network (Ann). US20180046903A1, 15 February 2018.
- Zhu, J.; Wang, L.; Liu, H.; Tian, S.; Deng, Q.; Li, J. An Efficient Task Assignment Framework to Accelerate DPU-Based Convolutional Neural Network Inference on FPGAs. IEEE Access 2020, 8, 83224–83237.
- Madden, M.G.; Howley, T. Experiments with reinforcement learning in environments with progressive difficulty. In Proceedings of the 14th Irish Conference on Artificial Intelligence and Cognitive Science, Trinity College Dublin, Dublin, Ireland, 17–19 September 2003.
- Woolf, B.P. Building Intelligent Interactive Tutors: Student-Centered Strategies for Revolutionizing E-Learning; Morgan Kaufmann: Burlington, MA, USA, 2010.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
- Jiang, W.; Yu, H.; Zhang, J.; Wu, J.; Luo, S.; Ha, Y. Optimizing energy efficiency of CNN-based object detection with dynamic voltage and frequency scaling. J. Semicond. 2020, 41, 022406.
- Kechiche, L.; Touil, L.; Ouni, B. Toward the Implementation of an ASIC-Like System on FPGA for Real-Time Video Processing with Power Reduction. Int. J. Reconfig. Comput. 2018, 2018, 2843582.
Sensor | Measurement Voltage | Supplied Hardware |
---|---|---|
1 | 3.3 V | VCS-1 bottom board |
2 | 5 V | VCS-1 bottom board |
3 | 5 V | VCS-1 top board |
4 | 3.3 V | VCS-1 top board |
5 | 12 V | VCS-1 top board |
6 | 12 V | VCS-1 bottom board |
7 | Empty | Empty |
Hyper-Parameter | Value |
---|---|
1. Train Start Size | 1000 ms n min + Random (0–1024) |
2. Minibatch Size (Granularity) | 64 ms |
3. α (Learning Rate) | 0.1 |
4. γ (Discount Rate) | 0.9 |
5. ε (Epsilon Greedy) | 0.9 |
6. Reward Growth Rate | 0.6 |
7. Learning Stop | Difference < 0.9% |
8. FPGA Clock Stop Power | 0.1 W |
9. FPGA Low Power | 1 W |
10. FPGA Fully Operational Power | 6 W |
11. ARM Idle Power | 1 W |
12. ARM Step Power | 2 W |
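
To give a feel for the power levels in the table above, the following Python sketch converts a constant power draw into energy in mWh. It is only a back-of-the-envelope illustration. It assumes the power stays constant over the interval, the helper name `energy_mwh` and the 60 s interval are hypothetical, and it is not the simulation environment used in the paper.

```python
def energy_mwh(power_w: float, seconds: float) -> float:
    """Energy in milliwatt-hours for a constant power draw over a time interval."""
    return power_w * seconds / 3600.0 * 1000.0


# FPGA power levels taken from the hyper-parameter table (in watts).
FPGA_CLOCK_STOP_W = 0.1
FPGA_LOW_W = 1.0
FPGA_FULL_W = 6.0

interval_s = 60.0  # illustrative interval, not a value from the paper
for label, power in [("clock stop", FPGA_CLOCK_STOP_W),
                     ("low power", FPGA_LOW_W),
                     ("fully operational", FPGA_FULL_W)]:
    print(f"{label}: {energy_mwh(power, interval_s):.2f} mWh")
```

Under this assumption, an FPGA waiting with its clock stopped draws one tenth of the energy it would draw in the low-power state over the same interval, which is the saving the agent tries to exploit.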
Work Mode (energy per run time) | 1 s | 10 s | 30 s | 1 min | 2 min |
---|---|---|---|---|---|
Normal Work Mode | 0.98 mWh | 95.40 mWh | 878.46 mWh | 3517.43 mWh | 14,000.34 mWh |
RL for Hardware Control | 0.79 mWh | 79.54 mWh | 741.84 mWh | 2976.09 mWh | 11,822.75 mWh |
Energy Efficiency Improvement | 18.47% | 16.62% | 15.55% | 15.39% | 15.55% |
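
The improvement row is consistent with the relative saving (normal − RL) / normal. The short Python check below recomputes it from the tabulated energies; the variable and function names are illustrative only. For the 1 s column the rounded table values give about 19.39% rather than the reported 18.47%, presumably because the reported figure was computed from unrounded measurements.

```python
# Energies in mWh, copied from the table above.
normal_mwh = {"1 s": 0.98, "10 s": 95.40, "30 s": 878.46,
              "1 min": 3517.43, "2 min": 14000.34}
rl_mwh = {"1 s": 0.79, "10 s": 79.54, "30 s": 741.84,
          "1 min": 2976.09, "2 min": 11822.75}


def improvement_percent(baseline: float, optimised: float) -> float:
    """Relative energy saving of the optimised run, in percent of the baseline."""
    return 100.0 * (baseline - optimised) / baseline


improvements = {
    duration: improvement_percent(normal_mwh[duration], rl_mwh[duration])
    for duration in normal_mwh
}

for duration, value in improvements.items():
    print(f"{duration}: {value:.2f}%")
```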
Project | Hardware | Method | Power Efficiency Improvement |
---|---|---|---|
Our Work | VCS-1 board based on the Xilinx ZYNQ UltraScale+ MPSoC chip and Lynsyn board | RL on the MCU controls the clocks and stops them when no data are being exchanged via the I/Os | Up to 18% power reduction compared with the original model |
Jiang et al. [34] | Xilinx ZYNQ UltraScale+ MPSoC ZCU104 | Fine-grained DVFS using a mixed-mode clock manager (MMCM) as the clock generator | About 15% power reduction at 680 mV/285 MHz |
Kechiche et al. [35] | Xilinx ZYNQ ZC702 | Adaptive frequency scaling (AFS): software configuration of the MMCM for frequency scaling, with the power management bus (PMBus) for voltage scaling | Up to 12% power savings with AFS |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Yu, Z.; Machado, P.; Zahid, A.; Abdulghani, A.M.; Dashtipour, K.; Heidari, H.; Imran, M.A.; Abbasi, Q.H. Energy and Performance Trade-Off Optimization in Heterogeneous Computing via Reinforcement Learning. Electronics 2020, 9, 1812. https://doi.org/10.3390/electronics9111812