Run-Time Hierarchical Management of Mapping, Per-Cluster DVFS and Per-Core DPM for Energy Optimization
Abstract
1. Introduction
- A 0–1 ILP model is presented to establish the dependencies among mapping, per-cluster DVFS, and per-core DPM. The objective is to optimize the dynamic and static energy of different sets of concurrently active applications (in varied use cases) on heterogeneous cluster-based platforms, taking into account possible run-time configuration overheads due to application migration and DPM mode switching.
- With the 0–1 ILP model, different strategies (i.e., fully separate, partially separate, and holistic) are realized in the global and local management of the HHMS [11], based on design-time prepared data. The differences and benefits of the management strategies are illustrated.
- Experimental evaluations are performed for 1023 use cases (generated in random order from 10 applications) and on different platform sizes (#clusters × #cores: 2 × 4, 2 × 8, 4 × 4, 4 × 8). The experiments offer insights into the effectiveness of the different management strategies (fully separate, partially separate, and holistic) in terms of application migration, energy efficiency, resource efficiency, and complexity.
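The figure of 1023 use cases is consistent with counting every non-empty subset of 10 applications (2^10 − 1). The following is a minimal sketch under that assumption; the application names `app0` to `app9` are hypothetical and the paper's own generation procedure may differ.

```python
from itertools import combinations

# Hypothetical application names; the evaluation uses 10 applications.
apps = [f"app{i}" for i in range(10)]

# Assumption: a use case is any non-empty set of simultaneously active applications.
use_cases = [
    set(combo)
    for size in range(1, len(apps) + 1)
    for combo in combinations(apps, size)
]

print(len(use_cases))  # 2**10 - 1 = 1023
```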
2. Background and Motivation
2.1. System Models and HHMS
- A set of mappings using different numbers of cores (i.e., different numbers of threads). Each prepared mapping is indexed by the application and by the number of cores c on which it is mapped.
- A set of minimum allowed frequencies (MAFs). The MAF is the minimum frequency that allows the application to respect its timing constraint. Each prepared mapping has its own MAF, defined for the application mapped on c cores in a given cluster.
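The design-time prepared data described above can be pictured as a lookup table from (application, number of cores, cluster) to a MAF. The sketch below is illustrative only: the class name, cluster names, and frequency values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class PreparedMapping:
    cores: int                # number of cores c used by this mapping
    maf_mhz: Dict[str, int]   # cluster name -> MAF on that cluster (MHz)

# Hypothetical design-time table for one application.
# Assumption: mappings that use more cores have a lower MAF.
prepared: Dict[str, List[PreparedMapping]] = {
    "app0": [
        PreparedMapping(cores=1, maf_mhz={"little": 1400, "big": 800}),
        PreparedMapping(cores=2, maf_mhz={"little": 800, "big": 500}),
        PreparedMapping(cores=4, maf_mhz={"little": 500, "big": 300}),
    ],
}

def maf(app: str, cores: int, cluster: str) -> int:
    """Look up the MAF of `app` mapped on `cores` cores in `cluster`."""
    for mapping in prepared[app]:
        if mapping.cores == cores:
            return mapping.maf_mhz[cluster]
    raise KeyError(f"no prepared mapping of {app} on {cores} cores")

print(maf("app0", 2, "little"))  # 800
```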
2.2. Motivation of This Paper
3. 0–1 ILP Model Formulation and Proposed Solution
3.1. 0–1 ILP Problem Formulation
- Application-to-cluster mapping: The mapping of active applications (in each use case) on a cluster-based platform is defined by a matrix of binary variables, where each entry equals 1 if the corresponding application is mapped to the corresponding cluster and 0 otherwise.
- Cluster frequency configurations (DVFS): The optimized cluster frequency is set according to all the MAFs of the applications mapped to a cluster. It should not be smaller than the lowest common MAF, which is the maximum MAF among all applications assigned to the cluster (Equation (2)). In addition, the configured frequency should lie within the cluster frequency range (Equation (3)).
- Task-to-core mapping: The mapping of tasks to cores should respect the resource constraint in each cluster: the number of used cores should not be larger than the number of available cores in the cluster. The number of used cores depends on the application-to-cluster mapping variables and on the applied local management strategy. This paper uses the FCFS [11] task-to-core mapping strategy, for which the number of used cores is the sum of the cores used by the selected prepared mappings in each cluster.
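The frequency and resource constraints above can be checked for one cluster as sketched below. This is a minimal illustration, not the paper's ILP: the function name and all numbers are hypothetical, and choosing the smallest discrete level that is not below the lowest common MAF is an assumption made here for energy minimization.

```python
from bisect import bisect_left

def configure_cluster(selected, freq_levels, available_cores):
    """Return the frequency for one cluster, or None if infeasible.

    selected        : non-empty list of (used_cores, maf) pairs, one per mapped application
    freq_levels     : ascending list of the cluster's discrete frequency levels
    available_cores : number of cores in the cluster
    """
    # With FCFS, the used cores of the selected mappings are summed.
    used_cores = sum(cores for cores, _ in selected)
    if used_cores > available_cores:      # resource constraint violated
        return None
    # Lowest common MAF: the maximum MAF of the mapped applications.
    common_maf = max(maf for _, maf in selected)
    # Assumption: pick the smallest level that is >= the lowest common MAF.
    idx = bisect_left(freq_levels, common_maf)
    if idx == len(freq_levels):           # outside the cluster frequency range
        return None
    return freq_levels[idx]

levels = [600, 900, 1200, 1500]
print(configure_cluster([(2, 800), (1, 1100)], levels, 4))  # 1200
print(configure_cluster([(3, 800), (2, 500)], levels, 4))   # None (5 cores needed)
```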
3.2. Proposed Management Strategies
3.2.1. Fully Separate: Mapping → DVFS → DPM
3.2.2. Partially Separate: Mapping & DVFS → DPM
3.2.3. Holistic: Mapping & DVFS & DPM
4. Experimental Results
4.1. Simulation Setup
Evaluated Hybrid-Hierarchical Management Strategies
- Exhaustive: The exhaustive strategy explores all possible system configurations, i.e., application-to-cluster mappings, task-to-core mappings (which indicate the DVFS setting, i.e., the cluster frequency; see Section 2), and DPM decisions. It aims to achieve the minimum overall system energy (including dynamic energy, static energy, and the overhead energy of DPM mode switching and of application migration) while satisfying application performance constraints (e.g., period) and resource constraints.
- Genetic: This strategy is based on a standard genetic algorithm [26]. It initializes a population of individuals, each consisting of a set of random chromosomes. Each chromosome consists of two genes, indicating the application-to-cluster mapping and the task-to-core mapping selection, respectively. The fitness function is defined in terms of the system energy in a use case (see Equation (6)).
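To give a feel for the exhaustive baseline, the sketch below enumerates only the application-to-cluster dimension (J^I candidates for I applications and J clusters); the paper's exhaustive strategy also explores task-to-core mappings and DPM decisions. The function names and the toy energy model are hypothetical.

```python
from itertools import product

def exhaustive_search(apps, clusters, energy_of):
    """Enumerate every application-to-cluster mapping and keep the cheapest.

    energy_of(mapping) is a caller-supplied (hypothetical) function returning
    the system energy of a mapping, or None if the mapping is infeasible.
    """
    best_mapping, best_energy = None, float("inf")
    for choice in product(clusters, repeat=len(apps)):  # J**I candidates
        mapping = dict(zip(apps, choice))
        energy = energy_of(mapping)
        if energy is not None and energy < best_energy:
            best_mapping, best_energy = mapping, energy
    return best_mapping, best_energy

# Toy model (assumption): every application pays the square of the frequency
# of its cluster, and a cluster runs at the highest MAF of its applications.
toy_maf = {"a": 600, "b": 1200, "c": 1200}

def toy_energy(mapping):
    total = 0
    for app, cluster in mapping.items():
        cluster_freq = max(toy_maf[x] for x, c in mapping.items() if c == cluster)
        total += cluster_freq ** 2
    return total

best, energy = exhaustive_search(list(toy_maf), ["c0", "c1"], toy_energy)
print(best, energy)  # the low-MAF application "a" gets a cluster of its own
```

The number of candidates grows exponentially with the number of active applications, which is why the exhaustive strategy serves only as a reference point.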
4.2. Special Case
4.2.1. Evaluation of Different Platform Sizes
- Dynamic energy dominates the overall energy of each platform (see Figure 4a). As illustrated by Equation (2), dynamic energy is highly dependent on core types and cluster frequency, i.e., the selected MAF in each cluster. Generally speaking, more clusters/cores lead to lower dynamic energy: applications with different MAFs can be separated into more clusters, so lower cluster frequencies, and hence lower dynamic energy, can be achieved. Within each cluster, more available cores allow mappings with more threads (from the design-time mappings) to be selected for each application, which can reduce the lowest common MAF (recall Figure 2). This observation is consistent with [11].
- Static energy is highly dependent on the number of used cores and the cluster frequency (considering the static power model used). A 2 × 4 platform has the highest static energy due to its high cluster frequency. A 2 × 8 platform has the second highest static energy due to its large number of used cores (see Figure 4b).
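The effect of cluster size on the lowest common MAF can be illustrated with a small brute-force search over the prepared mappings of applications sharing one cluster. This is a hedged sketch: the function name and the (cores, MAF) values are hypothetical, and it assumes mappings with more cores have lower MAFs.

```python
from itertools import product

def lowest_common_maf(prepared, available_cores):
    """Smallest achievable common MAF for applications sharing one cluster.

    prepared: {app: {cores: maf}} for the design-time mappings of each application.
    Returns None if no selection fits in the available cores.
    """
    best = None
    # Try every combination of one prepared mapping per application.
    for combo in product(*(options.items() for options in prepared.values())):
        if sum(cores for cores, _ in combo) > available_cores:
            continue                              # does not fit in the cluster
        common = max(maf for _, maf in combo)     # cluster must satisfy every MAF
        if best is None or common < best:
            best = common
    return best

prepared = {
    "app0": {1: 1400, 2: 800, 4: 500},
    "app1": {1: 1000, 2: 600},
}
print(lowest_common_maf(prepared, 4))  # 800
print(lowest_common_maf(prepared, 8))  # 600
```

With 4 cores, the best selection uses two cores per application; with 8 cores, `app0` can use its 4-core mapping, so the cluster can run at a lower frequency.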
4.2.2. Evaluation of Different Management Strategies
4.3. General Case
4.3.1. Evaluation of Different Management Strategies without Migration
4.3.2. Evaluation of Different Management Strategies Allowing Migrations
4.4. Experimental Discussion
- Larger platform sizes (with more available clusters/cores) can lead to better energy efficiency. Dynamic energy is highly dependent on cluster frequency levels, which are determined by application-to-cluster mapping and task-to-core mapping (reflected by Equations (1) and (2)). Static energy is related to the cluster frequency (depending on the static power model) and the number of used cores.
- The fully separate strategy gives different mapping priorities to clusters. The resulting high frequencies (of some clusters) can lead to high energy consumption of the overall system. This strategy has the lowest complexity (see Table 3) due to its simplicity.
- The partially separate and holistic strategies lead to similar energy consumption (5–20% difference with regard to exhaustive) with low strategy complexity (up to 1652× lower with regard to exhaustive).
  - The partially separate strategy gives higher priority to dynamic energy optimization (with low frequency). It can achieve better energy efficiency than the holistic strategy on two-cluster platforms. This indicates that dynamic energy optimization is more important than static energy optimization on small platforms.
  - The holistic strategy optimizes dynamic and static energy simultaneously, and it achieves better energy efficiency on four-cluster platforms. The holistic strategy has the best resource efficiency, i.e., it uses the fewest cores for energy optimization. Compared to the partially separate strategy, the holistic strategy has slightly higher complexity due to its larger exploration space (see Section 3.2).
 
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Exynos 5 octa (5422). 2021. Available online: http://www.samsung.com/exynos (accessed on 1 February 2022).
- Lintermann, A.; Pleiter, D.; Schröder, W. Performance of ODROID-MC1 for scientific flow problems. Future Gener. Comput. Syst. 2019, 95, 149–162.
- Benini, L.; Bertozzi, D.; Milano, M. Resource management policy handling multiple use cases in MPSoC platforms using constraint programming. In Proceedings of the International Conference on Logic Programming, Prague, Czech Republic, 10–12 September 2008; pp. 470–484.
- Muthukaruppan, T.S.; Pricopi, M.; Venkataramani, V.; Mitra, T.; Vishin, S. Hierarchical power management for asymmetric multi-core in dark silicon era. In Proceedings of the 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC), Austin, TX, USA, 29 May–7 June 2013; pp. 1–9.
- Reddy, B.K.; Singh, A.K.; Biswas, D.; Merrett, G.V.; Al-Hashimi, B.M. Inter-cluster thread-to-core mapping and DVFS on heterogeneous multi-cores. IEEE Trans. Multi-Scale Comput. Syst. 2017, 4, 369–382.
- Basireddy, K.R.; Singh, A.K.; Al-Hashimi, B.M.; Merrett, G.V. AdaMD: Adaptive mapping and DVFS for energy-efficient heterogeneous multicores. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2019, 39, 2206–2217.
- Pagani, S.; Pathania, A.; Shafique, M.; Chen, J.J.; Henkel, J. Energy efficiency for clustered heterogeneous multicores. IEEE Trans. Parallel Distrib. Syst. 2016, 28, 1315–1330.
- Tariq, U.U.; Ali, H.; Liu, L.; Panneerselvam, J.; Zhai, X. Energy-efficient static task scheduling on VFI-based NoC-HMPSoCs for intelligent edge devices in cyber-physical systems. ACM Trans. Intell. Syst. Technol. 2019, 10, 1–22.
- Dey, S.; Saha, S.; Singh, A.; McDonald-Maier, K. Asynchronous Hybrid Deep Learning (AHDL): A Deep Learning Based Resource Mapping in DVFS Enabled Mobile MPSoCs. In Proceedings of the 2021 IEEE 7th World Forum on Internet of Things (WF-IoT), New Orleans, LA, USA, 14 June–31 July 2021; pp. 303–308.
- Chen, J.; Manivannan, M.; Abduljabbar, M.; Pericàs, M. ERASE: Energy Efficient Task Mapping and Resource Management for Work Stealing Runtimes. ACM Trans. Archit. Code Optim. 2022, 19, 1–29.
- Yang, S.; Le Nours, S.; Real, M.M.; Pillement, S. 0–1 ILP-based run-time hierarchical energy optimization for heterogeneous cluster-based multi/many-core systems. J. Syst. Archit. 2021, 116, 102035.
- Shamsa, E.; Kanduri, A.; Rahmani, A.M.; Liljeberg, P. Energy-Performance Co-Management of Mixed-Sensitivity Workloads on Heterogeneous Multi-core Systems. In Proceedings of the 26th Asia and South Pacific Design Automation Conference, Tokyo, Japan, 18–21 January 2021; pp. 421–427.
- Mo, L.; Zhou, Q.; Kritikakou, A.; Liu, J. Energy Efficient, Real-time and Reliable Task Deployment on NoC-based Multicores with DVFS. In Proceedings of the IEEE/ACM Design, Automation and Test in Europe (DATE), Antwerp, Belgium, 14–23 March 2022.
- Bhatti, M.K.; Belleudy, C.; Auguin, M. Hybrid power management in real time embedded systems: An interplay of DVFS and DPM techniques. Real-Time Syst. 2011, 47, 143–162.
- Chen, G.; Huang, K.; Knoll, A. Energy optimization for real-time multiprocessor system-on-chip with optimal DVFS and DPM combination. ACM Trans. Embed. Comput. Syst. 2014, 13, 1–21.
- Srinivasan, K.; Chatha, K.S. Integer linear programming and heuristic techniques for system-level low power scheduling on multiprocessor architectures under throughput constraints. Integration 2007, 40, 326–354.
- Mascitti, A.; Cucinotta, T.; Marinoni, M. An adaptive, utilization-based approach to schedule real-time tasks for ARM big.LITTLE architectures. ACM SIGBED Rev. 2020, 17, 18–23.
- Jejurikar, R.; Pereira, C.; Gupta, R. Leakage aware dynamic voltage scaling for real-time embedded systems. In Proceedings of the 41st Annual Design Automation Conference, San Diego, CA, USA, 7–11 June 2004; pp. 275–280.
- Singh, A.K.; Dziurzanski, P.; Mendis, H.R.; Indrusiak, L.S. A survey and comparative study of hard and soft real-time dynamic resource allocation strategies for multi-/many-core systems. ACM Comput. Surv. 2017, 50, 24.
- Quan, W.; Pimentel, A.D. A hierarchical run-time adaptive resource allocation framework for large-scale MPSoC systems. Des. Autom. Embed. Syst. 2016, 20, 311–339.
- Singh, A.K.; Shafique, M.; Kumar, A.; Henkel, J. Resource and throughput aware execution trace analysis for efficient run-time mapping on MPSoCs. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2016, 35, 72–85.
- Henkel, J.; Khdr, H.; Pagani, S.; Shafique, M. New trends in dark silicon. In Proceedings of the 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 8–12 June 2015; pp. 1–6.
- Kanduri, A.; Miele, A.; Rahmani, A.M.; Liljeberg, P.; Bolchini, C.; Dutt, N. Approximation-aware coordinated power/performance management for heterogeneous multi-cores. In Proceedings of the 55th Annual Design Automation Conference, San Francisco, CA, USA, 24–28 June 2018; pp. 1–6.
- SDF3. Available online: http://www.es.ele.tue.nl/sdf3 (accessed on 1 February 2022).
- Zahaf, H.E.; Benyamina, A.E.H.; Olejnik, R.; Lipari, G. Energy-efficient scheduling for moldable real-time tasks on heterogeneous computing platforms. J. Syst. Archit. 2017, 74, 46–60.
- Lambora, A.; Gupta, K.; Chopra, K. Genetic Algorithm-A Literature Review. In Proceedings of the 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, 14–16 February 2019; pp. 380–384.
- Bienia, C.; Kumar, S.; Singh, J.P.; Li, K. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, New York, NY, USA, 25–29 October 2008; pp. 72–81.







| Category | Management Strategy | References | Applications |
|---|---|---|---|
| ❶ | Mapping → DVFS | [4,5,6,7] | Multiple concurrent apps |
| ❶ | Mapping & DVFS | [8,9,10] | One single app |
| ❶ | Mapping & DVFS | [11,12,13] | Multiple concurrent apps |
| ❷ | Mapping → DVFS* & DPM | [14,15] | Periodic independent tasks |
| ❷ | Mapping & DVFS* → DPM | [16] | One single app |
| ❸ | Fully-separate: Mapping → DVFS → DPM | This work | Multiple concurrent apps |
| ❸ | Partially-separate: Mapping & DVFS → DPM | This work | Multiple concurrent apps |
| ❸ | Holistic: Mapping & DVFS & DPM | This work | Multiple concurrent apps |

| Symbol | Description |
|---|---|
| System configuration | |
| HHMS | Hybrid hierarchical management structure |
| DVFS | Dynamic voltage and frequency scaling |
| DPM | Dynamic power management |
| | Application level, platform level, and management level |
| | An application with index i |
| | The period of an application |
| ={} | A use case consists of a set of simultaneously active applications |
| I | The number of active applications in a use case |
| | A cluster with index j |
| J | The number of available clusters in the platform |
| | The number of available cores in cluster j |
| | The available discrete frequency levels of cluster j |
| | Global management |
| | Local management |
| | The minimum allowed frequency (MAF) of application i mapped on c cores in cluster j |
| | The maximum number of cores used by the design-time-prepared mappings of an application |
| | The lowest common MAF of the applications mapped in cluster j |
| | The design-time mapping of application i mapped on c cores |
| | The number of available cores in cluster j |
| | The number of used cores in cluster j |
| | The run-time application-to-cluster mapping of application i to cluster j |
| | The run-time configured frequency of cluster j |
| Energy model | |
| | The overall energy of the system |
| | The dynamic energy of the system |
| | The static energy of the system |
| | The overhead energy of DPM mode switching |
| | The overhead energy of application migration |
| Experiment | |
| FS | Fully separate strategy, mapping → DVFS → DPM |
| PS | Partially separate strategy, mapping & DVFS → DPM |
| H | Holistic strategy, mapping & DVFS & DPM |
| -Mx | Allowing x migration(s) across clusters |
| | Static energy ratio, the ratio of static energy to dynamic energy |
| | Least common multiple of the active application periods in each use case |
| | Least common multiple of the active application periods in each use case |
| FCFS | First come first served task mapping strategy in local management |
| | The number of used cores of all clusters |

Table 3. Complexity of the evaluated management strategies for each platform size.

| Platform | Special: Exhaustive | Special: Genetic | Special: FS | Special: PS | Special: H | General: Exhaustive | General: FS | General: PS-M0 | General: H-M0 | General: PS-M1 | General: H-M1 | General: PS-M2 | General: H-M2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 × 4 | 1.35 | 804.35 | 0.05 | 0.12 | 0.12 | 0.23 | 0.07 | 0.13 | 0.14 | 0.15 | 0.16 | 0.16 | 0.18 |
| 2 × 8 | 1.54 | 40.93 | 0.05 | 0.12 | 0.12 | 0.27 | 0.08 | 0.12 | 0.13 | 0.15 | 0.15 | 0.15 | 0.16 |
| 4 × 4 | 282.99 | 46.15 | 0.06 | 0.18 | 0.18 | 7.69 | 0.08 | 0.16 | 0.17 | 0.19 | 0.19 | 0.20 | 0.22 |
| 4 × 8 | 313.98 | 31.20 | 0.06 | 0.18 | 0.19 | 8.29 | 0.09 | 0.16 | 0.17 | 0.19 | 0.20 | 0.21 | 0.22 |
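
The complexity reduction relative to the exhaustive strategy can be recomputed from the special-case columns of the table above. The snippet below is only an arithmetic check on the reported (rounded) values; the dictionary layout and function name are hypothetical. For the holistic strategy on the 4 × 8 platform, 313.98 / 0.19 ≈ 1652.5, consistent with the 1652× figure quoted in Section 4.4.

```python
# Special-case values copied from the table above (units as reported in the paper).
special_case = {
    "2 x 4": {"Exhaustive": 1.35, "FS": 0.05, "PS": 0.12, "H": 0.12},
    "2 x 8": {"Exhaustive": 1.54, "FS": 0.05, "PS": 0.12, "H": 0.12},
    "4 x 4": {"Exhaustive": 282.99, "FS": 0.06, "PS": 0.18, "H": 0.18},
    "4 x 8": {"Exhaustive": 313.98, "FS": 0.06, "PS": 0.18, "H": 0.19},
}

def reduction(platform, strategy):
    """Ratio of the exhaustive complexity to the given strategy's complexity."""
    row = special_case[platform]
    return row["Exhaustive"] / row[strategy]

for platform in special_case:
    for strategy in ("FS", "PS", "H"):
        print(f"{platform} {strategy}: {reduction(platform, strategy):.1f}x")
```

Because the table entries are rounded to two decimals, ratios computed this way are approximate, especially for the smallest values.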
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Qiu, W.; Chen, Y.; Chen, D.; Su, T.; Yang, S. Run-Time Hierarchical Management of Mapping, Per-Cluster DVFS and Per-Core DPM for Energy Optimization. Electronics 2022, 11, 1094. https://doi.org/10.3390/electronics11071094