Time–Energy Correlation for Multithreaded Matrix Factorizations
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Metrics
3.2. Algorithms
3.3. Methodology
- cpupower frequency-set -d 1700000
- cpupower frequency-set -u 1700000
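The two commands above set the lower (`-d`) and upper (`-u`) frequency limits to the same value, 1,700,000 kHz (1.7 GHz), which effectively pins the cores at that frequency. The sketch below shows how such command pairs could be generated for the other frequency settings that appear in the result tables. The helper name `pin_frequency_commands` is hypothetical and not taken from the paper; the commands are only built as strings and are not executed.

```python
def pin_frequency_commands(freq_ghz: float) -> list[str]:
    """Build the cpupower command pair that pins the CPU frequency.

    Hypothetical helper for illustration only.
    """
    # cpupower frequency-set interprets a bare number as kHz.
    freq_khz = round(freq_ghz * 1_000_000)
    return [
        f"cpupower frequency-set -d {freq_khz}",  # lower frequency limit
        f"cpupower frequency-set -u {freq_khz}",  # upper frequency limit
    ]


if __name__ == "__main__":
    # Print the command pair for the 1.7 GHz setting.
    for command in pin_frequency_commands(1.7):
        print(command)
```

Running the commands themselves typically requires root privileges, so generating them separately from executing them keeps the sketch safe to run anywhere.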
4. Results
4.1. Time and Energy Consumption
4.2. Speedup and Greenup
4.3. Energy and Runtime Correlation
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

Version of Algorithm | Energy Consumption (J) | Time (s) | Waste of Time (%) | Energy Saving (%) |
---|---|---|---|---|
LU without pivoting, 40 threads, 1.7 GHz | 6169.10 | 26.53 | 7.9 | 1.6 |
LU with pivoting, 40 threads, 2.1 GHz | 6306.62 | 24.31 | 0 | 0 |
Cholesky, 30 threads, 2.0 GHz | 3375.74 | 14.67 | 10.1 | 1.3 |

Threads/Frequency [GHz] | Time [s] | Energy [J] | Speedup | Greenup | Powerup | Category | EDP [J·s] | Normalized EDP |
---|---|---|---|---|---|---|---|---|
1/2.1 | 716.89 | 71,034.97 | 1.0000 | 1.0000 | 1.0000 | Base | 50,924,083.5731 | 1.0000 |
10/2.1 | 73.81 | 10,927.71 | 9.7128 | 6.5004 | 1.4942 | 3 | 806,557.6063 | 0.0158 |
20/2.1 | 37.91 | 7468.31 | 18.9104 | 9.5115 | 1.9882 | 3 | 283,120.9561 | 0.0056 |
30/2.1 | 27.92 | 6560.54 | 25.6745 | 10.8276 | 2.3712 | 3 | 183,184.3908 | 0.0036 |
40/2.1 | 24.43 | 6267.94 | 29.3395 | 11.3331 | 2.5888 | 3 | 153,152.2553 | 0.0030 |
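The derived columns of the table above can be recomputed from the Time and Energy columns. The sketch below assumes the definitions that are consistent with the tabulated values: Speedup = T_base / T, Greenup = E_base / E, Powerup = Speedup / Greenup, and EDP = E · T. The names `Run` and `derived_metrics` are hypothetical. Because the published times are rounded to two decimals, the recomputed values agree with the table only approximately.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One measurement: wall-clock time [s] and energy [J]."""
    time_s: float
    energy_j: float


def derived_metrics(base: Run, run: Run) -> dict[str, float]:
    """Recompute the derived table columns relative to a baseline run."""
    speedup = base.time_s / run.time_s        # > 1 means faster than base
    greenup = base.energy_j / run.energy_j    # > 1 means less energy than base
    powerup = speedup / greenup               # ratio of average power draws
    edp = run.energy_j * run.time_s           # energy-delay product [J*s]
    base_edp = base.energy_j * base.time_s
    return {
        "speedup": speedup,
        "greenup": greenup,
        "powerup": powerup,
        "edp": edp,
        "normalized_edp": edp / base_edp,
    }


# Rows 1/2.1 (baseline) and 40/2.1 from the table above.
baseline = Run(time_s=716.89, energy_j=71034.97)
run_40_threads = Run(time_s=24.43, energy_j=6267.94)

if __name__ == "__main__":
    for name, value in derived_metrics(baseline, run_40_threads).items():
        print(f"{name}: {value:.4f}")
```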

Threads/Frequency [GHz] | Time [s] | Energy [J] | Speedup | Greenup | Powerup | Category | EDP [J·s] | Normalized EDP |
---|---|---|---|---|---|---|---|---|
1/2.1 | 375.37 | 37,661.12 | 1.0000 | 1.0000 | 1.0000 | Base | 14,136,853.0153 | 1.0000 |
10/2.1 | 37.98 | 5601.75 | 9.8833 | 6.7231 | 1.4701 | 3 | 212,755.7039 | 0.0150 |
20/2.1 | 20.23 | 3942.98 | 18.5530 | 9.5514 | 1.9424 | 3 | 79,775.3197 | 0.0056 |
30/2.1 | 14.24 | 3377.33 | 26.3548 | 11.1512 | 2.3634 | 3 | 48,103.1362 | 0.0034 |
40/2.1 | 13.18 | 3418.30 | 28.4864 | 11.0175 | 2.5856 | 3 | 45,043.5085 | 0.0032 |

Threads/Frequency [GHz] | Time [s] | Energy [J] | Speedup | Greenup | Powerup | Category | EDP [J·s] | Normalized EDP |
---|---|---|---|---|---|---|---|---|
40/2.1 | 24.43 | 6267.94 | 1.0000 | 1.0000 | 1.0000 | Base | 153,152.2553 | 1.0000 |
40/2.0 | 25.07 | 6264.05 | 0.9748 | 1.0006 | 0.9742 | 4 | 157,012.9875 | 1.0252 |
40/1.9 | 25.87 | 6293.69 | 0.9446 | 0.9959 | 0.9485 | 6 | 162,797.3705 | 1.0630 |
40/1.8 | 26.53 | 6270.34 | 0.9210 | 0.9996 | 0.9213 | 6 | 166,353.3561 | 1.0862 |
40/1.7 | 27.39 | 6217.28 | 0.8922 | 1.0081 | 0.8850 | 4 | 170,271.6647 | 1.1118 |
40/1.6 | 28.96 | 6318.91 | 0.8437 | 0.9919 | 0.8505 | 6 | 183,010.9802 | 1.1950 |
40/1.4 | 31.90 | 6401.94 | 0.7661 | 0.9791 | 0.7825 | 6 | 204,190.6201 | 1.3333 |
40/1.1 | 39.40 | 7007.56 | 0.6202 | 0.8945 | 0.6934 | 6 | 276,088.4306 | 1.8027 |
40/0.8 | 53.01 | 8054.57 | 0.4609 | 0.7782 | 0.5923 | 6 | 427,002.3607 | 2.7881 |
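The Category column in the tables appears to follow from the signs of the three metrics relative to 1. The sketch below encodes only the rule inferred from the rows shown here (categories 3, 4, and 6); it is an assumption read off the tabulated data, not the paper's stated definition, and the function name `infer_category` is hypothetical. Combinations that do not occur in the tables return `None`.

```python
from typing import Optional


def infer_category(speedup: float, greenup: float, powerup: float) -> Optional[int]:
    """Infer the Category value from the three metrics.

    Rule inferred from the table rows (an assumption):
      3: faster, more power, but less energy than the baseline
      4: slower, less power, and less energy than the baseline
      6: slower, less power, but more energy than the baseline
    """
    if speedup > 1 and powerup > 1 and greenup > 1:
        return 3
    if speedup < 1 and powerup < 1:
        if greenup > 1:
            return 4
        if greenup < 1:
            return 6
    return None  # combination not covered by the tables


if __name__ == "__main__":
    # Rows 40/2.0 and 40/1.9 from the table above.
    print(infer_category(0.9748, 1.0006, 0.9742))
    print(infer_category(0.9446, 0.9959, 0.9485))
```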

Threads/Frequency [GHz] | Time [s] | Energy [J] | Speedup | Greenup | Powerup | Category | EDP [J·s] | Normalized EDP |
---|---|---|---|---|---|---|---|---|
1/2.1 | 724.78 | 71,914.57 | 1.0000 | 1.0000 | 1.0000 | Base | 52,122,410.6035 | 1.0000 |
1/2.0 | 767.24 | 50,920.53 | 0.9447 | 1.4123 | 0.6689 | 4 | 39,068,253.5085 | 0.7495 |
1/1.9 | 806.09 | 52,842.73 | 0.8991 | 1.3609 | 0.6607 | 4 | 42,596,156.3818 | 0.8172 |
1/1.8 | 849.50 | 55,128.00 | 0.8532 | 1.3045 | 0.6540 | 4 | 46,831,489.9000 | 0.8985 |
1/1.7 | 897.89 | 58,064.77 | 0.8072 | 1.2385 | 0.6517 | 4 | 52,135,892.0472 | 1.0003 |
1/1.6 | 953.14 | 61,609.11 | 0.7604 | 1.1673 | 0.6514 | 4 | 58,721,816.5918 | 1.1266 |
1/1.4 | 1085.91 | 69,981.08 | 0.6674 | 1.0276 | 0.6495 | 4 | 75,992,808.7077 | 1.4580 |
1/1.1 | 1378.09 | 86,557.87 | 0.5259 | 0.8308 | 0.6330 | 6 | 119,284,879.5255 | 2.2886 |
1/0.8 | 1891.50 | 118,098.32 | 0.3832 | 0.6089 | 0.6293 | 6 | 223,382,614.8920 | 4.2857 |

Threads/Frequency [GHz] | Time [s] | Energy [J] | Speedup | Greenup | Powerup | Category | EDP [J·s] | Normalized EDP |
---|---|---|---|---|---|---|---|---|
10/2.1 | 94.20 | 13,510.11 | 1.0000 | 1.0000 | 1.0000 | Base | 1,272,626.9687 | 1.0000 |
10/2.0 | 99.15 | 12,126.88 | 0.9500 | 1.1141 | 0.8528 | 4 | 1,202,399.7200 | 0.9448 |
10/1.9 | 104.85 | 12,437.41 | 0.8984 | 1.0862 | 0.8270 | 4 | 1,304,115.9838 | 1.0247 |
10/1.8 | 109.65 | 12,869.10 | 0.8590 | 1.0498 | 0.8183 | 4 | 1,411,154.5934 | 1.1089 |
10/1.7 | 115.81 | 13,174.41 | 0.8134 | 1.0255 | 0.7932 | 4 | 1,525,707.1415 | 1.1989 |
10/1.6 | 122.62 | 13,638.36 | 0.7682 | 0.9906 | 0.7755 | 6 | 1,672,284.4035 | 1.3140 |
10/1.4 | 139.28 | 14,497.22 | 0.6763 | 0.9319 | 0.7257 | 6 | 2,019,231.5586 | 1.5867 |
10/1.1 | 177.64 | 16,659.85 | 0.5303 | 0.8109 | 0.6539 | 6 | 2,959,463.0973 | 2.3255 |
10/0.8 | 241.33 | 19,580.23 | 0.3903 | 0.6900 | 0.5657 | 6 | 4,725,256.6406 | 3.7130 |

Threads/Frequency [GHz] | Time [s] | Energy [J] | Speedup | Greenup | Powerup | Category | EDP [J·s] | Normalized EDP |
---|---|---|---|---|---|---|---|---|
40/2.1 | 13.18 | 3418.30 | 1.0000 | 1.0000 | 1.0000 | Base | 45,043.5085 | 1.0000 |
40/2.0 | 13.26 | 3412.81 | 0.9940 | 1.0016 | 0.9924 | 4 | 45,241.6245 | 1.0044 |
40/1.9 | 13.66 | 3425.57 | 0.9644 | 0.9979 | 0.9665 | 6 | 46,803.7656 | 1.0391 |
40/1.8 | 13.90 | 3404.33 | 0.9477 | 1.0041 | 0.9439 | 4 | 47,332.7571 | 1.0508 |
40/1.7 | 14.47 | 3423.69 | 0.9105 | 0.9984 | 0.9119 | 6 | 49,548.7595 | 1.1000 |
40/1.6 | 15.18 | 3447.15 | 0.8679 | 0.9916 | 0.8752 | 6 | 52,336.2409 | 1.1619 |
40/1.4 | 16.97 | 3501.77 | 0.7766 | 0.9762 | 0.7956 | 6 | 59,414.3823 | 1.3190 |
40/1.1 | 20.17 | 3764.29 | 0.6532 | 0.9081 | 0.7193 | 6 | 75,936.2663 | 1.6858 |
40/0.8 | 26.82 | 4365.30 | 0.4912 | 0.7831 | 0.6273 | 6 | 117,093.8704 | 2.5996 |
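As an illustration of how runtime and energy move together across the frequency settings, the sketch below computes a Pearson correlation coefficient from the Time and Energy columns of the table above. The choice of the Pearson coefficient is an assumption made for this example; the correlation measure used in the paper is not shown in this excerpt. The function name `pearson` is hypothetical.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Time [s] and Energy [J] for 40 threads at 2.1 ... 0.8 GHz (table above).
times = [13.18, 13.26, 13.66, 13.90, 14.47, 15.18, 16.97, 20.17, 26.82]
energies = [3418.30, 3412.81, 3425.57, 3404.33, 3423.69,
            3447.15, 3501.77, 3764.29, 4365.30]

if __name__ == "__main__":
    print(f"Pearson r (time vs. energy): {pearson(times, energies):.3f}")
```

A coefficient close to 1 over the whole frequency range would be driven mostly by the low-frequency rows, where both time and energy grow; near the top frequencies the energy stays almost flat while the time still changes.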
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Bylina, B.; Piekarz, M. Time–Energy Correlation for Multithreaded Matrix Factorizations. Energies 2023, 16, 6290. https://doi.org/10.3390/en16176290