# The Software Cache Optimization-Based Method for Decreasing Energy Consumption of Computational Clusters

## Abstract

## 1. Introduction

related to the energy consumption measurement, the evaluation of software energy efficiency [26];
## 2. Basic Transformation

- $Y = v^{(1)}$
- $W = -2v^{(1)}$
- for $j = 2$ to $r$
- $z = -2(I + WY^{T})v^{(j)}$
- $W = [W \; z]$
- $Y = [Y \; v^{(j)}]$
- end

where $v^{(j)}$ is the $j$-th reflection vector and $Y$ is the matrix of reflection vectors.
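The accumulation above can be checked numerically. The following is a minimal sketch in plain Python (lists only, no external libraries), assuming each reflection vector has unit length so that a single reflection is $I - 2vv^{T}$. The function names `reflector`, `wy_accumulate`, and `wy_to_matrix` are illustrative and do not come from the paper.

```python
def reflector(v):
    """Dense Householder matrix I - 2 v v^T for a unit vector v (assumption)."""
    n = len(v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] for j in range(n)]
            for i in range(n)]


def wy_accumulate(vs):
    """Build W and Y (n-by-r, stored as lists of rows) from unit vectors vs,
    so that reflector(vs[0]) * ... * reflector(vs[-1]) equals I + W Y^T."""
    n = len(vs[0])
    Y = [[vs[0][i]] for i in range(n)]           # Y = v(1)
    W = [[-2.0 * vs[0][i]] for i in range(n)]    # W = -2 v(1)
    for v in vs[1:]:
        cols = len(Y[0])
        # Y^T v, computed once and reused for every row of W
        ytv = [sum(Y[i][k] * v[i] for i in range(n)) for k in range(cols)]
        # z = -2 (I + W Y^T) v = -2 (v + W (Y^T v))
        z = [-2.0 * (v[i] + sum(W[i][k] * ytv[k] for k in range(cols)))
             for i in range(n)]
        for i in range(n):
            W[i].append(z[i])                    # W = [W z]
            Y[i].append(v[i])                    # Y = [Y v(j)]
    return W, Y


def wy_to_matrix(W, Y):
    """Expand I + W Y^T into a dense matrix (for checking only)."""
    n = len(W)
    return [[(1.0 if i == j else 0.0) + sum(w * y for w, y in zip(W[i], Y[j]))
             for j in range(n)] for i in range(n)]
```

In practice the dense product is never formed; the point of the representation is that applying $I + WY^{T}$ to a block of the matrix reduces to two matrix-matrix multiplications.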

## 3. Software Cache Optimization-Based Methodology

op and memory access operations U. During analysis:

- the set of memory access operations is divided into disjoint subsets according to the method of accessing memory that affects the cache miss probability; subsets of operations with equal cache miss probability values $P_{cm}$ are obtained;
- the number of elements $\{N_{\alpha}\}_{\alpha \in A}$ and the cache miss probability $\{P_{\alpha}\}_{\alpha \in A}$ are determined for each of the obtained subsets;
- the number of arithmetic and logical operations $N_{op}$ for the entire algorithm is determined;
- an expression is formed that evaluates the software running time $T$.

- the obtained subsets of memory access operations are analyzed together with the corresponding cache miss probabilities $\{P_{\alpha}\}_{\alpha \in A}$, the numbers of operations $\{N_{\alpha}\}_{\alpha \in A}$, and the execution times $\{T_{\alpha}\}_{\alpha \in A}$ of all subset operations, in order to find the algorithm sections that negatively affect the efficiency of data caching;
- an attempt is made to correct the original algorithm; the result is a new division of the set of memory access operations into subsets with other characteristics $\{U'_{\beta}\}_{\beta \in B}$; permutation optimization and other approaches can be used to correct the algorithm;
- the software running time $T'$ with the modified algorithm is estimated;
- the resulting speedup value $S$ is calculated.
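The two estimation steps can be illustrated with a small numerical sketch. This excerpt does not give the actual expression for $T$, so the additive cost model below is an assumption. Each arithmetic or logical operation costs `t_op`, and each memory access costs `t_miss` with probability $P_{\alpha}$ and `t_hit` otherwise. The parameter names and sample values are hypothetical. The speedup is taken as $S = T / T'$.

```python
def estimate_time(n_op, subsets, t_op, t_hit, t_miss):
    """Estimate running time under an ASSUMED additive cost model.

    n_op    -- number of arithmetic and logical operations (N_op)
    subsets -- list of (N_alpha, P_alpha): accesses per subset and their
               cache miss probability
    t_op, t_hit, t_miss -- hypothetical per-operation costs (same time unit)
    """
    memory_time = sum(n * (p * t_miss + (1.0 - p) * t_hit) for n, p in subsets)
    return n_op * t_op + memory_time


def speedup(t_before, t_after):
    """S = T / T', the ratio of original to modified running time."""
    return t_before / t_after


# Illustrative numbers only: the correction lowers the miss probability
# of one subset from 0.5 to 0.125 without changing the operation counts.
T = estimate_time(1000, [(1000, 0.5)], t_op=1.0, t_hit=1.0, t_miss=9.0)
T_mod = estimate_time(1000, [(1000, 0.125)], t_op=1.0, t_hit=1.0, t_miss=9.0)
S = speedup(T, T_mod)
```

With these made-up costs the memory term dominates, which is the situation in which regrouping accesses into subsets with a lower miss probability pays off.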

#### 3.1. Row-Oriented Householder Reflection

#### 3.2. Single-Pass Householder Reflection

Here **A** is the original matrix, N is the dimension of the original matrix, Householder is the procedure for calculating the reflection vector, **u** is the reflection vector, and g is the normalization coefficient. SPH is the procedure for performing one step of a single-pass Householder transformation (Single-Pass Householder). Matrices and vectors are highlighted in bold, and the ranges of index changes are indicated in parentheses.

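- Hessenberg($\mathbf{A}$, $N$)
- for $s = 1$ to $N - 2$ do
- $(g, \mathbf{u}_s)$ = Householder($\mathbf{A}$, $N$, $s$)
- SPH($\mathbf{A}$, $N$, $s$, $g$, $\mathbf{u}_s$)
- end for
- end Hessenberg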

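- SPH($\mathbf{A}$, $N$, $s$, $g$, $\mathbf{u}_s$)
- for $r = 1$ to $(s - 1)$ do
- $vr_s = g\,\mathbf{A}[r, s+1:N]\,\mathbf{u}_s[s+1:N]$
- $\mathbf{A}[r, s+1:N]$ -= $vr_s\,\mathbf{u}_s^{T}[s+1:N]$
- end for
- $\mathbf{v}_s^{T}[s:N] = g\,\mathbf{u}_s^{T}[s+1:N]\,\mathbf{A}[s+1:N, s:N]$
- for $r = s$ to $N$ do
- $\mathbf{A}[r, s:N]$ -= $\mathbf{u}_s[r]\,\mathbf{v}_s^{T}[s:N]$
- $vr_s = g\,\mathbf{A}[r, s+1:N]\,\mathbf{u}_s[s+1:N]$
- $\mathbf{A}[r, s+1:N]$ -= $vr_s\,\mathbf{u}_s^{T}[s+1:N]$
- end for
- end SPH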
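The listing can be turned into a small runnable sketch. The version below uses plain Python lists and 0-based indices, so step `s` here corresponds to step s + 1 in the listing. The `householder` helper is an assumption, since its body is not shown in the source. It returns a vector `u` that is zero up to and including index `s`, and a coefficient `g = 2 / (u·u)`, so that one reflection is $I - g\,uu^{T}$.

```python
import math


def householder(A, s):
    """Return (g, u) with H = I - g*u*u^T zeroing A[s+2:, s] (assumed form)."""
    n = len(A)
    x = [A[i][s] for i in range(s + 1, n)]
    norm_x = math.sqrt(sum(t * t for t in x))
    u = [0.0] * n
    if norm_x == 0.0:
        return 0.0, u                        # nothing to eliminate: H = I
    alpha = -math.copysign(norm_x, x[0])     # sign chosen to avoid cancellation
    u[s + 1] = x[0] - alpha
    for k in range(1, len(x)):
        u[s + 1 + k] = x[k]
    return 2.0 / sum(t * t for t in u), u


def sph(A, s, g, u):
    """One single-pass step: apply H*A*H in place, visiting each row once."""
    n = len(A)
    tail = range(s + 1, n)
    for r in range(s):                       # rows touched only from the right
        row = A[r]
        vr = g * sum(row[j] * u[j] for j in tail)
        for j in tail:
            row[j] -= vr * u[j]
    # v^T[s:] = g * u^T[s+1:] * A[s+1:, s:], taken before those rows change
    v = [g * sum(u[i] * A[i][j] for i in tail) for j in range(s, n)]
    for r in range(s, n):                    # left and right update together
        row = A[r]
        for j in range(s, n):
            row[j] -= u[r] * v[j - s]        # no-op for r == s since u[s] == 0
        vr = g * sum(row[j] * u[j] for j in tail)
        for j in tail:
            row[j] -= vr * u[j]


def hessenberg(A):
    """Reduce a square matrix to upper Hessenberg form; returns a new matrix."""
    H = [[float(x) for x in row] for row in A]
    for s in range(len(H) - 2):
        g, u = householder(H, s)
        sph(H, s, g, u)
    return H
```

Each row is read and written in a single sweep per step, which is the property the single-pass scheme relies on for better cache reuse. A quick sanity check is that the trace and the Frobenius norm of the input are preserved, since every step is an orthogonal similarity transformation.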

## 4. Results and Discussions

- PKG—the power consumed by the socket as a whole (Package);
- DRAM—the power consumed by DRAM;
- PKG Idle—the power consumed by the socket in «idle» mode;
- DRAM Idle—the power consumed by DRAM in «idle» mode.
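To make these four quantities concrete, the sketch below subtracts the idle baseline from the measured power and converts the result to energy over a run. It is an illustration under stated assumptions, not the measurement procedure used in the paper. The function names and the sample readings are hypothetical, and the only physical relation used is energy = power × time.

```python
def extra_power(pkg, dram, pkg_idle, dram_idle):
    """Power (W) attributable to the workload: measured total minus idle baseline."""
    return (pkg + dram) - (pkg_idle + dram_idle)


def energy_joules(power_watts, seconds):
    """Energy consumed at a constant average power over the given time."""
    return power_watts * seconds


# Made-up readings in watts and a made-up run time in seconds.
w = extra_power(pkg=60.0, dram=10.0, pkg_idle=20.0, dram_idle=5.0)
e = energy_joules(w, seconds=2.0)
```

Separating the idle baseline matters when comparing two programs, because a faster program saves both its own extra power and the idle power that would otherwise be drawn for the longer run.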

$w_1$ and $w_2$ are the “extra” power consumed by the two programs, and $K$ = (PKG Idle + DRAM Idle) is the power consumed by the CPU in “idle” mode. Then

## 5. Conclusions

## References

2022, 26, 821–836.
Yang, R.; Song, J.; Huang, B.; Li, W.; Qi, G. An Energy-Efficient Step-Counting Algorithm for Smartphones. Comput. J. 2020, 65, 689–700.
2013, 4, 444–449.
2019, 78, 178–190.
Bujanovic, Z.; Karlsson, L.; Kressner, D. A Householder-based algorithm for Hessenberg-triangular reduction. SIAM J. Matrix Anal. Appl. 2018, 39, 1270–1294.
2018, 271, 119–143.
Kabir, K.; Haidar, A.; Tomov, S.; Dongarra, J. Performance analysis and design of a Hessenberg reduction using stabilized blocked elementary transformations for new architectures. Simul. Ser. 2015, 47, 135–142.
2010, 36, 645–654.
Buttari, A.; Langou, J.; Kurzak, J.; Dongarra, J. Parallel tiled QR factorization for multicore architectures. Concurr. Comput. Pract. Exp. 2008, 20, 1573–1590.
2018, 29, 1707–1720.
Noble, J.H.; Lubasch, M.; Stevens, J.; Jentschura, U.D. Diagonalization of complex symmetric matrices: Generalized Householder reflections, iterative deflation and implicit shifts. Comput. Phys. Commun. 2017, 221, 304–316.
2016, 43, 1–18.
Schreiber, R.; Van Loan, C. A Storage-Efficient WY Representation for Products of Householder Transformations. SIAM J. Sci. Stat. Comput. 1989, 10, 53–57.
2022, 4, 100–103.
2020, 2, 71–85.

**Figure 1.** The main directions of increasing the software efficiency for target computing architectures.

**Figure 3.** Obtaining a row-oriented scheme for performing a basic transformation: (**a**) traditional algorithm based on the Householder reflection; (**b**) row-oriented transformation scheme.

**Figure 4.** Obtaining a single-pass scheme for performing a basic transformation: (**a**) row-oriented transformation scheme; (**b**) single-pass transformation scheme.

**Figure 7.** The dependence of the power consumed in the software execution process on the problem dimension (classical scheme).

**Figure 8.** The dependence of the power consumed in the software execution process on the problem dimension (single-pass scheme).

