# A Comparative Study of Methods for Measurement of Energy of Computing


## Abstract


## 1. Introduction

- The first comprehensive comparative study of the accuracy of state-of-the-art on-chip power sensors and energy predictive models against system-level physical measurements using external power meters, which we consider to be the ground truth.
- A comparison of the accuracy of state-of-the-art on-chip power sensors against the ground truth, employing two scientific applications, matrix-matrix multiplication and 2D fast Fourier transform, executed on three modern Intel multicore CPUs (one Haswell and two Skylake), two Nvidia GPUs (Tesla K40 and Tesla P100 PCIe) and one Intel Xeon Phi accelerator.
- A comparison of the accuracy of state-of-the-art energy predictive models employing PMCs as predictor variables against the ground truth, using a diverse set of seventeen benchmarks executed on two modern Intel multicore Skylake CPUs.
- We demonstrate that significant energy losses arise when the inaccurate energy measurements provided by on-chip sensors are employed in the optimization of applications for dynamic energy.
- We show that, owing to the nature of the deviations of the energy measurements provided by on-chip sensors from the ground truth, calibration cannot improve the accuracy of the on-chip sensors to an extent that would favour their use in the optimization of applications for dynamic energy.

## 2. Terminology and Motivation

## 3. Related Work

#### 3.1. On-Chip Power Sensors

#### 3.2. Software Based Energy Predictive Models

#### Energy Predictive Models for Accelerators

## 4. Experimental Setup for Comparing On-Chip Sensors and System-Level Physical Measurements Using Power Meters

## 5. Methodology to Determine the Component-Level Energy Consumption Using HCLWattsUp

- We ensure the platform is reserved exclusively and fully dedicated to our experiments.
- We monitor disk activity before and during the application run and ensure that no I/O is performed by the application, using tools such as sar, iotop, and so forth.
- We ensure that the problem size used in the execution of an application does not exceed the main memory and that swapping (paging) does not occur.
- We ensure that the network is not used by the application, monitoring it using tools such as sar, atop, etc.
- We set the application kernel’s CPU affinity mask using the SCHED API’s system call sched_setaffinity(). Consider, for example, the mkl-DGEMM application kernel running on only abstract processor A. To bind this application kernel, we set its CPU affinity mask to the 12 physical CPU cores of Socket 1 and the 12 physical CPU cores of Socket 2.
- Fans are also a great contributor to energy consumption. On our platform fans are controlled in two zones: (a) zone 0: CPU or System fans, (b) zone 1: Peripheral zone fans. There are 4 levels to control the speed of fans:
- Standard: BMC control of both fan zones, with CPU zone based on CPU temp (target speed 50%) and Peripheral zone based on PCH temp (target speed 50%)
- Optimal: BMC control of the CPU zone (target speed 30%), with Peripheral zone fixed at low speed (fixed 30%)
- Heavy IO: BMC control of CPU zone (target speed 50%), Peripheral zone fixed at 75%
- Full: all fans running at 100%

In all speed levels except full, the speed is subject to change with temperature, and consequently the fans’ energy consumption also changes with their speed. The higher the temperature of the CPU, for example, the higher the speed of the zone 0 fans and the higher the energy consumed to cool it down. This cooling energy is therefore not consistent: it depends on the fan speed and can consequently affect the dynamic energy consumption of the given application kernel. Hence, to rule out the fans’ contribution to dynamic energy consumption, we set the fans to full speed before launching the experiments. When set to full speed, the fans run consistently at a fixed speed until we change to another speed level; they therefore consume the same amount of power, which is included in the static power of the platform.

- We monitor the temperature of the platform and the speed of the fans (after setting them to full) with the help of Intelligent Platform Management Interface (IPMI) sensors, both with and without the application running. We find no considerable difference in temperature and find the speed of the fans to be the same in both scenarios.
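The core binding step described above can be illustrated with a minimal Python sketch. This is a sketch under assumptions: it requires Linux (where `os.sched_setaffinity` is available), and the 2 × 12-core socket layout (core IDs 0–23) is hypothetical; the helper restricts the request to the cores the machine actually has.

```python
import os

def bind_to_cores(requested):
    """Bind the calling process (pid 0 = self) to the requested CPU cores,
    restricted to the cores actually present on this machine."""
    available = os.sched_getaffinity(0)
    target = set(requested) & available
    if target:
        os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)

# Hypothetical layout: Socket 1 = cores 0-11, Socket 2 = cores 12-23.
mask = bind_to_cores(range(0, 24))
```

On a 2 × 12-core server this pins the kernel to all 24 physical cores, mirroring the affinity mask described in the text.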

## 6. Comparison of Measurements Using RAPL and HCLWattsUp

- PP0 (Core Devices): Power plane zero includes the energy consumption by all the CPU cores in the socket(s).
- PP1 (Uncore Devices): Power plane one includes the power consumption of the uncore components, such as the integrated graphics processing unit (not available on server platforms).
- DRAM: Refers to the energy consumption of the main memory.
- Package: Refers to the energy consumption of the entire socket, including core and uncore devices:
`Package = PP0 + PP1`.
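On Linux, these RAPL domains are commonly exposed through the powercap sysfs interface. The sketch below is illustrative, not taken from the paper: the paths and domain names follow the usual `intel_rapl` driver layout, and the second helper computes the energy between two counter readings while allowing for wrap-around of the `energy_uj` counter at `max_energy_range_uj`.

```python
RAPL_ROOT = "/sys/class/powercap"   # assumed Linux intel_rapl driver layout

def read_uj(domain):
    """Read a RAPL energy counter in microjoules, e.g. for domain
    'intel-rapl:0' (Package) or 'intel-rapl:0:0' (PP0, core devices)."""
    with open(f"{RAPL_ROOT}/{domain}/energy_uj") as f:
        return int(f.read())

def delta_energy_j(e0_uj, e1_uj, max_range_uj):
    """Energy consumed between two readings, in joules, handling the
    counter wrap-around at max_energy_range_uj."""
    d = e1_uj - e0_uj
    if d < 0:                # the counter wrapped between the two reads
        d += max_range_uj
    return d / 1e6
```

For example, readings of 1,000,000 µJ and then 3,500,000 µJ correspond to 2.5 J consumed between the reads.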

#### 6.1. Methodology

- Using Intel PCM/PAPI, we obtain the base power of the CPUs (core and uncore) and DRAM (when the given application is not running).
- Using HCLWattsUp API, we obtain the execution time of the given application.
- Using Intel PCM/PAPI, we obtain the total energy consumption of the CPUs and DRAM, during the execution of the given application.
- Finally, we calculate the dynamic energy consumption (of CPUs and DRAM) by subtracting the base energy from total energy consumed during the execution of the given application.

- Using HCLWattsUp API, we obtain the base power of the server (when the given application is not running).
- Using HCLWattsUp API, we obtain the execution time of the application.
- Using HCLWattsUp API, we obtain the total energy consumption of the server, during the execution of the given application.
- Finally, we calculate the dynamic energy consumption by subtracting the base energy (base power × execution time) from the total energy consumed during the execution of the given application.
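Both procedures above reduce to the same subtraction, which can be stated as a one-line sketch (the numbers in the usage example are illustrative, not measurements from the paper):

```python
def dynamic_energy(total_energy_j, base_power_w, exec_time_s):
    """Dynamic energy = total energy during the run minus the energy the
    platform would have consumed idling (at base power) for the same time."""
    return total_energy_j - base_power_w * exec_time_s

# Illustrative numbers: a 40 s run consuming 6000 J in total on a server
# whose base power is 100 W leaves 2000 J of dynamic energy.
e_dyn = dynamic_energy(6000.0, 100.0, 40.0)
```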

#### 6.2. Experimental Results on HCLServer03

#### 6.3. Discussion

## 7. Comparison of Measurements by GPU and Xeon Phi Sensors with HCLWattsUp

#### 7.1. Experimental Results Using GPU Sensors (NVML)

#### 7.2. Experimental Results Using Intel Xeon Phi Sensors (Intel MPSS)

#### 7.3. Discussion

## 8. Comparison of Dynamic Energy Consumption Using PMC-Based Energy Predictive Models and HCLWattsUp

- Techniques that consider all the PMCs offered by a computing platform, with the goal of capturing all possible contributors to energy consumption. To the best of our knowledge, no research works adopt this approach, because of the complexity of the resulting models.
- Techniques that use expert advice or intuition to pick a subset of PMCs that, in the experts’ opinion, are dominant contributors to energy consumption [34].
- Techniques that select parameters with physical significance based on fundamental laws such as the energy conservation of computing [18]. Shahid et al. [18] introduced a new property of PMCs, based on the experimental observation that the dynamic energy consumption of the serial execution of two applications is equal to the sum of the dynamic energy consumptions of those applications when they are run separately. The property rests on a simple and intuitive rule: if a PMC is intended for a linear predictive model, its value for the serial execution of two applications should equal the sum of its values for the individual execution of each application. A PMC is branded as non-additive on a platform if there exists an application for which the calculated value differs significantly from the value observed for the application execution on the platform. The use of non-additive PMCs in a model impairs its prediction accuracy.
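The additivity rule described above can be sketched as a simple check. The counter values and the 5% tolerance below are illustrative, not taken from [18]:

```python
def is_additive(pmc_a, pmc_b, pmc_serial, tol=0.05):
    """Additivity test: a PMC suits a linear energy model only if its count
    for the serial execution of two applications equals the sum of its
    counts for the individual executions, within a relative tolerance."""
    expected = pmc_a + pmc_b
    return abs(pmc_serial - expected) <= tol * expected

# Hypothetical counts: the first counter behaves additively (within 5%),
# the second deviates and would be branded non-additive.
assert is_additive(1.0e9, 2.0e9, 3.02e9)
assert not is_additive(1.0e9, 2.0e9, 3.6e9)
```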

#### 8.1. Experimental Setup

- Class A: In this class, we study the accuracy of platform-level linear regression models using a diverse set of applications.
- Class B: In this class, we study the accuracy of application-specific linear regression models.

#### 8.2. Accuracy of Platform-Level Linear PMC-Based Models

- IDQ_MITE_UOPS (${X}_{1}$)
- IDQ_MS_UOPS (${X}_{2}$)
- ICACHE_64B_IFTAG_MISS (${X}_{3}$)
- ARITH_DIVIDER_COUNT (${X}_{4}$)
- L2_RQSTS_MISS (${X}_{5}$)
- FP_ARITH_INST_RETIRED_DOUBLE (${X}_{6}$)

- All the models have a significant intercept (${\beta}_{0}$). Therefore, a model would predict non-zero dynamic energy, based on the intercept value, even when no application is executing on the platform, which is erroneous. We consider this a serious drawback of existing linear energy predictive models (given in Section 3), which do not take into account the physical significance of the parameters with respect to dynamic energy consumption.
- Model A has negative coefficients ($\beta =\{{\beta}_{1},\dots ,{\beta}_{6}\}$) for PMCs ${X}_{4}$ and ${X}_{6}$. Similarly, Model B has negative coefficients for ${X}_{4}$ and ${X}_{6}$, and in Models C–E, ${X}_{6}$ has a negative coefficient. These negative coefficients can give rise to negative energy consumption predictions for applications whose counts for ${X}_{4}$ and ${X}_{6}$ are relatively higher than the other PMCs.
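A small sketch makes the two drawbacks above concrete. The intercept and coefficient values are hypothetical, chosen only to mirror the pattern criticised in the text (a significant intercept and a negative coefficient for ${X}_{6}$), not the fitted coefficients of Models A–E:

```python
def predict(intercept, coeffs, pmcs):
    """Dynamic energy prediction of a linear PMC model: b0 + sum(bi * Xi)."""
    return intercept + sum(b * x for b, x in zip(coeffs, pmcs))

# Hypothetical model: significant intercept, negative coefficients for
# X4 (ARITH_DIVIDER_COUNT) and X6 (FP_ARITH_INST_RETIRED_DOUBLE).
b0 = 50.0
beta = [2e-9, 1e-9, 5e-9, -3e-9, 4e-9, -6e-9]

# With no application running all PMC counts are zero, yet the model
# predicts b0 joules of dynamic energy.
idle = predict(b0, beta, [0] * 6)

# An FP-heavy application with a large X6 count drives the prediction
# negative.
fp_heavy = predict(b0, beta, [1e8, 1e8, 1e7, 1e7, 1e7, 1e11])
```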

#### 8.3. Accuracy of Application-Specific PMC-Based Models

## 9. Energy Losses From Employing an Inaccurate Measurement Tool

**1. Plane intersection of dynamic energy functions:** Dynamic energy consumption functions $\{{E}_{1},{E}_{2}\}$ are cut by the plane $y=N$, producing two curves that represent the dynamic energy consumption functions against $x$ given that $y$ is equal to $N$.

**2. Determine M and K:**

## 10. Current Picture, Recommendations and Future Directions

## 11. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A. Three Popular Approaches to Measure the Dynamic Energy Consumption

## Appendix B. Rationale Behind Using Dynamic Energy Consumption Instead of Total Energy Consumption

- Static energy consumption is a constant (an inherent property) of a platform that cannot be optimized. It does not depend on the application configuration.
- Although static energy consumption is a major concern in embedded systems, it is becoming less significant than dynamic energy consumption in HPC systems owing to advancements in hardware architecture design.
- We target applications and platforms where dynamic energy consumption is the dominating energy dissipator.
- Finally, we believe its inclusion can underestimate the true worth of an optimization technique that minimizes the dynamic energy consumption. We elucidate using two examples from published results.
- In our first example, consider a model that reports predicted and measured total energy consumption of a system to be 16,500 J and 18,000 J. It would report the prediction error to be 8.3%. If it is known that the static energy consumption of the system is 9000 J, then the actual prediction error (based on dynamic energy consumption only) would be 16.6% instead.
- In our second example, consider two different energy prediction models (${M}_{A}$ and ${M}_{B}$) with the same prediction error of 5% for an application execution on two different machines (A and B) with the same total energy consumption of 10,000 J. One would consider both models to be equally accurate. But suppose it is known that the dynamic energy proportions for the machines are 30% and 60%. Then the true prediction errors (using dynamic energy consumption only) for the models would be 16.6% and 8.3%, respectively. Therefore, the second model, ${M}_{B}$, should be considered more accurate than the first.
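The first example above is simple arithmetic and can be checked directly:

```python
def pct_error(predicted, measured):
    """Relative prediction error as a percentage of the measured value."""
    return abs(predicted - measured) / measured * 100

# First example from the text: 16,500 J predicted vs 18,000 J measured
# total energy, with 9000 J of static energy.
total_err = pct_error(16500, 18000)                  # about 8.3%
dynamic_err = pct_error(16500 - 9000, 18000 - 9000)  # about 16.7%
```

Subtracting the same 9000 J static term from both values halves the denominator while leaving the absolute error unchanged, which is exactly why the dynamic-only error doubles.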

## Appendix C. Application Programming Interface (API) for Measurements Using External Power Meter Interfaces (HCLWattsUp)

- TIME—The execution time (seconds).
- DPOWER—The average dynamic power (watts).
- TENERGY—The total energy consumption (joules).
- DENERGY—The dynamic energy consumption (joules).

**Figure A1.** Example illustrating the use of HCLWattsUp API for measuring the dynamic energy consumption.

- The maximum number of repetitions specified in $maxRepeats$ is exceeded.
- The sample mean is within $maxStdError$ percent of the confidence interval $cl$. The confidence interval of the mean is estimated using Student’s t-distribution.
- The maximum allowed time $maxElapsedTime$ specified in seconds has elapsed.

## Appendix D. Methodology to Obtain a Reliable Data Point

- The server is fully reserved and dedicated to these experiments during their execution. We also made certain that there are no drastic fluctuations in the load due to abnormal events in the server by monitoring its load continuously for a week using the tool sar. Insignificant variation in the load was observed during this monitoring period suggesting normal and clean behaviour of the server.
- We set the application kernel’s CPU affinity mask using the SCHED API’s system call sched_setaffinity(). Consider, for example, the mkl-DGEMM application kernel running on HCLServer01. To bind this application kernel, we set its CPU affinity mask to the 12 physical CPU cores of Socket 1 and the 12 physical CPU cores of Socket 2.
- To make sure that pipelining, cache effects and so forth do not distort the measurements, the experiments are not executed in a loop, and sufficient time (120 s) is allowed to elapse between successive runs. This time is based on observations of the time taken for memory utilization to revert to the base utilization and for the processor (core) frequencies to return to the base frequencies.
- To obtain a data point, the application is repeatedly executed until the sample mean lies in the 95% confidence interval and a precision of 0.025 (2.5%) has been achieved. For this purpose, Student’s t-test is used, assuming that the individual observations are independent and that their population follows the normal distribution. We verify the validity of these assumptions by plotting the distributions of observations. The function $MeanUsingTtest$, shown in Algorithm 1, describes this step. For each data point, the function is invoked and repeatedly executes the application $app$ until one of the following three conditions is satisfied:
- The maximum number of repetitions ($maxReps$) have been exceeded (Line 3).
- The sample mean falls in the confidence interval (or the precision of measurement $eps$ has been achieved) (Lines 15–17).
- The elapsed time of the repetitions of application execution has exceeded the maximum time allowed ($maxT$ in seconds) (Lines 18–20).

So, for each data point, the function $MeanUsingTtest$ is invoked, and the sample mean $mean$ is returned at the end of the invocation. The function $Measure$ measures the execution time or the dynamic energy consumption using the HCLWattsUp library [19], based on the input, $TIME$ or $ENERGY$. The input minimum and maximum numbers of repetitions, $minReps$ and $maxReps$, differ based on the problem size solved. For small problem sizes ($32\le n\le 1024$), these values are set to 10,000 and 100,000, respectively. For medium problem sizes ($1024<n\le 5120$), they are set to 100 and 1000. For large problem sizes ($n>5120$), they are set to 5 and 50. The values of $maxT$, $cl$ and $eps$ are set to 3600, 0.95 and 0.025, respectively. If the precision of measurement is not achieved before the maximum number of repetitions has been completed, we increase the number of repetitions and also the maximum elapsed time allowed. However, we observed that condition (2) was always satisfied before the other two in our experiments.

Algorithm 1: Function determining the sample mean using Student’s t-test.

```
 1: procedure MeanUsingTtest(app, minReps, maxReps, maxT, cl, eps,
                             repsOut, clOut, etimeOut, epsOut, mean)
    Input:  app      – the application to execute
            minReps  – the minimum number of repetitions, minReps ∈ Z>0
            maxReps  – the maximum number of repetitions, maxReps ∈ Z>0
            maxT     – the maximum time allowed for the application to run (s)
            cl       – the required confidence level
            eps      – the required accuracy
    Output: repsOut  – the number of experimental runs actually made
            clOut    – the confidence level achieved
            epsOut   – the accuracy achieved
            etimeOut – the elapsed time (s)
            mean     – the mean
 2: reps ← 0; stop ← 0; sum ← 0; etime ← 0
 3: while (reps < maxReps) and (!stop) do
 4:     st ← measure(TIME)
 5:     execute(app)
 6:     et ← measure(TIME)
 7:     reps ← reps + 1
 8:     etime ← etime + et − st
 9:     ObjArray[reps] ← et − st
10:     sum ← sum + ObjArray[reps]
11:     if reps > minReps then
12:         clOut ← fabs(gsl_cdf_tdist_Pinv(cl, reps − 1))
                    × gsl_stats_sd(ObjArray, 1, reps) / sqrt(reps)
13:         if clOut × reps / sum < eps then
14:             stop ← 1
15:         end if
16:         if etime > maxT then
17:             stop ← 1
18:         end if
19:     end if
20: end while
21: repsOut ← reps; epsOut ← clOut × reps / sum
22: etimeOut ← etime; mean ← sum / reps
23: end procedure
```
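The stopping rule of Algorithm 1 can be sketched in Python. This is a simplified sketch: the Student’s t critical value is passed in as a fixed constant rather than recomputed per repetition (using the value for the smallest degrees of freedom is conservative, since the exact value shrinks as repetitions accumulate), and the elapsed time is approximated by the sum of measured values, as when measuring TIME.

```python
import math
import statistics

def mean_using_ttest(measure, min_reps, max_reps, max_time, eps, t_crit):
    """Repeat a measurement until the half-width of the confidence interval
    of the mean drops below eps * mean, or the repetition/time budget is
    exhausted.  t_crit is the Student's t critical value for the chosen
    confidence level (e.g. from scipy.stats.t.ppf)."""
    obs, elapsed = [], 0.0
    while len(obs) < max_reps:
        value = measure()              # one run: execution time or energy
        obs.append(value)
        elapsed += value               # elapsed time when measuring TIME
        if len(obs) > min_reps:
            half_width = t_crit * statistics.stdev(obs) / math.sqrt(len(obs))
            if half_width / statistics.fmean(obs) < eps or elapsed > max_time:
                break
    return statistics.fmean(obs), len(obs)
```

With low-variance observations the precision condition triggers well before the repetition limit, matching the observation in the text that condition (2) is always satisfied first.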

**Figure A2.** Dynamic energy profiles by RAPL and HCLWattsUp on HCLServer03 falling into Class B. G = Threadgroups and T = Threads.

## Appendix E. Comparison of RAPL and HCLWattsUp on HCLServer03

**Table A1.** Percentage error of dynamic energy consumption with RAPL and HCLWattsUp on HCLServer03. G = Threadgroups and T = Threads.

| Application | Problem Size, Step-Size | Configuration Parameter | Avg. Actual Error | Avg. Error after Calibration | Reduction after Calibration |
|---|---|---|---|---|---|
| FFTW | N = 32,768 | CPU Threads (1–112) | 12.68% | 3.69% | 70.9% |
| MKL-FFT | N = 43,328 | CPU Cores (1–56) | 13.05% | 2.19% | 83.22% |
| FFTW | N = 20,480–21,560, SS = 512 | problem size ($M\times N$) where $0\le M\le N/2$ | 8.15% | 5.56% | 31.78% |
| FFTW | N = 32,768, SS = 16 | Load imbalance: problem size ($M\times N$) where $0\le M\le N/2$ | 10.45% | 0.6% | 94.26% |

**Table A2.** Percentage error of dynamic energy consumption with RAPL and HCLWattsUp on HCLServer03. G = Threadgroups and T = Threads.

| Application | Problem Size, Step-Size | Configuration Parameter | Avg. Actual Error | Avg. Error after Calibration | Reduction after Calibration |
|---|---|---|---|---|---|
| OpenBlas DGEMM | N = 10,240–25,600, SS = 512 | CPU Threads | | | |
| | | G = 56, T = 2 | 12.84% | 6.66% | 48.13% |
| | | G = 28, T = 4 | 13.28% | 8.58% | 35.39% |
| | | G = 16, T = 7 | 14.02% | 8.54% | 39.09% |
| | | G = 14, T = 8 | 13.61% | 7.98% | 41.37% |
| | | G = 8, T = 14 | 18.59% | 9.64% | 48.14% |
| | | G = 7, T = 16 | 19% | 9.7% | 48.95% |
| | | G = 4, T = 28 | 20.89% | 10.38% | 50.31% |
| | | G = 2, T = 56 | 23.41% | 11.21% | 52.11% |
| MKL-FFT | N = 32,768–43,456, SS = 64 | problem size | | | |
| | | G = 28, T = 2 | 15.08% | 4.91% | 67.4% |
| | | G = 14, T = 4 | 13.63% | 4.97% | 63.54% |
| | | G = 8, T = 7 | 13.24% | 5.25% | 60.35% |
| | | G = 7, T = 8 | 13.21% | 5.4% | 59.12% |
| | | G = 4, T = 14 | 13.03% | 5.65% | 56.64% |
| | | G = 2, T = 28 | 13.02% | 5.64% | 56.68% |
| | | G = 1, T = 56 | 14.12% | 6.22% | 55.95% |
| MKL-FFT | N = 25,600–46,080, SS = 512 | problem size | | | |
| | | G = 28, T = 2 | 14.46% | 5% | 65.42% |
| | | G = 14, T = 4 | 13% | 4.51% | 65.31% |
| | | G = 8, T = 7 | 12.4% | 4.49% | 63.79% |
| | | G = 7, T = 8 | 12.34% | 4.45% | 63.94% |
| | | G = 4, T = 14 | 11.97% | 4.58% | 61.74% |
| | | G = 2, T = 28 | 12.35% | 4.8% | 61.13% |
| | | G = 1, T = 56 | 13.56% | 6.27% | 53.76% |
| FFTW | N = 35,480–41,920, SS = 64 | problem size | | | |
| | | G = 16, T = 7 | 12.4% | 10.35% | 16.53% |
| | | G = 14, T = 8 | 13.19% | 11.54% | 12.51% |
| | | G = 8, T = 14 | 13.66% | 12.73% | 6.81% |
| | | G = 7, T = 16 | 14.59% | 13.3% | 8.84% |
| | | G = 4, T = 28 | 13.73% | 12.78% | 6.92% |
| | | G = 2, T = 56 | 12.3% | 5.58% | 54.63% |
| | | G = 1, T = 112 | 24.62% | 3.9% | 84.16% |

**Table A3.** Percentage error of dynamic energy consumption with RAPL and HCLWattsUp on HCLServer03. ’-’ denotes that calibration does not improve the difference.

| Application | Problem Size, Step-Size | Configuration Parameter | Avg. Actual Error | Avg. Error after Calibration | Reduction after Calibration |
|---|---|---|---|---|---|
| FFTW | N = 30,720–34,816, SS = 64 | problem size | | | |
| | | G = 16, T = 7 | 14.51% | - | - |
| | | G = 14, T = 8 | 16.32% | - | - |
| | | G = 8, T = 14 | 16.15% | - | - |
| | | G = 7, T = 16 | 14.89% | - | - |
| | | G = 4, T = 28 | 9.32% | - | - |
| | | G = 2, T = 56 | 10.94% | 5.34% | 51.19% |
| | | G = 1, T = 112 | 25.05% | 10.44% | 58.32% |
| FFTW | N = 20,480–26,560, SS = 64 | problem size | | | |
| | | G = 16, T = 7 | 31% | - | - |
| | | G = 14, T = 8 | 28.16% | - | - |
| | | G = 8, T = 14 | 21.59% | - | - |
| | | G = 7, T = 16 | 17.76% | - | - |
| | | G = 4, T = 28 | 7.6% | 4.83% | 36.45% |
| | | G = 2, T = 56 | 9.76% | 6.12% | 37.3% |
| | | G = 1, T = 112 | 25.63% | 10.22% | 60.12% |

## Appendix F. Experimental Results of RAPL and HCLWattsUp on HCLServer01 and HCLServer02

**Table A4.** Percentage error of dynamic energy consumption with RAPL and HCLWattsUp on HCLServer01 and HCLServer02. ’-’ denotes that calibration does not improve the difference.

| Application | Platform | Avg | Max | Min | Avg after Calibration | Reduction after Calibration |
|---|---|---|---|---|---|---|
| FFT | HCLServer01 | 16.01% | 37.1% | 0.01% | 3.48% | 78.26% |
| DGEMM | HCLServer01 | 62.42% | 266.42% | 12.54% | 42.86% | 31.34% |
| FFT | HCLServer02 | 28.67% | 156.38% | 0.03% | - | - |
| DGEMM | HCLServer02 | 36.13% | 205% | 0.39% | - | - |

## Appendix G. Methodology To Compare Measurements Using Sensors and HCLWattsUp

- Using Intel PCM/PAPI, we obtain the base power of CPU and DRAM (when the given application is not running).
- Using HCLWattsUp API, we obtain the execution time of the given application.
- Using Intel PCM/PAPI, we obtain the total energy consumption of the CPU host-core (because all other cores are idle) and DRAM, during the execution of the given application.
- Finally, we calculate the dynamic energy consumption (of CPU and DRAM) by subtracting the base energy from total energy consumed during the execution of the given application.

- Using NVML/Intel SMC, we obtain the base power of GPU/Xeon Phi (when the given application is not running).
- Using HCLWattsUp API, we obtain the execution time of the given application.
- Using NVML/Intel SMC, we obtain the total energy consumption of GPU/Xeon Phi during the execution of the given application.
- Finally, we calculate the dynamic energy consumption of the GPU/Xeon Phi by subtracting the base energy from the total energy consumed during the execution of the given application.

- Using HCLWattsUp API, we obtain the base power of the server (when the given application is not running).
- Using HCLWattsUp API, we obtain the execution time of the application.
- Using HCLWattsUp API, we obtain the total energy consumption of the server, during the execution of the given application.
- Finally, we calculate the dynamic energy consumption by subtracting the base energy (base power × execution time) from the total energy consumed during the execution of the given application.

## Appendix H. Comparison of Measurements by GPU Sensors with HCLWattsUp on HCLServer02

**Table A5.** Percentage error of dynamic energy consumption by Nvidia P100 PCIe GPU, with and without calibration, and HCLWattsUp on HCLServer02.

**Without Calibration**

| Application | Min | Max | Avg |
|---|---|---|---|
| DGEMM | 13.11% | 84.84% | 40.06% |
| FFT | 17.91% | 175.97% | 73.34% |

**With Calibration**

| Application | Min | Max | Avg |
|---|---|---|---|
| DGEMM | 0.07% | 26.07% | 11.62% |
| FFT | 0.025% | 51.24% | 16.95% |

## Appendix I. Costs of Measurement of the Three Approaches

- Base power
- Execution time of the application
- Total Energy consumed by the application during the execution

- Base power with RAPL
- Total Energy with RAPL
- Base power with NVML/Intel SMC
- Total Energy with NVML/Intel SMC
- Execution Time

## Appendix J. Benchmark Suite for Comparison of Dynamic Energy Consumption using PMC-Based Energy Predictive Models and HCLWattsUp

| Application | Description |
|---|---|
| MKL FFT | Fast Fourier Transform |
| MKL DGEMM | Dense Matrix Multiplication |
| HPCG | High performance conjugate gradient |
| NPB IS | Integer Sort, kernel for random memory access |
| NPB LU | Lower-Upper Gauss-Seidel solver |
| NPB EP | Embarrassingly Parallel kernel |
| NPB BT | Block Tri-diagonal solver |
| NPB MG | Multi-Grid on a sequence of meshes |
| NPB FT | Discrete 3D fast Fourier Transform |
| NPB DC | Data Cube |
| NPB UA | Unstructured Adaptive mesh, dynamic memory access |
| NPB CG | Conjugate Gradient |
| NPB SP | Scalar Penta-diagonal solver |
| NPB DT | Data Traffic |
| stress | CPU, disk and I/O stress |
| Naive MM | Naive matrix-matrix multiplication |
| Naive MV | Naive matrix-vector multiplication |

## References

1. IEA. International Energy Agency (IEA) at COP21; IEA: Paris, France, 2015.
2. Jones, N. How to stop data centres from gobbling up the world’s electricity. Nature **2018**, 561, 163–166.
3. ATAG. Air Transport Action Group (ATAG): Facts and Figures; ATAG: Dunfermline, UK, 2018.
4. Andrae, A.; Edler, T. On Global Electricity Usage of Communication Technology: Trends to 2030. Challenges **2015**, 6, 117–157.
5. Konstantakos, V.; Chatzigeorgiou, A.; Nikolaidis, S.; Laopoulos, T. Energy Consumption Estimation in Embedded Systems. IEEE Trans. Instrum. Meas. **2008**, 57, 797–804.
6. Rotem, E.; Naveh, A.; Ananthakrishnan, A.; Weissmann, E.; Rajwan, D. Power-Management Architecture of the Intel Microarchitecture Code-Named Sandy Bridge. IEEE Micro **2012**, 32, 20–27.
7. David, H.; Gorbatov, E.; Hanebutte, U.R.; Khanna, R.; Le, C. RAPL: Memory power estimation and capping. In Proceedings of the 2010 ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED), Austin, TX, USA, 18–20 August 2010; pp. 189–194.
8. Gough, C.; Steiner, I.; Saunders, W. Energy Efficient Servers: Blueprints for Data Center Optimization; Apress: New York, NY, USA, 2015; ISBN 978-1-4302-6638-9.
9. Intel Corporation. Intel® Xeon Phi™ Coprocessor System Software Developers Guide; Intel Corporation: Santa Clara, CA, USA, 2014.
10. Intel Corporation. Intel® Manycore Platform Software Stack (Intel MPSS); Intel Corporation: Santa Clara, CA, USA, 2014.
11. Advanced Micro Devices. BIOS and Kernel Developer’s Guide (BKDG) for AMD Family 15h Models 00h-0Fh Processors; Advanced Micro Devices: Santa Clara, CA, USA, 2012.
12. Hackenberg, D.; Ilsche, T.; Schöne, R.; Molka, D.; Schmidt, M.; Nagel, W.E. Power measurement techniques on standard compute nodes: A quantitative comparison. In Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, TX, USA, 21–23 April 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 194–204.
13. Nvidia. Nvidia Management Library: NVML Reference Manual; Nvidia: Santa Clara, CA, USA, 2018.
14. Burtscher, M.; Zecena, I.; Zong, Z. Measuring GPU Power with the K20 Built-in Sensor. In Proceedings of the Workshop on General Purpose Processing Using GPUs (GPGPU-7), Salt Lake City, UT, USA, 1 March 2014; ACM: New York, NY, USA, 2014; pp. 28:28–28:36.
15. Economou, D.; Rivoire, S.; Kozyrakis, C.; Ranganathan, P. Full-system power analysis and modeling for server environments. In International Symposium on Computer Architecture; IEEE: Piscataway, NJ, USA, 2006; pp. 70–77.
16. McCullough, J.C.; Agarwal, Y.; Chandrashekar, J.; Kuppuswamy, S.; Snoeren, A.C.; Gupta, R.K. Evaluating the Effectiveness of Model-based Power Characterization. In Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC ’11), Portland, OR, USA, 15–17 June 2011; p. 12.
17. O’Brien, K.; Pietri, I.; Reddy, R.; Lastovetsky, A.; Sakellariou, R. A Survey of Power and Energy Predictive Models in HPC Systems and Applications. ACM Comput. Surv. **2017**, 50, 37:1–37:38.
18. Shahid, A.; Fahad, M.; Reddy, R.; Lastovetsky, A. Additivity: A Selection Criterion for Performance Events for Reliable Energy Predictive Modeling. Supercomput. Front. Innov. **2017**, 4, 50–65.
19. Heterogeneous Computing Laboratory. HCLWattsUp: Software API for Power and Energy Measurements Using WattsUp Pro Meter; School of Computer Science, University College Dublin: Dublin, Ireland, 2019.
20. Hackenberg, D.; Schöne, R.; Ilsche, T.; Molka, D.; Schuchart, J.; Geyer, R. An Energy Efficiency Feature Survey of the Intel Haswell Processor. In Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, Hyderabad, India, 25–29 May 2015; pp. 896–904.
21. Bellosa, F. The Benefits of Event-Driven Energy Accounting in Power-sensitive Systems. In Proceedings of the 9th ACM SIGOPS European Workshop: Beyond the PC: New Challenges for the Operating System (EW 9), Kolding, Denmark, 17–20 September 2000; ACM: New York, NY, USA, 2000; pp. 37–42.
22. Isci, C.; Martonosi, M. Runtime power monitoring in high-end processors: Methodology and empirical data. In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-36), San Diego, CA, USA, 5 December 2003; IEEE: Washington, DC, USA, 2003; pp. 93–104.
23. Li, T.; John, L.K. Run-time Modeling and Estimation of Operating System Power Consumption. SIGMETRICS Perform. Eval. Rev. **2003**, 31, 160–171.
24. Lee, B.C.; Brooks, D.M. Accurate and Efficient Regression Modeling for Microarchitectural Performance and Power Prediction. SIGARCH Comput. Archit. News **2006**, 34, 185–194.
25. Heath, T.; Diniz, B.; Carrera, E.V.; Meira, W., Jr.; Bianchini, R. Energy Conservation in Heterogeneous Server Clusters. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’05), Chicago, IL, USA, 15–17 June 2005; ACM: New York, NY, USA, 2005; pp. 186–195.
26. Fan, X.; Weber, W.D.; Barroso, L.A. Power Provisioning for a Warehouse-sized Computer. In Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA ’07), San Diego, CA, USA, 9–13 June 2007; ACM: New York, NY, USA, 2007; pp. 13–23.
27. Singh, K.; Bhadauria, M.; McKee, S.A. Real Time Power Estimation and Thread Scheduling via Performance Counters. SIGARCH Comput. Archit. News **2009**, 37, 46–55.
28. Goel, B.; McKee, S.A.; Gioiosa, R.; Singh, K.; Bhadauria, M.; Cesati, M. Portable, scalable, per-core power estimation for intelligent resource management. In Proceedings of the International Conference on Green Computing, Chicago, IL, USA, 15–18 August 2010; pp. 135–146.
29. Basmadjian, R.; Ali, N.; Niedermeier, F.; de Meer, H.; Giuliani, G. A Methodology to Predict the Power Consumption of Servers in Data Centres. In Proceedings of the 2nd International Conference on Energy-Efficient Computing and Networking (e-Energy ’11), New York, NY, USA, 31 May–1 June 2011; ACM: New York, NY, USA, 2011; pp. 1–10.
30. Bircher, W.L.; John, L.K. Complete System Power Estimation Using Processor Performance Events. IEEE Trans. Comput. **2012**, 61, 563–577.
31. Dargie, W. A Stochastic Model for Estimating the Power Consumption of a Processor. IEEE Trans. Comput. **2015**, 64, 1311–1322.
32. Lastovetsky, A.; Reddy, R. New Model-Based Methods and Algorithms for Performance and Energy Optimization of Data Parallel Applications on Homogeneous Multicore Clusters. IEEE Trans. Parallel Distrib. Syst. **2017**, 28, 1119–1133.
33. Li, S.; Ahn, J.H.; Strong, R.D.; Brockman, J.B.; Tullsen, D.M.; Jouppi, N.P. The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing. ACM Trans. Archit. Code Optim. **2013**, 10, 5.
34. Haj-Yihia, J.; Yasin, A.; Asher, Y.B.; Mendelson, A. Fine-grain power breakdown of modern out-of-order cores and its implications on Skylake-based systems. ACM Trans. Archit. Code Optim. (TACO) **2016**, 13, 56.
35. Mair, J.; Huang, Z.; Eyers, D. Manila: Using a densely populated PMC-space for power modelling within large-scale systems. Parallel Comput. **2019**, 82, 37–56.
36. Hong, S.; Kim, H. An Integrated GPU Power and Performance Model. SIGARCH Comput. Archit. News **2010**, 38, 280–289.
37. Nagasaka, H.; Maruyama, N.; Nukada, A.; Endo, T.; Matsuoka, S. Statistical power modeling of GPU kernels using performance counters. In Proceedings of the International Conference on Green Computing, Chicago, IL, USA, 15–18 August 2010; pp. 115–122.
38. Song, S.; Su, C.; Rountree, B.; Cameron, K.W. A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures. In Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, Boston, MA, USA, 20–24 May 2013; pp. 673–686.
39. Shao, Y.S.; Brooks, D. Energy characterization and instruction-level energy model of Intel’s Xeon Phi processor. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), Beijing, China, 4–6 September 2013; pp. 389–394.
40. Al-Khatib, Z.; Abdi, S. Operand-Value-Based Modeling of Dynamic Energy Consumption of Soft Processors in FPGA. In Applied Reconfigurable Computing; Sano, K., Soudris, D., Hübner, M., Diniz, P.C., Eds.; Springer International Publishing: Berlin, Germany, 2015; pp. 65–76.
- Al-Khatib, Z.; Abdi, S. Operand-Value-Based Modeling of Dynamic Energy Consumption of Soft Processors in FPGA. In Applied Reconfigurable Computing; Sano, K., Soudris, D., Hübner, M., Diniz, P.C., Eds.; Springer International Publishing: Berlin, Germany, 2015; pp. 65–76. [Google Scholar]
- Asanovic, K.; Bodik, R.; Catanzaro, B.C.; Gebis, J.J.; Husbands, P.; Keutzer, K.; Patterson, D.A.; Plishker, W.L.; Shalf, J.; Williams, S.W.; et al. The Landscape of Parallel Computing Research: A View from Berkeley; Technical Report UCB/EECS-2006-183; University of California: Berkeley, CA, USA, 2006. [Google Scholar]
- IntelPCM. Intel
^{®}Performance Counter Monitor—A Better Way to Measure CPU Utilization. 2017. Available online: https://software.intel.com/en-us/articles/intel-performance-counter-monitor (accessed on 10 June 2019). - PAPI. Performance Application Programming Interface 5.4.1. 2015. Available online: https://icl.utk.edu/papi/overview/index.html (accessed on 10 June 2019).
- Manumachu, R.R.; Lastovetsky, A. Bi-Objective Optimization of Data-Parallel Applications on Homogeneous Multicore Clusters for Performance and Energy. IEEE Trans. Comput.
**2018**, 67, 160–177. [Google Scholar] [CrossRef] - Reddy Manumachu, R.; Lastovetsky, A.L. Design of self-adaptable data parallel applications on multicore clusters automatically optimized for performance and energy through load distribution. Concurr. Comput. Pract. Exp.
**2019**, 31, e4958. [Google Scholar] [CrossRef] - Khaleghzadeh, H.; Zhong, Z.; Reddy, R.; Lastovetsky, A. Out-of-core implementation for accelerator kernels on heterogeneous clouds. J. Supercomput.
**2018**, 74, 551–568. [Google Scholar] [CrossRef] - Treibig, J.; Hager, G.; Wellein, G. LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments. In Proceedings of the 2010 39th International Conference on Parallel Processing Workshops, San Diego, CA, USA, 13–16 September 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 207–216. [Google Scholar][Green Version]
- Perf Wiki. perf: Linux Profiling with Performance Counters; Wikipedia, the Free Encyclopedia, 2017. Available online: https://perf.wiki.kernel.org/index.php/Main_Page (accessed on 10 June 2019).
- Alonso, P.; Badia, R.M.; Labarta, J.; Barreda, M.; Dolz, M.F.; Mayo, R.; Quintana-Ortí, E.S.; Reyes, R. Tools for Power-Energy Modelling and Analysis of Parallel Scientific Applications. In Proceedings of the 2012 41st International Conference on Parallel Processing, Pittsburgh, PA, USA, 10–13 September 2012; pp. 420–429. [Google Scholar]
- Mantovani, F.; Calore, E. Performance and power analysis of HPC workloads on heterogeneous multi-node clusters. J. Low Power Electron. Appl.
**2018**, 8, 13. [Google Scholar] [CrossRef] - Zhou, Z.; Abawajy, J.H.; Li, F.; Hu, Z.; Chowdhury, M.U.; Alelaiwi, A.; Li, K. Fine-Grained Energy Consumption Model of Servers Based on Task Characteristics in Cloud Data Center. IEEE Access
**2018**, 6, 27080–27090. [Google Scholar] [CrossRef] - Bedard, D.; Lim, M.Y.; Fowler, R.; Porterfield, A. PowerMon: Fine-grained and integrated power monitoring for commodity computer systems. In Proceedings of the IEEE SoutheastCon 2010 (SoutheastCon), Concord, NC, USA, 18–21 March 2010; pp. 479–484. [Google Scholar]
- Ge, R.; Feng, X.; Song, S.; Chang, H.; Li, D.; Cameron, K.W. PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications. IEEE Trans. Parallel Distrib. Syst.
**2010**, 21, 658–671. [Google Scholar] [CrossRef] - Laros, J.H.; Pokorny, P.; DeBonis, D. PowerInsight—A commodity power measurement capability. In Proceedings of the 2013 International Green Computing Conference Proceedings, Arlington, VA, USA, 27–29 June 2013; pp. 1–6. [Google Scholar]
- Intel Corporation. Intelligent Platform Management Interface Spec; Intel Corporation: Santa Clara, CA, USA, 2013. [Google Scholar]
- Intel Corporation. DCMI—Data Center Manageability Interface Specification; Intel Corporation: Santa Clara, CA, USA, 2011. [Google Scholar]

**Figure 1.** Dynamic energy consumption profile segments of HCLWattsUp and Intel RAPL for the 2D FFT computation using FFTW-3.3.7 on HCLServer03.

**Figure 2.** Dynamic energy profiles with Running Average Power Limit (RAPL) and HCLWattsUp on HCLServer03, class A. RAPL calib. means that the RAPL readings have been calibrated.

**Figure 3.** Dynamic energy profiles by RAPL and HCLWattsUp on HCLServer03, class B. RAPL calib. means that the RAPL readings have been calibrated. (**a**) DGEMM, N = 10,240–25,600; (**b**) MKL-FFT, N = 32,768–43,456.

**Figure 4.** Dynamic energy profiles of FFTW (N = 20,480–26,560) by RAPL and HCLWattsUp on HCLServer03, class C.

**Figure 5.** Dynamic energy consumption profiles of DGEMM and CUDA FFT on the Nvidia K40c GPU on HCLServer01. RAPL+GPUSensors calib. means that the RAPL+GPUSensors values have been calibrated.

**Figure 6.** Dynamic energy consumption profiles of Intel MKL DGEMM and Intel MKL FFT on the Intel Xeon Phi co-processor. RAPL+PHISensors calib. means that the RAPL+PHISensors values have been calibrated.

**Figure 7.** Percentage deviations of the predictive models and RAPL from HCLWattsUp. The dotted lines represent the averages.

**Table 1.** HCLServer1: Specifications of the Intel Haswell multicore CPU, Nvidia K40c and Intel Xeon Phi 3120P.

| Intel Haswell E5-2670V3 | |
|---|---|
| Launch Date | Q3’14 |
| No. of cores per socket | 12 |
| Socket(s) | 2 |
| CPU MHz | 1200.402 |
| L1d cache, L1i cache | 32 KB, 32 KB |
| L2 cache, L3 cache | 256 KB, 30,720 KB |
| Total main memory | 64 GB DDR4 |
| Memory bandwidth | 68 GB/s |

| Nvidia K40c | |
|---|---|
| Launch Date | Q4’13 |
| No. of processor cores | 2880 |
| Total board memory | 12 GB GDDR5 |
| L2 cache size | 1536 KB |
| Memory bandwidth | 288 GB/s |

| Intel Xeon Phi 3120P | |
|---|---|
| Launch Date | Q2’13 |
| No. of processor cores | 57 |
| Total main memory | 6 GB GDDR5 |
| Memory bandwidth | 240 GB/s |

**Table 2.** HCLServer2: Specifications of the Intel Skylake multicore CPU and Nvidia P100 PCIe GPU.

| Intel Xeon Gold 6152 | |
|---|---|
| Launch Date | Q3’17 |
| Socket(s) | 1 |
| Cores per socket | 22 |
| L1d cache, L1i cache | 32 KB, 32 KB |
| L2 cache, L3 cache | 256 KB, 30,976 KB |
| Main memory | 96 GB |

| Nvidia P100 PCIe | |
|---|---|
| Launch Date | Q2’16 |
| No. of processor cores | 3584 |
| Total board memory | 12 GB CoWoS HBM2 |
| Memory bandwidth | 549 GB/s |

**Table 3.** HCLServer3: Specifications of the Intel Skylake multicore processor (CPU) consisting of two sockets of 28 cores each.

| Technical Specifications | Intel Xeon Platinum 8180 |
|---|---|
| Launch Date | Q3’17 |
| Socket(s) | 2 |
| Cores per socket | 28 |
| L1d cache, L1i cache | 32 KB, 32 KB |
| L2 cache, L3 cache | 1024 KB, 39,424 KB |
| Main memory | 187 GB |

**Table 4.** Percentage error of the dynamic energy consumption reported for the Nvidia K40c GPU, with and without calibration, against HCLWattsUp on HCLServer01.

| Without Calibration | Min | Max | Avg |
|---|---|---|---|
| DGEMM | 0.076% | 35.32% | 10.62% |
| FFT | 0.52% | 57.77% | 12.45% |

| With Calibration | Min | Max | Avg |
|---|---|---|---|
| DGEMM | 0.19% | 30.50% | 10.43% |
| FFT | 0.18% | 94.55% | 10.87% |

**Table 5.** Percentage error of the dynamic energy consumption, with and without calibration, against HCLWattsUp on the Intel Xeon Phi.

| Without Calibration | Min | Max | Avg |
|---|---|---|---|
| DGEMM | 45.1% | 93.06% | 64.5% |
| FFT | 22.58% | 55.78% | 40.68% |

| With Calibration | Min | Max | Avg |
|---|---|---|---|
| DGEMM | 0.06% | 9.54% | 2.75% |
| FFT | 0.06% | 32.3% | 9.58% |
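The percentage errors reported in Tables 4 and 5 are relative deviations of the sensor readings from the HCLWattsUp ground truth, summarized over all problem sizes. A minimal sketch of this calculation, using hypothetical readings rather than the measured values:

```python
def percentage_errors(sensor, ground_truth):
    """Relative deviation (%) of each sensor reading from the ground truth."""
    return [abs(s - g) / g * 100.0 for s, g in zip(sensor, ground_truth)]

def summarize(errors):
    """Return (min, max, avg) of a list of percentage errors."""
    return min(errors), max(errors), sum(errors) / len(errors)

# Hypothetical dynamic energy readings (joules) for three problem sizes.
hclwattsup = [100.0, 200.0, 400.0]  # ground truth (power meter)
sensor = [110.0, 180.0, 410.0]      # on-chip sensor readings
print(summarize(percentage_errors(sensor, hclwattsup)))  # (2.5, 10.0, 7.5)
```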

**Table 6.** Correlation of performance monitoring counters (PMCs) with dynamic energy consumption ($E_D$). A value of 1 denotes 100% correlation.

| | $E_D$ | $X_1$ | $X_2$ | $X_3$ | $X_4$ | $X_5$ | $X_6$ |
|---|---|---|---|---|---|---|---|
| $E_D$ | 1 | 0.53 | 0.50 | 0.42 | 0.58 | 0.99 | 0.99 |
| $X_1$ | 0.53 | 1 | 0.41 | 0.25 | 0.39 | 0.45 | 0.44 |
| $X_2$ | 0.50 | 0.41 | 1 | 0.19 | 0.99 | 0.48 | 0.48 |
| $X_3$ | 0.42 | 0.25 | 0.19 | 1 | 0.21 | 0.41 | 0.40 |
| $X_4$ | 0.58 | 0.39 | 0.99 | 0.21 | 1 | 0.57 | 0.56 |
| $X_5$ | 0.99 | 0.45 | 0.48 | 0.41 | 0.57 | 1 | 0.99 |
| $X_6$ | 0.99 | 0.44 | 0.48 | 0.40 | 0.56 | 0.99 | 1 |
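Correlation matrices like Table 6 are typically built from the Pearson correlation coefficient between each PMC vector and the dynamic energy vector across the benchmark runs. A self-contained sketch with hypothetical data (the actual PMC counts are not reproduced here):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical dynamic energy readings and PMC counts over four runs.
energy = [10.0, 20.0, 30.0, 40.0]
pmc = [1.0e6, 2.0e6, 3.0e6, 4.0e6]  # scales linearly with energy
print(round(pearson(pmc, energy), 2))  # 1.0
```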

**Table 7.** Linear predictive models (A–F) with intercepts, together with RAPL, and their minimum, average and maximum prediction errors.

| Model | PMCs | Intercept and Coefficients | Prediction Errors % (min, avg, max) |
|---|---|---|---|
| A | $X_1,X_2,X_3,X_4,X_5,X_6$ | 10, $3\times 10^{-9}$, $1.9\times 10^{-8}$, $3.3\times 10^{-7}$, $-1\times 10^{-6}$, $6\times 10^{-8}$, $-9.3\times 10^{-11}$ | (2.7, 32, 99.9) |
| B | $X_1,X_2,X_4,X_5,X_6$ | $3\times 10^{-9}$, $1.9\times 10^{-8}$, $-1\times 10^{-6}$, $6.2\times 10^{-8}$, $-1.2\times 10^{-10}$, 230 | (0.53, 21.80, 72.9) |
| C | $X_1,X_4,X_5,X_6$ | $3.7\times 10^{-9}$, $7.9\times 10^{-9}$, $7.5\times 10^{-8}$, $-5.1\times 10^{-10}$, 270 | (0.75, 29.81, 77.2) |
| D | $X_4,X_5,X_6$ | $6.7\times 10^{-8}$, $9.4\times 10^{-8}$, $-9.7\times 10^{-10}$, 490 | (0.21, 23.19, 80.42) |
| E | $X_5,X_6$ | $9.7\times 10^{-8}$, $-1.02\times 10^{-9}$, 520 | (2, 21.03, 83.40) |
| F | $X_6$ | $1.5\times 10^{-9}$, 740 | (2.5, 14.39, 34.64) |
| RAPL | | | (4.1, 30.6, 58.9) |
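Each model in Table 7 predicts dynamic energy as a linear combination of its PMC counts plus an intercept, i.e., $E_D = b + \sum_i c_i X_i$. A sketch of how such a model is evaluated, using model F's constants under the assumption that 740 is the intercept and $1.5\times 10^{-9}$ the coefficient of $X_6$ (the PMC count below is hypothetical):

```python
def predict_energy(pmcs, coeffs, intercept):
    """Linear PMC-based energy model: E = intercept + sum(c_i * X_i)."""
    return intercept + sum(c * x for c, x in zip(coeffs, pmcs))

# Model F from Table 7: single predictor X6.
x6 = 2.0e11  # hypothetical PMC count for one application run
e = predict_energy([x6], [1.5e-9], 740)
print(round(e, 3))  # 1040.0 joules
```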

**Table 8.** Selected PMCs for the Class B experiments along with their energy correlation for DGEMM and FFT. Values from 0 to 1 represent positive correlation from 0% to 100%.

| | Selected PMCs | Corr. DGEMM | Corr. FFT |
|---|---|---|---|
| Y1 | FP_ARITH_INST_RETIRED_DOUBLE | 0.99 | 0.98 |
| Y2 | MEM_INST_RETIRED_ALL_STORES | 0.99 | 0.99 |
| Y3 | MEM_INST_RETIRED_ALL_LOADS | 0.98 | 0.55 |
| Y4 | MEM_LOAD_RETIRED_L3_MISS | 0.60 | 0.99 |
| Y5 | MEM_LOAD_RETIRED_L1_HIT | 0.98 | 0.34 |
| Y6 | ICACHE_64B_IFTAG_MISS | 0.99 | 0.77 |

| Problem Size (N) | Min | Max | Avg |
|---|---|---|---|
| 14,336 | 17% | 172% | 65% |
| 14,848 | 12% | 153% | 58% |
| 15,360 | 13% | 240% | 56% |
| 16,384 | 2% | 300% | 56% |

| Problem Size (N) | Energy Loss without Calibration | Energy Loss after Calibration |
|---|---|---|
| 14,336 | 54 | 16 |
| 14,848 | 37 | 8 |
| 15,360 | 31 | 12 |
| 16,384 | 84 | 40 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Fahad, M.; Shahid, A.; Manumachu, R.R.; Lastovetsky, A.
A Comparative Study of Methods for Measurement of Energy of Computing. *Energies* **2019**, *12*, 2204.
https://doi.org/10.3390/en12112204
