# On the Use of Probabilistic Worst-Case Execution Time Estimation for Parallel Applications in High Performance Systems

## Abstract

## 1. Introduction

#### 1.1. Motivation

#### 1.2. Contribution

- Contribution 1: Exploration of execution conditions. We integrate a software randomization layer in the geophysical exploration application to test its susceptibility to memory layouts caused by different code, heap and stack allocations. This contribution is provided in Section 2.
- Contribution 2: Worst-Case Execution Time (WCET) analysis. We analyze and adapt a measurement-based probabilistic timing analysis (MBPTA) technique for WCET prediction so that it can be applied to HPC applications running on high-performance systems. This contribution is provided in Section 3.
- Contribution 3: Evaluation and scalability. We evaluate those techniques on the geophysical exploration application, showing that they are viable for studying its (high) execution time behavior and that, when appropriately integrated, they scale to applications using parallel paradigms, thus going beyond the execution conditions considered in embedded systems. This contribution is provided in Section 4.

## 2. Execution Time Test Coverage Improvement for HPC

#### 2.1. Memory-Placement Software Randomization

#### 2.2. Code Randomization

#### 2.3. Stack Randomization

#### 2.4. Heap Randomization

#### 2.5. Summary

## 3. Measurement-Based Probabilistic Timing Analysis for HPC

#### 3.1. MBPTA-CV Fundamentals

#### 3.2. MBPTA-CV Steps

#### Input Sample Generation

#### 3.3. Independence and Identical Distribution

#### 3.3.1. Exponential Tail Test

#### 3.3.2. Select the Best Tail

#### 3.3.3. pWCET Estimate

#### 3.4. Summary

## 4. Evaluation

#### 4.1. Case study

- Pre-processing: data is read from disk and several checks are performed to ensure that the subsequent processing is valid.
- Main processing: consisting essentially of a frequency loop, this phase iteratively obtains the real value of a given variable (or a set of variables) from an initial guess. To do this, the signal of the input sources and the received data is bandpass-filtered to the frequency of interest by means of an adjoint-state method. The input model is then updated to the one with the smallest error. Once this has been achieved, another frequency is considered.
- Post-processing: the objective of this phase is to ensure that the numerical values of the generated output are quantitatively correct. It also contains the routines that adapt the data to the output expected by the user (interpolation, file formats, etc.).

#### 4.2. Experimental Framework

- Be representative of real problems.
- Deliver execution times no less than hundreds of milliseconds, as explained before, for a reliable application of MBPTA-CV despite OS noise.
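
The second requirement can be checked mechanically before any timing analysis is attempted. The following is a minimal sketch, not the measurement harness used in this work: the function names, the `min_seconds` default of 0.2 s (a stand-in for "hundreds of milliseconds"), and the sleeping workload are all assumptions made for illustration.

```python
import time


def collect_execution_times(workload, runs):
    """Run `workload` `runs` times and return each wall-clock duration in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return samples


def long_enough(samples, min_seconds=0.2):
    """Return True if even the fastest run lasts at least `min_seconds`.

    The 0.2 s default is a hypothetical threshold, not a value from the paper.
    """
    return min(samples) >= min_seconds


if __name__ == "__main__":
    # Hypothetical workload standing in for one run of the application.
    times = collect_execution_times(lambda: time.sleep(0.25), runs=3)
    print("fastest run (s):", min(times))
    print("usable for timing analysis:", long_enough(times))
```

Checking the fastest run rather than the mean is a deliberate choice here: a single very short run is the one most easily distorted by OS noise.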

#### 4.3. Results

## 5. Related Work

## 6. Conclusions

**Figure 4.** Measurement-based probabilistic timing analysis (MBPTA)-CV process to obtain pWCET estimates.

**Figure 6.** pWCET distributions for the Full Waveform Inversion (FWI) application for a problem size of 32 × 32 × 32. Plots correspond from top to bottom to no randomization, code randomization, stack randomization, and heap randomization respectively; and from left to right to 1, 2, 4, 8, 12 and 24 threads respectively.

**Figure 7.** CV-plot for the FWI application for a problem size of 32 × 32 × 32. Plots correspond from left to right to no randomization, code randomization, stack randomization, and heap randomization respectively; and from top to bottom to 1, 2, 4, 8, 12 and 24 threads respectively.

**Figure 8.** pWCET distributions for the FWI application with 24 threads. Plots correspond from top to bottom to 64 × 64 × 64 problem size, 128 × 128 × 128, and 192 × 192 × 192; and from left to right to no randomization, code randomization, stack randomization, and heap randomization respectively.

**Figure 9.** CV-plot for the FWI application with 24 threads. Plots correspond from top to bottom to 64 × 64 × 64 problem size, 128 × 128 × 128, and 192 × 192 × 192; and from left to right to no randomization, code randomization, stack randomization, and heap randomization respectively.

**Figure 10.** Absolute maximum execution time (MET) and pWCET estimate (in seconds) at an exceedance probability of $10^{-6}$ per run for the different configurations.

**Figure 11.** Relative increase of the pWCET estimate at an exceedance probability of $10^{-6}$ w.r.t. maximum execution times (MET) for the different configurations.
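
The two quantities reported in Figures 10 and 11 can be illustrated with a small sketch. It fits an exponential tail to the largest measurements, reports the residual coefficient of variation (close to 1 when the tail is exponential), and extrapolates to a target exceedance probability. This is a simplified illustration under stated assumptions, not the MBPTA-CV procedure itself: the tail is chosen with a fixed `tail_fraction` rather than by MBPTA-CV's own tail selection, the function names are hypothetical, and the execution times are synthetic.

```python
import math
import random
import statistics


def exponential_tail_pwcet(samples, exceedance_prob, tail_fraction=0.1):
    """Estimate a pWCET by fitting an exponential tail to the largest samples.

    `tail_fraction` is a hypothetical fixed choice; it does not reproduce
    the tail selection performed by MBPTA-CV.
    Returns (pwcet_estimate, residual_cv).
    """
    ordered = sorted(samples)
    n_tail = max(2, int(len(ordered) * tail_fraction))
    if len(ordered) <= n_tail:
        raise ValueError("not enough samples for the requested tail")
    threshold = ordered[-n_tail - 1]  # largest value left out of the tail
    excesses = [x - threshold for x in ordered[-n_tail:]]
    mean_excess = statistics.fmean(excesses)
    if mean_excess <= 0.0:
        raise ValueError("tail has no spread above the threshold")
    # Residual coefficient of variation of the excesses over the threshold.
    cv = statistics.stdev(excesses) / mean_excess
    tail_prob = n_tail / len(ordered)
    # Model: P(X > x) = tail_prob * exp(-(x - threshold) / mean_excess).
    # Solving P(X > x) = exceedance_prob for x gives the estimate below.
    pwcet = threshold + mean_excess * math.log(tail_prob / exceedance_prob)
    return pwcet, cv


def relative_increase(pwcet, met):
    """Relative increase of the pWCET estimate with respect to the MET."""
    return (pwcet - met) / met


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic execution times in seconds; not measurements from the paper.
    times = [1.0 + rng.expovariate(20.0) for _ in range(1000)]
    met = max(times)
    pwcet, cv = exponential_tail_pwcet(times, exceedance_prob=1e-6)
    print("MET (s):", met)
    print("pWCET at 1e-6 (s):", pwcet)
    print("residual CV:", cv)
    print("relative increase:", relative_increase(pwcet, met))
```

The relative increase is the quantity of interest when comparing configurations: it expresses how far the extrapolated bound lies above the highest execution time actually observed.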

