Reducing WCET Overestimations in Multi-Thread Loops with Critical Section Usage

Abstract: Worst-case execution time (WCET) is an important metric in real-time systems that helps in energy usage modeling and in evaluating predefined execution time requirements. While basic timing analysis relies on identifying execution paths and evaluating their length, multi-thread code with critical section usage brings additional complications and requires estimation of resource-waiting time. In this paper, we address the problem of reducing worst-case execution time overestimation in situations when multiple threads execute loops that use the same critical section in each iteration. The experiment showed that the theoretical worst-case execution time calculation does not take into account the proportion between computational and critical sections; therefore, we propose a new worst-case execution time calculation model to reduce the overestimation. The proposed model reduces the overestimation on average by half in comparison to the theoretical model. This leads to more accurate execution time and energy consumption estimation.


Introduction
One of the metrics defining the efficiency of program code is the execution time, that is, the speed of task execution. The execution time is highly important in real-time systems, where strict bounds on task execution time might apply. Safety-critical real-time embedded systems need to be functionally correct, meet timing deadlines and low energy consumption requirements, and conform to the increasing demand for computing power. Timing analysis is therefore a necessary step in the development and validation process for real-time embedded systems.
One of the reasons to predict and assure execution time bounds is energy consumption. In battery-powered systems, the code execution time should be adjusted to battery replacement or charging times. Since energy consumption also depends on the code execution time, it can be predicted from it. However, real experiments for estimating execution time bounds are in most cases not appropriate because of expensive resource (and energy) usage. Therefore, it is important to estimate the possible range of the code execution time in advance, without executing the code. While the best-case (minimal) execution time indicates the ideal situation, the worst-case execution time (WCET) is more important. The WCET is important for the evaluation of software safety and the optimization of energy consumption.
WCET estimation is performed by analyzing possible paths and variations in the code. Traditional WCET analysis takes into account the code flow (operations, conditions, loops, etc.) and resource (memory, critical sections, etc.) usage. WCET estimation in multi-thread, multi-core systems is even more challenging because it requires estimating the interaction between threads. Multi-thread code leads to WCET unpredictability when two or more threads request access to the same critical section at the same moment in time. This increases the worst-case execution time because at least one thread must wait until the resource is released. The increased WCET can cause hard deadlines to be missed, consequently resulting in system failure [1], and it is very challenging to tightly predict the worst-case resource usage at compile time [2].
In WCET analysis, the goal is to come as close as possible to, but not below, the real execution time bound on the target hardware. For various reasons, some overestimation of the WCET is unavoidable; however, the estimate should be as close as possible to the real execution time in order to economize on resource and energy usage.
The aim of this paper is to reduce the WCET overestimation in multi-thread loops with critical section usage.
The structure of the paper is as follows. Related research in the area of WCET estimation is overviewed in Section 2. In Section 3, an existing WCET estimation model for multi-thread code is adapted for loop usage in order to highlight its overestimation problem. A new WCET estimation model for multi-thread code that executes loops with critical section usage is also proposed in Section 3. The proposed model is applied and validated in Section 4 to show the reduction of the overestimation.
The proposed WCET estimation model is new and takes into account the distribution of computational and critical section blocks of both the analyzed thread and the other threads. The proposed model reduces the overestimation on average by half in comparison to the existing theoretical model.

Related Research
In the design of real-time and embedded systems, the execution time is an important parameter. Execution time allows the prediction of system performance bounds and at the same time impacts the energy consumption. Minimization of energy consumption is an important problem and a key factor in sustainable production and operation. Therefore, energy consumption analysis and prediction problems are addressed in software and hardware areas such as wireless sensor networks, cloud/distributed system computing, battery-powered real-time systems, and high-performance computing [3][4][5][6].
The landscape of energy consumption prediction solutions varies. One area is worst-case response energy consumption (WCRE) based on hardware usage. Wagemann et al. [7] proposed the SysWCEC solution, which takes into account the activation and switching off of different hardware components. Another area is WCET prediction. By knowing the WCET value or modeling it for different situations, the tradeoff between program performance and energy consumption can be estimated [8]. Therefore, accurate estimation of WCET leads to more accurate energy consumption estimation.
Commonly, there are two main methods of finding the WCET: measurement-based methods and static methods. In measurement-based methods, the execution time is measured either through direct measurement or through simulation of the code with different inputs. The level of granularity at which these measurements can be performed varies for different processor architectures. While most current processor architectures support hardware performance counters, these counters can only provide a limited level of accuracy, and care must be taken to obtain accurate measurements [9]. Zoubek and Trommler [10] highlight that the measurement-based method is not safe for critical systems. There are concerns that not all factors are taken into account, such as state changes in the underlying hardware: a cache line being evicted, a pipeline being completely flushed, etc. [11].
Another type of WCET estimation method is static timing analysis. These techniques estimate the execution time of a program without actually executing any code. They are mainly used to determine the WCET of a program, meaning a conservative estimate or upper bound for the execution time of a program. Static methods are safe, from a general perspective, but safety comes at the cost of a possible over-approximation. Static estimation tries to find an upper bound of the WCET by approaching it from higher times, for instance, by classifying cache hits that are guaranteed to occur in every execution [12][13][14].
Energies 2021, 14, 1747
More active research on WCET analysis of parallel program code started in 2011. Since then, multiple papers on static timing analysis to derive safe bounds for parallel software have been published [15][16][17][18][19][20][21]. Since today's hardware solutions are mostly multi-core, some additional challenges arise in timing analysis. Accurate WCET analysis must take into account multi-thread solutions, shared cache/shared bus usage, concurrent reading and/or writing of shared data, etc. All these issues are raised in the field of WCET estimation for independent threads running concurrently [22,23].
A systematic literature analysis was conducted in Web of Science: Science Citation Index Expanded (Clarivate), Scopus (Elsevier), Association for Computing Machinery (ACM) Digital Library, and Google Scholar to find papers in the field of WCET analysis in modern, parallel systems, loop situations, or critical sections. Most databases referenced the same scientific papers; therefore, we summarize the main results of the research. The results of the analysis (see Table 1) present the most closely related WCET research papers, devoted to improving the accuracy of WCET estimates. The summary of research papers in the field of WCET in parallel systems revealed a variety of different research areas. All the papers investigate specific architectures and solution scenarios. This indicates that the timing analysis problem remains relevant, because every new architecture or code implementation solution requires appropriate models for its timing analysis.

WCET Estimation in Parallel Threads, Executing Loop Actions with Critical Section Usage
The calculation of worst-case execution time in parallel applications is difficult because of critical section usage: a critical section can be used by only one thread at a time. If the critical section is locked (used by another thread), the other thread(s) must wait until the resource is released. In such situations, the code execution time cannot be estimated precisely, because it is difficult to predict when different threads will request a critical section and when they will receive it. The order in which different threads execute the critical section also has a big influence on thread scheduling. Therefore, the WCET calculation for each thread must take into account the worst possible order of critical section usage by different threads.
In Section 3.1, we formalize the existing WCET models by applying them to parallel thread execution in a loop with critical sections. In Section 3.2, we compare the existing models to brute-force situations, representing real WCET values. Based on the insight, a new model for WCET calculation is presented in Section 3.3.

Theoretical Background for WCET Calculation in Parallel Threads, Executing Loop Actions with Critical Sections
The WCET for each thread can be calculated as a sum of worst-case calculation time (WCCT) and worst-case waiting time (WCWT). This is possible when computational (comp) and critical section (cs) blocks of all threads have performance limits (maximum and minimum possible execution values).
The WCCT for thread X can be calculated easily because no additional factors influence it. It is equal to the sum of the maximum sizes of all computational and critical section blocks. Adapting the idea to a multi-thread loop, it is calculated by adding the maximum execution time values of all n computational blocks and all m critical section blocks in each loop iteration. When the maximum block values are the same in each loop iteration, the sum of the maximum execution times of the computational (comp_max) and critical section (cs_max) blocks in one iteration is multiplied by the number of iterations N in the loop (see Equation (1)).
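Written out from the description above, Equation (1) might take the following form. The index names i, j, k and the per-block superscript notation are assumptions introduced here for illustration only.

```latex
\begin{align*}
WCCT_X &= \sum_{i=1}^{N}\left(\sum_{j=1}^{n} comp^{\max}_{X,i,j}
          + \sum_{k=1}^{m} cs^{\max}_{X,i,k}\right) \\
       &= N \cdot \left(comp_{\max} + cs_{\max}\right)
          \qquad \text{(when all iterations have the same maximum values)}
\end{align*}
```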
Equation (1) can be used to obtain WCET in the best-case scenario in which no conflicts exist between different threads. However, it is unrealistic to expect WCCT to be the same as WCET in cases of multiple threads in the loop using the same critical section. The probability of using the same critical section at the same time exists and some additional delay will be added to WCCT.
The worst-case waiting time analysis model proposed by Ozaktas et al. [23] assumes that a critical section block in one thread will arrive at the same time as in the other threads. In this situation, the thread is placed into a queue and is granted access to the critical section as the last thread, after all other threads release it. Taking into account the maximum execution times of these critical sections, the worst-case waiting time of one thread, WCWT_X, can be calculated by adding the maximum critical section times of all the other threads (there are P threads in total) for each iteration. Therefore, to calculate the WCWT for one thread X out of P threads, we sum the maximum block size of each of the m critical sections in each iteration of the N-iteration loop. Only the critical section blocks of the other threads are counted, because a thread does not need to wait for a critical section that it occupies itself. Therefore, we sum up the critical section block sizes of threads from 1 to P, except thread X itself (see Equation (2)). In the case in which all iterations of the loop have the same maximum critical section execution times, the critical section execution times of all the other threads are added together and multiplied by the number of iterations in the loop.
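Following the same description, Equation (2) could be written as below. The thread index p and the block index k are assumed names, not notation taken from the original.

```latex
\begin{align*}
WCWT_X &= \sum_{i=1}^{N} \sum_{k=1}^{m}
          \sum_{\substack{p=1 \\ p \neq X}}^{P} cs^{\max}_{p,i,k} \\
       &= N \cdot \sum_{\substack{p=1 \\ p \neq X}}^{P} cs^{\max}_{p}
          \qquad \text{(when all iterations have the same maximum values)}
\end{align*}
```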
To calculate the total WCET of the loop with N iterations and P threads, the WCET value of the thread with the maximum WCCT_X + WCWT_X value has to be taken (see Equation (3)). It is important to assign the WCET to the maximum sum of WCCT_X + WCWT_X because it defines when the slowest thread will finish its execution. The program execution finishes only when the last thread finishes its execution.
From now on in this paper, we assume that each thread has just one computational and one critical section block in one loop iteration. In most cases, the minimum and maximum possible values of the computational and critical section blocks are the same for each iteration. This simplifies the problem and removes some additional variations. Taking into account those assumptions and the presented theoretical background, the WCET calculation for P parallel threads executing the same actions in a loop with N iterations can be simplified. The simplified form expresses the WCET value per loop iteration: the WCET of one iteration is equal to the sum of the maximum size of its computational block and the worst-case waiting time when the thread must wait for the other threads to release the critical section. The maximum critical section execution time of the thread itself is also added to obtain the WCCT. Therefore, the WCET of one iteration for thread k is expressed as the sum of the maximum critical section sizes of all P threads in the system plus the maximum computational size of the thread (Equation (4)).
This formula would be equal to the WCET estimation by Ozaktas et al. [23], adapted for the loop situation. We named this WCET value as theoretical.
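The theoretical calculation described in this subsection can be illustrated with a minimal Python sketch, under the simplifying assumption stated above (one computational and one critical section block per iteration, identical bounds in every iteration). The class and function names, and the numeric values, are hypothetical and are not the parameters of the experiment in Section 3.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadBounds:
    """Per-iteration worst-case block lengths of one thread, in cycles."""
    comp_max: int  # maximum computational block time
    cs_max: int    # maximum critical section time


def wcct(t: ThreadBounds, n_iter: int) -> int:
    """Worst-case calculation time: the thread's own blocks only (Equation (1))."""
    return n_iter * (t.comp_max + t.cs_max)


def wcwt(threads: list[ThreadBounds], x: int, n_iter: int) -> int:
    """Worst-case waiting time of thread x (Equation (2)): in every iteration
    all other threads are assumed to use the critical section first."""
    return n_iter * sum(t.cs_max for i, t in enumerate(threads) if i != x)


def theoretical_wcet(threads: list[ThreadBounds], n_iter: int) -> int:
    """WCET of the loop (Equation (3)): determined by the slowest thread."""
    return max(wcct(t, n_iter) + wcwt(threads, x, n_iter)
               for x, t in enumerate(threads))


# Hypothetical bounds chosen only to make the arithmetic easy to follow.
threads = [ThreadBounds(comp_max=10, cs_max=2), ThreadBounds(comp_max=4, cs_max=3)]
print(wcct(threads[0], 5))           # 5 * (10 + 2) = 60
print(wcwt(threads, 0, 5))           # 5 * 3 = 15
print(theoretical_wcet(threads, 5))  # max(60 + 15, 35 + 10) = 75
```

Dividing the result for one thread by the number of iterations gives the per-iteration form of Equation (4): the thread's maximum computational time plus the maximum critical section times of all P threads.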

Experiments for WCET Overestimation Estimation
To estimate how precise the theoretical WCET estimation method is, a series of experiments was executed. During the experiments, all possible combinations of computational and critical section distribution in time were modeled; the resulting maximum is called the brute-force WCET. In this experiment, a case of two threads was analyzed. Each of the threads executed five loop iterations. The time of all critical sections varied from one to two cycles, while the computational block time in the first thread varied from 96 to 97 cycles.
The computational block time of the second thread was modeled with different values to represent different proportions between the two threads, CpP (computational time of thread two divided by computational time of thread one). In order to evaluate the impact of the ratio between computational and critical sections, the computational block size of the second thread was changed to obtain 20 different situations. For the second thread, the computational block size started from 1-2 cycles and was increased by five cycles for each situation, up to 96-97 cycles.
The size of computational and critical sections was expressed in cycles in order to use discrete modeling. Due to the huge number of analyzed situations (for this situation, 1,048,576 code block distribution combinations were generated), discrete event modeling is not suitable for real-time calculations. Brute forcing all possible combinations also takes time and, depending on the implementation, might require additional memory to store all intermediate states. To compare the theoretical WCET with the modeled one, the brute-force WCET values were compared to the WCCT and WCET values calculated with the equations presented in Section 3.1.
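The reported number of combinations is consistent with each block independently taking one of two lengths (its minimum or its maximum). The short Python check below is only a sketch under that assumption; the constant names are hypothetical, and the enumeration stands in for the actual discrete event model, which is not reproduced here.

```python
from itertools import product

THREADS = 2
ITERATIONS = 5
BLOCKS_PER_ITERATION = 2  # one computational block and one critical section
VALUES_PER_BLOCK = 2      # assumption: each block is either minimum or maximum length

blocks = THREADS * ITERATIONS * BLOCKS_PER_ITERATION
combinations = VALUES_PER_BLOCK ** blocks
print(blocks, combinations)  # 20 1048576

# Iterating lazily visits every combination without storing them all at once.
count = sum(1 for _ in product(range(VALUES_PER_BLOCK), repeat=blocks))
print(count == combinations)  # True
```

The count grows exponentially with the number of threads and iterations, which is why exhaustive modeling quickly becomes impractical.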
The modeling results revealed that if the maximum critical section size is smaller than the computational section size, the WCWT is in most cases at least two times smaller than predicted by the theoretical model based on Ozaktas et al. [23] (see Figure 1).

Figure 1. Experimental results with two threads, comparing the theoretical WCET value to the brute-force WCET to highlight the overestimation problem.
In the given example, the WCCT is equal to 515 cycles. It does not depend on the proportion between the computational time of two threads CpP because for WCCT, we take the longer thread only. This value defines the fastest possible execution time.
As the model proposed by Ozaktas et al. [23] calculates the WCET by adding the maximum critical section execution times of all other threads to the WCCT, the value is constant and independent of the proportion between the computational times of the two threads, CpP; it is equal to 545 in this example situation.
The area between the WCCT and the theoretical WCET is the range where actual execution times might appear. The discrete event modeling results confirm that the modeled values fit into this range; however, they also show that the WCWT calculation might be reduced to be closer to the real (brute-force) WCET: the modeled maximum WCET is equal to 524, which covers just one-third of this range. This is because collisions do not occur in every iteration; therefore, the maximum waiting time of the thread does not have to be added in each iteration. This shows that a model for more accurate WCET calculation is needed in order to reduce the overestimation.
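The position of the modeled maximum within the range between the WCCT and the theoretical WCET follows directly from the three reported values:

```latex
\[
\frac{524 - 515}{545 - 515} = \frac{9}{30} = 0.3
\]
```

In other words, roughly 70% of the waiting time added by the theoretical model does not occur in the modeled worst case.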

Proposed Model for WCET Overestimation Reduction
To highlight how the WCWT can be reduced, consider an example in which three threads of the same priority are executed in one loop with N iterations and have only one computational and one critical section in each iteration.
As shown in Figure 2, if all threads would like to enter a critical section at the same time (the beginning of the ith iteration), one of the threads (the third thread) would be forced to wait until all the others leave the critical section. This is the worst-case waiting time scenario for the third thread in this loop iteration, which can be obtained by Equation (2). If the maximum value of all computational blocks were close to zero, all threads would be waiting for a locked resource every time they requested it. Therefore, the real WCET for all threads would be very close to the WCET calculated by Equation (2).
However, by increasing the size of the computational block, the waiting time can be decreased: instead of waiting for the critical section, the thread will execute other computations and will potentially request the critical section after it has been released by the other threads. The ideal situation could be achieved by meeting two conditions: (a) all computational blocks and critical section blocks should be exactly the same length in all threads and (b) the computational block in each thread should be at least as long as the sum of the critical section blocks in all the remaining threads. When all threads start at the same time, in the same iteration, there will be a delay because of the occupied critical section. In the next iteration, once the first thread releases the critical section, the second thread will take it, and after the second thread releases the critical section, the third thread will request it, etc.
This example and the experiment described earlier (see Figure 1) show that the WCWT depends on the proportion between the computational and critical section sizes of a thread and on how this proportion compares to that of the other threads. Therefore, to reduce the WCET overestimation, it is important to find out how many iterations in a row can be executed with no collision after a collision has happened.
Modeling and theoretical assumptions revealed that if the computational section of one thread is shorter than the critical section of another thread, there will be a collision between these threads in every iteration, because the computational block of one thread cannot be executed without delay while the second thread is in its critical section. Therefore, if the minimal possible computational block length comp_min of one thread is shorter than the maximum possible critical section length cs_max of another thread, the frequency of collisions of thread X with thread Y (cf_X→Y) is equal to 1.
If the minimal possible computational block length comp_min of one thread is longer than the maximum possible critical section length cs_max of another thread, at least one full iteration (comp_max + cs_max) of the shorter thread will be executed during the computational time of the longer thread. This means there would be no collision in this iteration for the shorter thread. The simplest way to find the maximum frequency of collisions between two threads is to calculate how many full iterations of the shorter thread fit into the minimum possible computational time of the longer thread. To do this, we take the minimal computational size of the longer thread and divide it by the sum of the maximum computational and critical section times of the shorter thread. As we calculate how many full iterations fit into one computational block of the other thread, only the integer part of the obtained value is kept (see Equation (5)).
where comp_Y_min is the minimum possible computational time of the longer thread, comp_X_max is the maximum possible computational time of the shorter thread, and cs_X_max is the maximum possible critical section time of the shorter thread. If both cf_X→Y and cf_Y→X are equal to 1, both threads will have a collision in all iterations. However, if one of these frequencies is smaller than 1 (only one can be smaller than 1), multiple iterations of one thread will fit into one iteration of the other thread, and collisions will not occur in every iteration of that thread. Naturally, the thread with the shorter iteration time will finish earlier than the thread with the longer iteration time. Therefore, the longer thread will have no collisions for a certain number of iterations, because it will be the only one of these two threads that has not finished all iterations of the loop. Therefore, for the first N·min(cf_X→Y; cf_Y→X) iterations, the delay of one iteration WCWT_i will be equal to the maximum critical section time of the other thread, while for the remaining N − N·min(cf_X→Y; cf_Y→X) iterations, the waiting time of one iteration WCWT_i will be equal to 0.
The minimal collision frequency of two threads, c_X,Y (see Equation (6)), is important for the WCWT calculation of both the shorter and the longer thread: for the shorter thread, it defines how often an iteration will have a collision, while for the longer thread, it can be used to find out how many iterations will be left after the shorter thread has finished.
If there are two threads with known minimum and maximum possible computational times and maximum possible critical section times, the WCET for thread X (WCET_X) can be calculated according to Equation (7). In this equation, we calculate how many iterations will have a collision between those two threads (the WCET of such an iteration is equal to the sum of the maximum computational block size of the thread and the maximum critical section execution times of both threads) and how many iterations will be collision free (therefore, the maximum critical section block size of the other thread is not needed).
$$
\begin{aligned}
WCET_X &= N \cdot (comp_{X,\max} + cs_{X,\max} + cs_{Y,\max}) \cdot c_{X,Y} + (N - N \cdot c_{X,Y}) \cdot (comp_{X,\max} + cs_{X,\max}) \\
&= N \cdot (cs_{Y,\max} \cdot c_{X,Y} + comp_{X,\max} + cs_{X,\max})
\end{aligned}
\tag{7}
$$
If there are more than two threads, it is not enough to calculate the collision frequency with all other threads and to choose the biggest one. Collisions in one pair of threads influence other pairs as well. Therefore, to ensure that the real WCWT will not be longer than the calculated one, all possible combinations between threads have to be taken into account. For each pair, the collision frequency has to be evaluated. This is performed by calculating how many iterations of one thread would fit into the minimal possible computational time of another thread. The iteration execution time of the latter thread is calculated by simulating a situation in which it is the last one in the queue to receive access to the critical section. Therefore, the thread will have to wait for the critical section to be released by all other threads in the queue (see Equation (8)).
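The two-thread expression of Equation (7) can be checked numerically with a short Python sketch. The function name and the example values are hypothetical; the collision coefficient c_X,Y is passed in as a given value (assumed here to lie between 0 and 1) rather than derived, so the sketch does not depend on the exact form of Equations (5) and (6).

```python
def proposed_wcet_two_threads(n_iter, comp_x_max, cs_x_max, cs_y_max, c_xy):
    """WCET of thread X against one other thread Y, following Equation (7).

    c_xy is the share of iterations in which X collides with Y.
    """
    if not 0.0 <= c_xy <= 1.0:
        raise ValueError("collision coefficient is assumed to lie in [0, 1]")
    colliding = n_iter * c_xy            # iterations that also wait for Y
    collision_free = n_iter - colliding  # iterations with no waiting
    return (colliding * (comp_x_max + cs_x_max + cs_y_max)
            + collision_free * (comp_x_max + cs_x_max))


# Hypothetical values: 5 iterations, comp_X_max = 10, cs_X_max = 2, cs_Y_max = 3.
print(proposed_wcet_two_threads(5, 10, 2, 3, 1.0))  # 75.0, same as the theoretical model
print(proposed_wcet_two_threads(5, 10, 2, 3, 0.5))  # 67.5
print(proposed_wcet_two_threads(5, 10, 2, 3, 0.0))  # 60.0, equal to the WCCT
```

With c_X,Y = 1 the result coincides with the theoretical model, and with c_X,Y = 0 it reduces to the WCCT, so the coefficient interpolates between the two bounds discussed in Section 3.2.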
As in the two-thread case, Equation (6) is used to obtain the collision coefficient, which defines what part of iterations will have a collision or will be free of collisions.
When more than two threads are used, each thread may have collisions with more than one other thread in the same iterations. Therefore, the collision coefficient for thread X is the maximum collision coefficient between thread X and all the other threads (see Equation (9)).
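Based on this description, Equation (9) might be written as follows, where c_{X,Y} is the pairwise collision coefficient of Equation (6) and the single-index symbol c_X for the per-thread coefficient is an assumed name:

```latex
\[
c_X = \max_{\substack{Y = 1,\dots,P \\ Y \neq X}} c_{X,Y}
\]
```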
Accordingly, Equation (7) has to be changed (see Equation (10)) to obtain a more universal form suitable for any number of threads, rather than two only.
The proposed WCET estimation model takes into account the proportion between computational and critical sections in one specific thread and the proportion of the thread's computational block size to those of all the other threads. The model requires multiple calculations to cover all interactions between different threads; however, the calculation is not costly, and its implementation and execution complexity are much lower than those of discrete event modeling.

Validation of the Proposed Model
The proposed WCET model was validated by extending the experiment described in Section 3.2. The same parameters were used to calculate WCET values with the proposed model. The results of the two-thread experiment showed that the proposed model represents the WCET values more realistically (see Figure 3). In the analyzed situation, the proposed model reduced the overestimation of the WCET by up to 10 times in situations when the computational times of the two threads were very different.
To inspect how the proposed model matches the brute-force modeled situation with more than two threads, an analysis of three threads, each executing a four-iteration loop, was carried out. Five possible variants of computational and critical section blocks for each thread were modeled by using discrete event modeling (see Figure 4a), calculated by using our proposed WCET model (see Figure 4b), and calculated by using the theoretical WCET model, based on the model by Ozaktas et al. [23] (see Figure 4c). [...] [8;9], and [16;17]. The variations allowed us to evaluate how the WCET depends on the ratio of the maximum computational times in the first and second threads (comp_1, comp_2) to the third thread's maximum computational time (comp_3). The critical block size in all threads was constant, and the third thread's computational block size was also constant; therefore, we analyzed 25 different combinations by changing the computational block sizes of thread 1 and thread 2 (five different values for each of the two threads).
In Figure 4, all 25 combinations are represented as intersections of the proportion between the computational blocks of the first and third threads and the proportion between the computational blocks of the second and third threads. As is characteristic of the theoretical model based on the Ozaktas et al. study (Figure 4c), all WCET values are identical regardless of the proportion between computational blocks, and in this situation they are equal to 80 cycles, while the modeled value has a different pattern, and the WCET value varies from 72 to 75 cycles (Figure 4a). The results of our proposed model lie between those two solutions (Figure 4b), with WCET values varying from 74 to 80 cycles. The proposed model does not mimic the pattern of the modeled WCET; however, it reduces the overestimation of the theoretical model.
Results of this experiment show that the proposed WCET model is closer to the brute-force modeled results (the overestimation varies from 1% to 10% and is 4% on average) and more closely mimics the distribution of the WCET values than the theoretical model (the overestimation varies from 7% to 11% and is 9% on average).
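The percentage ranges can be checked with a small calculation. The sketch below assumes that overestimation is measured relative to the modeled (brute-force) WCET, which the text does not state explicitly; the function name `overestimation_percent` is hypothetical. With the theoretical value of 80 cycles and the modeled range of 72 to 75 cycles, this assumption reproduces the reported 7% to 11% range.

```python
def overestimation_percent(calculated: float, modeled: float) -> float:
    """Relative overestimation of a calculated WCET against the modeled WCET, in percent.

    Assumption: the overestimation is expressed relative to the modeled value.
    """
    if modeled <= 0:
        raise ValueError("modeled WCET must be positive")
    return 100.0 * (calculated - modeled) / modeled


theoretical = 80  # cycles, identical for all 25 combinations (Figure 4c)

# Modeled WCET varies from 72 to 75 cycles (Figure 4a).
low = overestimation_percent(theoretical, 75)   # smallest overestimation
high = overestimation_percent(theoretical, 72)  # largest overestimation
```

The per-combination values of the proposed model are not listed in the text, so the same check is not repeated for it here.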
To evaluate whether the changes are significant in comparison to the theoretical WCET model, the t distribution and a significance value (p-value) were used. The difference between the calculated WCET value and the modeled execution time value was analyzed for all 25 cases. The mean difference for the theoretical model is 6.4 (standard deviation equal to 0.866), while for the proposed model it is 3.253 (standard deviation equal to 2.354). Based on these numbers, the calculated p-value is < 0.0001. At the 95% confidence level, the difference between the results of these two models is therefore statistically significant, i.e., the two means are significantly different.
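The reported summary statistics can be turned into a test statistic as a rough plausibility check. The text does not say which variant of the t-test was applied, so the sketch below assumes an unpaired Welch's t-test with 25 observations per model and treats the reported values as sample means and sample standard deviations; the function name `welch_t` is hypothetical.

```python
import math
from typing import Tuple


def welch_t(mean_a: float, sd_a: float, n_a: int,
            mean_b: float, sd_b: float, n_b: int) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary statistics."""
    var_a = sd_a ** 2 / n_a  # squared standard error of mean A
    var_b = sd_b ** 2 / n_b  # squared standard error of mean B
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df


# Assumption: 25 cases per model, as in the 25 analyzed combinations.
t_stat, dof = welch_t(6.4, 0.866, 25, 3.253, 2.354, 25)
```

Under these assumptions the statistic is slightly above 6 with roughly 30 degrees of freedom, which is consistent with a p-value below 0.0001. A paired test on the 25 per-case differences would need the individual values, which are not given in the text.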
The proposed model was also applied to multiple different situations, within the range that the brute-force discrete event modeling tool can handle, in order to check whether it guarantees that the real (modeled) WCET value does not exceed the calculated WCET value. In all situations, the calculated WCET value was greater than or equal to the modeled WCET value and, at the same time, less than or equal to the calculated theoretical WCET value.

Conclusions
Sustainable development encourages the reduction of energy consumption in all possible areas. Timing analysis of system programming code execution is one of the factors used for system energy consumption modeling and prediction. As information technologies change over time, timing analysis methods must keep pace to assure accurate energy consumption modeling and prediction results. Multi-thread systems are widely used; however, existing WCET estimation models are not accurate for timing analysis of