Energy Idle Aware Stochastic Lexicographic Local Searches for Precedence-Constraint Task List Scheduling on Heterogeneous Systems

The use of parallel applications in High-Performance Computing (HPC) demands long computing times and substantial energy resources. Inadequate scheduling produces longer computing times, which in turn increase energy consumption and monetary cost. Task scheduling is an NP-Hard problem; thus, several heuristic methods appear in the literature. The main approaches can be grouped into the following categories: fast heuristics, metaheuristics, and local search. Fast heuristics and metaheuristics are used when pre-scheduling times are short and long, respectively. The third is commonly used when pre-scheduling time is limited by CPU seconds or by objective function evaluations. This paper focuses on optimizing the scheduling of parallel applications, considering the energy consumed during the idle time in which no tasks are executing. Additionally, we present a comparative study of the performance of lexicographic variants of local searches from the literature, adapted to be stochastic and aware of idle energy consumption.


Introduction
According to the website www.top500.org (accessed on 10 June 2021), in November 2018 the top-ranked High-Performance Computing (HPC) system, Summit from the Oak Ridge National Laboratory, was composed of 2,397,824 CPU cores. This HPC system consumes 9783 kW and achieves a performance of 14.668 GFlops/watt under testing conditions. The final energy consumption is directly related to the quality of the scheduling of the tasks in the HPC system. It is hard to imagine a single scheduling algorithm that solves every kind of task load, which relates to the no-free-lunch theorem [1,2]. Therefore, several scheduling methods have been developed in the literature; in our particular case, we review the approaches available when HPC administrators have a restricted time to optimize the final scheduling. To this end, we use the local search approach, which fits the above requirement as in [3][4][5], where the stopping criterion is set to a small fixed number of objective function evaluations or a small amount of time. This contrasts with constructive heuristics, which build the solution by adding one decision variable at a time, and with metaheuristics, which require thousands of objective function evaluations (a long run) [6].
An HPC system is composed of a network of processing units, such as CPU cores or machines, that provide high parallel computing power. These systems are, in most cases, heterogeneous in their processing capabilities and power consumption. To build an HPC system with significant computing capabilities, it is common to combine numerous processing units into the final system. However, adding more processing units or machines increases energy consumption in every aspect of the HPC system (network devices, RAM, hard drives, etc.) [7][8][9]. To put this into perspective, it is estimated that data centers (a particular case of HPC systems focused on data storage) will consume 1/5 of the earth's energy by 2025.
This work tackles the precedence-constraint task scheduling of parallel programs over HPC systems formed by heterogeneous machines, minimizing the energy consumption and the makespan, both problems being NP-Hard [10,11]. We pay special attention to the case when machines are not computing any task but are still powered on (idle time); in our energy model for dynamic voltage and frequency scaling (DVFS) CPUs [12], we assume that machines consume a minimum fixed amount of energy while idle. Although the nature of this optimization problem is bi-objective, according to [13] the energy consumed during idle times has a strong effect on the total energy consumption. Therefore, we present, to our knowledge, the first study of different stochastic lexicographic local searches for precedence-constraint task scheduling of parallel programs, giving priority to the makespan (tasks' computing time) to reduce the idle times of the machines.
The paper is organized as follows: Section 2 details the studied problem in heterogeneous systems. Section 3 describes the list scheduling principle. A review of the literature on scheduling using local searches appears in Section 4. The experimental settings and our stochastic lexicographic local search variants are presented in Sections 5 and 6, respectively. Section 7 analyzes the experimental results according to the makespan and energy objectives. Finally, Section 9 contains the conclusions and future work in local search and scheduling precedence-constraint tasks on heterogeneous systems.

Problem Description
The HPC system studied in this work consists of a set M of heterogeneous processing units, completely interconnected. Each processing unit m j ∈ M is DVFS capable; thus, every machine m j operates on a set of multiple voltages. When the voltage is lower than the maximum, the machine operates at a fraction rs k of its top speed; the cardinality of the set S of relative speeds is equal to the cardinality of the set V of possible voltages v k of the machine. Table 1 shows the set of voltage configurations with their respective relative speeds used in our experimentation.
Every machine m j is assigned a DVFS configuration from Table 1. For instances with more than three machines, the configurations are assigned to the remaining machines in order: first configuration zero, then configuration one, and so on, in a circular fashion. Without loss of generality, we consider the assumptions presented in [14][15][16][17][18][19], and the following: • When a machine is not executing any task (idle time), it uses the lowest possible voltage.
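The circular assignment of DVFS configurations can be sketched in Python as follows; the configuration values below are illustrative placeholders, not the exact entries of Table 1:

```python
# Hypothetical DVFS configuration table (NOT the values from Table 1):
# each configuration lists voltages v_k and their relative speeds rs_k.
DVFS_CONFIGS = [
    {"voltages": [1.50, 1.20, 0.90], "relative_speeds": [1.0, 0.80, 0.50]},  # config 0
    {"voltages": [2.20, 1.90, 1.60], "relative_speeds": [1.0, 0.85, 0.65]},  # config 1
    {"voltages": [1.75, 1.40, 1.20], "relative_speeds": [1.0, 0.80, 0.60]},  # config 2
]

def assign_configurations(num_machines, configs=DVFS_CONFIGS):
    """Machine j receives configuration j mod |configs| (circular order)."""
    return [configs[j % len(configs)] for j in range(num_machines)]
```

For five machines, machine 3 wraps around and receives configuration 0 again, and machine 4 receives configuration 1.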

Instances
An instance of the scheduling problem is composed of: (i) a directed acyclic graph (DAG), and (ii) the computation times of the tasks on every m j ∈ M. The set of tasks of the parallel program with their precedences forms a DAG; thus, the program is represented as G = (T, C), where T is the set of tasks and C is the set of communication costs (see Figure 1). A complete instance is formed by G and the computation times P i,j of each task t i on every processing unit m j at maximum capacity, i.e., when k = 0 (see Table 2). A task t i ∈ T cannot start until all its predecessor tasks t j ∈ T and their communications have finished. For any pair of tasks executed on the same m j , the communication cost is zero.
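A minimal Python sketch of such an instance follows; the task set, edge costs, and times are illustrative, not the values of Figure 1 or Table 2:

```python
# Hypothetical instance: a DAG G = (T, C) plus the matrix P of
# computation times at maximum voltage (k = 0).
instance = {
    "tasks": [0, 1, 2, 3],
    # comm: (predecessor, successor) -> communication cost C[i][j]
    "comm": {(0, 1): 4.0, (0, 2): 3.0, (1, 3): 2.0, (2, 3): 1.0},
    # P[i][j]: computation time of task i on machine j at maximum voltage
    "P": [[4.0, 6.0], [3.0, 5.0], [5.0, 4.0], [2.0, 3.0]],
}

def predecessors(instance, task):
    """Tasks that must finish (and communicate) before `task` can start."""
    return [i for (i, j) in instance["comm"] if j == task]
```

Here task 3 cannot start until tasks 1 and 2, its predecessors, have finished and their communications arrived.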

Solution Representation
Our solution representation consists of two data structures. The first is a matrix of size 2 × n (see Table 3), where n is the cardinality of T; the first row of the matrix stores the assigned processing unit, while the second row stores the index k of the selected energy configuration. The machine configurations map to their respective voltage v k and relative speed rs k from Table 1. Table 3. Configuration machine/voltage.
The second data structure is an array of size n with the priority execution order of the tasks (see Table 4). The execution order is a topological order of the task graph that does not violate any precedence constraint.
The second data structure is not indispensable for scheduling, but it simplifies the problem's search space because it avoids verifying the earliest start and finish times of the tasks when computing the makespan. However, this representation has the deficiency that the optimal value of the execution time, also known as the optimal makespan, may not be attainable under the given execution order.
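The two data structures can be sketched in Python as follows (names and values are ours, for illustration): the assignment matrix as a list of (machine, k) pairs, and the priority order as a list of task indices, together with a check that the order is a valid topological order:

```python
# Sketch of the solution representation: assignment[i] = (machine j,
# voltage index k) for task i, i.e., the 2 x n matrix; `order` is the
# priority execution order (a topological order of the DAG).
assignment = [(0, 0), (1, 1), (0, 2), (1, 0)]
order = [0, 2, 1, 3]

def respects_precedences(order, comm):
    """True iff every task appears after all of its predecessors,
    where `comm` maps (predecessor, successor) edges to costs."""
    position = {t: idx for idx, t in enumerate(order)}
    return all(position[i] < position[j] for (i, j) in comm)
```

A feasible order such as [0, 2, 1, 3] passes the check, while an order placing a task before one of its predecessors does not.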
With the representation mentioned above, the objective function that computes the makespan is detailed in Algorithm 1; its complexity is O(|T||C|), similar to [18,20,21]. The computation time of a task is calculated as the original computation time at maximum voltage P i,j divided by the relative speed from Table 1. The considered computation times P i,j are shown in Table 5. Using the computation times P i,j from Table 5 and, for each machine, a counter Time j that stores the finish time of the last task executed on it, Algorithm 1 defines the objective function called makespan.
The function uses the variables ts i (the starting time of task i), t f j (the finish time of task j), and C i,j (the communication cost between t i and t j , which is zero when the tasks are executed on the same machine). The function processes the feasible execution order in sequence, from the first task to the last one in the list. Finally, the makespan of the parallel program is the maximum completion time among the set of tasks. For the energy consumption objective, the DAG representation is not necessary; only the final makespan is needed. The energy consumed by a task is the square of the selected voltage multiplied by the task's computation time on the machine P i,j . To add each machine's idle energy consumption, it is necessary to compute the machine's idle time, which equals the makespan minus the computation times of its tasks. Algorithm 2 shows the pseudocode to compute the energy objective. The energy model derives from complementary metal-oxide-semiconductor (CMOS) circuits, as in Equation (1), where the capacitive power P c is computed from the number of switches per clock cycle A, the total capacitance load C e f , the supply voltage V, and the frequency f:

P c = A · C e f · V² · f.    (1)

Algorithm 1 Makespan objective function
Require: G = (T, C), with an execution order of the tasks O = {o 1 , . . . , o |T| }, computational costs at relative speeds P i,j , and the considered communication costs C i,j . Ensure: O respects the precedence constraints.
1: Time j ← 0, ∀ m j ∈ M
2: for x = 1 to |O| do
3:   j ← the index of the machine m j assigned to t current
4:   ts current ← Time j
5:   for each predecessor t i of t current do
6:     if t i is assigned to a machine other than m j then
7:       ts current ← max(ts current , t f i + C i,current )
8:     else
9:       ts current ← max(ts current , t f i )
10:    end if
11:  end for
12:  t f current ← ts current + P current,j
13:  Time j ← t f current
14: end for
15: return makespan ← max t i ∈T t f i

Equation (2) is our energy consumption model E c . It is a simplified version that groups the constants A, C e f , and f into a single constant K; for practical purposes, the constant K is equal to one in the computed results. Finally, p* i represents the computing time of task i:

E c = K · Σ t i ∈T (v i )² · p* i ,    (2)

where the idle intervals of each machine contribute additional terms at the machine's lowest voltage.

Algorithm 2 Energy objective function
1: energy ← 0
2: for each t i ∈ T do
3:   energy ← energy + K · (v k )² · P i,j   {m j , v k : configuration assigned to t i }
4: end for
5: for each m j ∈ M do
6:   idle ← makespan
7:   for each t i assigned to m j do
8:     idle ← idle − P i,j
9:   end for
10:  energy ← energy + K · (v min,j )² · idle
11: end for
12: return energy

List Scheduling Algorithms
Algorithm 3 shows the pseudocode of the general framework for list scheduling algorithms. In the list scheduling principle, tasks are assigned according to priorities and placed in a list ordered by decreasing priority. First, the tasks are sequenced to be scheduled in accordance with the DAG, respecting their precedences through a topological order. Then, each task of the list is successively assigned to a machine; usually, to the machine that yields the minimum makespan.

Algorithm 3 List scheduling framework
1: Calculate the priority of each task t i ∈ T
2: Sort the tasks t i into a list L = {t 1 , t 2 , . . . , t n } by priority order
3: while L is not empty do
4:   Remove the first task from L and assign it to the machine with the best objective value
5: end while
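The framework of Algorithm 3 can be sketched generically in Python; `priority` and `objective_if_assigned` are placeholders supplied by the concrete algorithm (our naming, for illustration only):

```python
def list_schedule(tasks, machines, priority, objective_if_assigned):
    """Sketch of the list scheduling framework (Algorithm 3):
    sort tasks by decreasing priority, then greedily assign each task
    to the machine with the best objective value."""
    ordered = sorted(tasks, key=priority, reverse=True)
    schedule = {}
    for t in ordered:
        best = min(machines, key=lambda m: objective_if_assigned(schedule, t, m))
        schedule[t] = best
    return schedule
```

For example, with task durations as priorities and the resulting machine load as the objective, the longest task is placed first and subsequent tasks balance the load.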

Local Searches Related to the List Scheduling Framework
In [22,23], Arabnejad surveyed the most relevant algorithms in the literature that use the list scheduling framework. Among the most popular is the heterogeneous earliest finish time (HEFT) [24], a deterministic reference algorithm in many studies. It assigns every task to the machine that allows its earliest finish time. However, HEFT can modify the tasks' priority order when it detects available idle time on a machine that can be used without violating precedence constraints. Modifying the priority order of the tasks is out of the scope of the present study; in this paper, we assume the given priority of the tasks is fixed. This makes our objective function in Algorithm 1 feasible in all our studied cases. Recently, several papers regarding energy optimization have used the DVFS technique in several contexts: mathematical programming [25], metaheuristics [26][27][28], heuristics [29][30][31][32][33][34][35][36][37], and parallel algorithms [38], among others. However, mainly constructive heuristic methods appear in the literature for scheduling tasks on heterogeneous machines. Our view is that this could be due to the heavy computational cost of the objective function when no fixed task execution order is given (priority modification as in HEFT). Using the objective function described in Algorithm 1, the complexity is reduced to O(|T||C|) in the worst case, when every task has an equal number of precedences; the practical complexity is considerably lower because DAGs do not have cycles.
Local searches are heuristic approximation methods that move a current solution to its nearest local optimum. However, unlike more advanced approximation methods such as metaheuristics, local searches cannot escape local optima.
Some studies investigate the use of local search improvements in isolation; examples are found in [6,13,55,56]. To produce a complete experimental study, we choose the most relevant of the above-mentioned local search works and adapt them to be lexicographic and stochastic.

Literature Review
In this section, we describe three relevant local search works in the state-of-the-art: a deterministic local search using an aggregation objective function to minimize makespan and energy [19]; two stochastic local searches using the best improvement pivot rule with lexicographic importance of the objectives [13]; and a stochastic local search using two neighborhood operators [6]. The selected works represent the broad ideas in the state-of-the-art on scheduling precedence-constraint tasks using local searches.

Energy Conscious Scheduling (ECS)
Energy conscious scheduling (ECS) is a heuristic using a special objective function formulation called relative superiority (RS) [19]; it consists of two phases. The first phase visits the tasks in a given topological order (the b-level in the original paper) and optimizes the RS value over the possible machines m j ∈ M with their respective voltages. The second phase uses the makespan-conservative energy reduction (MCER) technique [19], which visits the tasks in the same topological order. The variant that considers the energy consumption during idle times is called ECS +idle , and it is the one used in the experimental comparison. The RS computation for ECS +idle is shown in Equation (3).
In the original paper, the RS computation for ECS +idle is slightly different: the negative sign for the case when t f i (m j , v k ) ≥ t f i (m , v ) is outside the entire equation. This has been corrected in Equation (3) to compute the correct results.
Algorithm 4 shows the pseudocode of the ECS +idle heuristic. In the constructive first phase, the objective functions are partially evaluated up to the currently evaluated task t i : t f i for the makespan, and E 1≤i for the energy. In the MCER second phase, the objective functions evaluate the complete schedule and visit neighbor solutions with different machine and voltage configurations. Neighbor solutions that do not increase the makespan and improve the energy consumption become the current schedule.

Random Problem Aware Local Search (rPALS)
rPALS [6] is a stochastic local search for makespan minimization on heterogeneous machines. It is a stochastic version of the deterministic local search PALS [57] for DNA fragment assembly. rPALS achieves the best performance against other list-based heuristics, namely sufferage, min-min, and pµ-CHC. rPALS uses two neighborhood operators, swap and move, similar in principle to variable neighborhood search (VNS) [58], but without the stopping criterion of finding a local optimum.
Algorithm 5 shows the pseudocode of the rPALS local search. It starts with an initial Schedule constructed by the minimum completion time (MCT) heuristic [59]. Considering the tasks in an arbitrary order, MCT assigns each task to the processing unit that minimizes the finish time of the task.
The main loop iterates until it reaches a maximum number of steps MAXSTEPS; at each iteration, a machine m j and a neighborhood operator (swap or move) are selected randomly.
When the swap operator is selected, a loop selects a random task t i from the ones assigned to m j and a random machine m swap ≠ m j until MAXSWAPS is reached. A second inner loop then iterates, selecting random tasks t swap assigned to m swap to swap with t i on m j ; if the neighbor solution Schedule' improves the overall makespan, it becomes the current best solution Schedule.
When the move operator is selected, a loop selects a random machine m move ≠ m j until the stopping condition MAXMOVES is reached. This loop iterates selecting a random task t move assigned to m move , producing a neighbor solution Schedule' by assigning the task t move to the machine m j . If the neighbor solution improves the makespan, Schedule' becomes the current best solution Schedule.
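The two neighborhood operators described above can be sketched as pure functions in Python (our naming; the schedule is a task-to-machine map, a simplification of the full representation):

```python
def move_neighbor(assignment, t_move, m_j):
    """Sketch of the move operator: reassign task t_move to machine m_j,
    returning a new neighbor without mutating the current solution."""
    neighbor = dict(assignment)
    neighbor[t_move] = m_j
    return neighbor

def swap_neighbor(assignment, t_i, t_swap):
    """Sketch of the swap operator: exchange the machines of two tasks."""
    neighbor = dict(assignment)
    neighbor[t_i], neighbor[t_swap] = neighbor[t_swap], neighbor[t_i]
    return neighbor
```

Returning a fresh dictionary keeps the current best Schedule intact, so a neighbor is only adopted when it passes the makespan test.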

BEST_RT_MVk and BEST_RMVk_T
The last works from the literature we consider are from [13], where the authors propose two best-improvement stochastic local searches: BEST_RT_MVk and BEST_RMVk_T. Both local search methods start with a solution constructed by the fast heuristic HEFT [24]. The main difference between the two local search methods lies in their stochastic selections: BEST_RT_MVk randomly selects a task to be evaluated over all the possible machine and voltage configurations, whereas BEST_RMVk_T randomly selects a machine and voltage configuration to be evaluated over all the possible tasks. Algorithms 6 and 7 show the pseudocode of BEST_RT_MVk and BEST_RMVk_T, respectively.
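A single BEST_RT_MVk-style best-improvement step can be sketched as follows. This is our own hedged illustration, not the code of [13]: `evaluate` returns a (makespan, energy) tuple, so Python's tuple comparison gives the lexicographic order directly.

```python
import random

def best_rt_mvk_step(schedule, task_ids, configs, evaluate, rng=random):
    """Sketch of one BEST_RT_MVk step: pick a random task, try it on every
    (machine, voltage index) configuration, keep the lexicographic best."""
    t = rng.choice(task_ids)
    best, best_val = schedule, evaluate(schedule)
    for (m, k) in configs:
        neighbor = dict(schedule)
        neighbor[t] = (m, k)             # reassign the chosen task
        val = evaluate(neighbor)
        if val < best_val:               # (makespan, energy) compared lexicographically
            best, best_val = neighbor, val
    return best
```

BEST_RMVk_T would invert the roles: draw a random (machine, voltage) configuration and scan all tasks instead.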
The original methods in [13] are stochastic local searches (SLS) for makespan and energy optimization. However, every time the methods find a local improvement, the algorithm restarts the stopping condition. This can produce significant computation times when the algorithm does not reach a local optimum, an issue we observed in our preliminary experimentation.

Algorithms in Comparison
In the experimental comparison, two mandatory restrictions must be satisfied:
• The use of a fixed maximum number of visited neighbor solutions;
• Providing only one random initial solution.
The algorithms in Section 4 are adjusted in the following manner.

ECS +idle Stochastic Local Search
The originally proposed ECS +idle is a deterministic heuristic. To investigate its behavior as a stochastic lexicographic local improvement method, we propose two new stochastic local searches (SLS) using the RS objective function in Equation (3). We follow the best improvement pivot rule in [13], and the two SLS variants in that paper: random selection of a task (ECS_RT_MVk) and random selection of a machine and voltage (ECS_RMVk_T).
Algorithm 8 shows the procedure of the SLS ECS_RT_MVk. The function's input is a feasible Schedule; while the stopping condition has not been reached, the search selects a random task t i to be evaluated on all the possible machines in M with their respective voltage configurations. The task t i is assigned to the best machine m and voltage v according to the RS objective function. The main loop iterates over random tasks until the counter step reaches the maximum number of visited neighbors MAXSTEPS. Algorithm 9 shows the procedure of the SLS ECS_RMVk_T. The function's input is a feasible Schedule; while the stopping condition has not been reached, the search selects a random machine m j with a feasible random voltage v k within the range of the machine m j to be evaluated on all the possible tasks in T. The best task t i according to the RS objective function is assigned to the machine m j with voltage v k . The main loop iterates until the counter step reaches the maximum number of visited neighbor solutions MAXSTEPS.

rPALS Lexicographic Local Search
The original rPALS was proposed only to improve the final makespan. To improve both objectives (energy and makespan), we follow the same lexicographic criteria as in [13] to accept variable changes in the solutions: if the makespan does not worsen and the energy improves, the neighbor solution becomes the current best Schedule. The inner loops of rPALS are removed to control the number of visited neighbors. Algorithm 10 (rPALS_Lex) shows the pseudocode of our proposed lexicographic local search.
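The lexicographic acceptance rule described above can be expressed as a small predicate in Python (function names are ours, for illustration):

```python
def accept_lexicographic(current, neighbor, mkspan, energy):
    """Sketch of the rPALS_Lex acceptance rule: the neighbor replaces the
    current Schedule only if it does not worsen the makespan AND strictly
    improves the energy consumption."""
    return (mkspan(neighbor) <= mkspan(current)
            and energy(neighbor) < energy(current))
```

Note that, following the rule as stated, a neighbor that improves the makespan but not the energy is not accepted; the makespan test only guards against worsening.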

Fixed iterations BEST_RT_MVk and BEST_RMVk_T
For a fair experimental comparison, the commented lines in Algorithms 6 and 7 with the legend "Line removed for the experimental comparison" are not considered. Therefore, the local searches are not initialized with the HEFT heuristic and visit the same number of neighbor solutions, fixed by the stopping criterion of their main loops (MAXSTEPS).

Table of Notations Used in the Described Literature
Table 6 shows the common mathematical notations in the mentioned literature.

Experimental Setup
The reviewed works from Section 4 differ in their stopping criteria. In the original publication of BEST_RMVk_T and BEST_RT_MVk, the maximum number of visited neighbor solutions is variable: at most 100 consecutive neighbors visited without improvement. In the work of rPALS, the maximum number of visited neighbors is 4,000,000, while for the deterministic search ECS +idle the number is |T| · |M| · |V|.
On the one hand, few visited neighbors could increase the variance of the final computed results. On the other hand, a high number of visited neighbors increases computational times. We suggest a reasonable fixed number of visited neighbors: the ratio between the maximum number of visited neighbors in [6] and the maximum number of visited neighbors without improvement in [13], giving a total of 4,000,000 / 100 = 40,000 maximum visited neighbors. The set of machine and voltage configurations used in the experimentation has a cardinality of six, the ones in [13]. When a particular scheduling problem considers more than six machines, the first configuration is assigned to the next machine, then the second, and so on (round-robin rule) until every considered machine has a valid configuration. Finally, we perform 60 independent executions for every considered scheduling problem.

Parallel Instance Set
We used the set of instances from [18], consisting of 400 different scheduling problems derived from the parallel applications Fpppp, LIGO, Robot, and Sparse.

Friedman Statistical Test
In order to assess the performance of the algorithms, it is necessary to validate their outcomes through a statistical test. There are specific statistical tests for comparing two samples and others for more than two samples. A widely accepted statistical test to find significant differences between more than two samples is the analysis of variance (ANOVA) test. However, ANOVA requires that the data follow a normal distribution. Other statistical tests, the non-parametric tests, do not need such an assumption. Among those, Friedman [63] is a non-parametric test used to compare the performance of several algorithms. The Friedman statistic is computed with the following equation when there are no ties:

F r = (12 / (n · k · (k + 1))) · Σ i=1..k R i ² − 3 · n · (k + 1),

where n is the number of independent executions, k the number of algorithms, and R i the sum of the ranks of algorithm i. Once the statistic F r is computed, a reference table is consulted for the achieved p-value [63]. A direct performance evaluation metric in many algorithm studies is the Friedman ranking [64] (R i ). The original data of the independent executions are transformed into a table of places (ranks) according to the performance of each algorithm (see Table 7). Table 7. Example of ranks for three algorithms.

Execution   Algorithm 1   Algorithm 2   Algorithm 3
1           3             2             1
2           3             2             1
3           1             2             3
4           3             2             1
5           3             1             2
6           3             1             2
7           3             1             2

With Table 7, the Friedman ranking of Algorithm 1 is computed as the sum of its ranks: R Algorithm1 = 3 + 3 + 1 + 3 + 3 + 3 + 3 = 19. The ranks presented in this work are in terms of the average of makespan and energy consumption; for our computational example, the average rank is R Algorithm1 = 19/7 ≈ 2.71. This technique is a straightforward way to compare the performance of several algorithms over benchmarks. Using the tool in [65], the process is as simple as inputting the original data in CSV format with the command line java Friedman data.csv > output.tex; the output.tex file must then be compiled with a LaTeX compiler.
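The rank sums and the Friedman statistic of the example above can be sketched in Python (our own helper, assuming the no-ties formula stated in this section):

```python
def friedman_statistic(ranks):
    """Sketch of the Friedman statistic (no ties):
    F_r = 12 / (n*k*(k+1)) * sum(R_i^2) - 3*n*(k+1),
    where ranks[e][i] is the rank of algorithm i in execution e and
    R_i is the sum of ranks of algorithm i over the n executions."""
    n, k = len(ranks), len(ranks[0])
    R = [sum(row[i] for row in ranks) for i in range(k)]
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in R) - 3.0 * n * (k + 1)
```

Applied to the ranks of Table 7 (n = 7 executions, k = 3 algorithms), the rank sums are R = (19, 11, 12), matching the R Algorithm1 = 19 computed above.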

Task Priority Methods
We evaluate two widely used methods to generate a priority task list in our experimentation: the bottom level (b-level), and the top-level (t-level).
The b-level computes the critical path in the DAG from a current task node t current to the final task of the parallel program t n , taking into consideration the mean computation times p i over the machines (see Table 8) and the communication costs (the edges of the DAG in Figure 1). For the DAG in Figure 1, the b-level values of the tasks are shown in Table 9; the tasks' priorities follow their b-level values in descending order. In a very similar manner, the t-level is calculated by finding a path from the current task node t current back to the initial task of the parallel program t 0 (see Table 10); the priorities are assigned according to the t-level values in ascending order. The computation of the b-level and t-level needs to verify every possible path in the DAG; a straightforward way to do it is with recursive functions, as shown in Algorithms 11 and 12.

Algorithm 11 Recursive b-level computation
1: function B-LEVEL(t current , temp)
2:   max ← temp
3:   for (t current , t u ) ∈ E do
4:     temp' ← B-LEVEL(t u , temp + C(t current , t u ) + pi[t u ])
5:     if max < temp' then
6:       max ← temp'
7:     end if
8:   end for
9:   return max
10: end function

Algorithm 12 Recursive t-level computation
1: function T-LEVEL(t current , temp)
2:   max ← temp
3:   for (t u , t current ) ∈ E do
4:     temp' ← T-LEVEL(t u , temp + C(t u , t current ) + pi[t u ])
5:     if max < temp' then
6:       max ← temp'
7:     end if
8:   end for
9:   return max
10: end function
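The recursive b-level computation can be sketched in Python as follows (our own data layout: `succ` maps each task to its successors with communication costs, and `p_mean` holds the mean computation time of each task over the machines; the t-level is analogous over predecessors):

```python
def b_level(t, succ, p_mean):
    """Sketch of the recursive b-level: length of the longest (critical)
    path from task t to an exit task, summing mean computation times
    and communication costs along the path."""
    best = p_mean[t]                       # exit task: only its own time
    for (u, c) in succ.get(t, []):
        best = max(best, p_mean[t] + c + b_level(u, succ, p_mean))
    return best
```

For a two-task chain 0 -> 1 with communication cost 2 and mean times 3 and 4, the b-level of task 0 is 3 + 2 + 4 = 9.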

Results
This section analyzes the computed results under the experimental settings of Section 6. Due to the considerable number of tables needed to present all results, we decided not to include them in the final paper.
A more acceptable and brief way to examine the results is to produce Friedman's average rankings [64] from the non-parametric Friedman statistical test (see Section 6.2). The ranking gives an insight into the algorithm's performance (the lower the ranking, the better). We compute the Friedman ranking using a free tool presented in [65]; the website has extensive information on the Friedman test and rankings.
We found at every presented comparison in this work that there is a statistical difference after analyzing all the computed p-values by the Friedman statistical test, which satisfies the condition p-value ≤ 0.05, giving a statistical significance of 95% [63]. In the following subsections, we present the results by goal objective.

Makespan Results
First, we analyze the computed results according to the makespan objective. Table 11 shows the Friedman ranking when considering all the scheduling cases and the b-level priority execution. Table 11 shows that ECS_RMVk_T obtains the best performance when using the b-level priority execution, while rPALS_Lex is the second best. Notice that both local searches that select a random task and verify its possible machine and voltage configurations (ECS_RT_MVk, BEST_RT_MVk) perform the worst. The same algorithms produce low performance when using the t-level priority, as shown in the ranking of Table 12; however, in this case, rPALS_Lex obtains the best performance, followed by BEST_RMVk_T. We infer from the results in Tables 11 and 12 that the worst strategy for makespan optimization is for a local search to evaluate, from a random task, its best neighbor among all its possible machine and voltage configurations. To give more insight into the results, we compute the Friedman average ranking on the individual subsets of instances: Fpppp, LIGO, Robot, and Sparse. The best rankings in the tables are highlighted in gray.
The Friedman rankings in Table 13 confirm the report of Table 11: regarding the use of the b-level, the worst performance corresponds to the ECS_RT_MVk and BEST_RT_MVk local searches. ECS_RMVk_T achieves the best performance, with three best average rankings (Fpppp, LIGO, Sparse) and one second best (Robot). A highly competitive local search when using the b-level priority is rPALS_Lex, with three second-best average rankings (Fpppp, LIGO, Sparse) and one best (Robot). Considering only the Robot subset of instances, the local search BEST_RMVk_T achieves the best makespan average ranking. The results in Table 14 also confirm those of Table 12: with the t-level, rPALS_Lex achieves the best performance, with three best average rankings (Fpppp, LIGO, Sparse) and one second best (Robot), followed by BEST_RMVk_T with one best average ranking (Robot) and three second best (Fpppp, LIGO, Sparse). In contrast, the worst performance is again that of ECS_RT_MVk and BEST_RT_MVk.

Energy Results
For the energy objective, the order of the rankings of the algorithms is the same when using the b-level (see Table 15) and the t-level (see Table 16). As in the makespan results, the worst performance is obtained by ECS_RT_MVk and BEST_RT_MVk. For energy optimization, the best performance is achieved by rPALS_Lex, followed by BEST_RMVk_T, with an equal number of best and second-best rankings in Tables 17 and 18.

Research Findings
A relevant result emerges from our empirical experimentation: the priority order technique in list scheduling may significantly change the performance of some algorithms according to the makespan objective. This is the particular case of the ECS function, which changes from first place (ECS_RMVk_T) in the b-level priority results to third place in the t-level results. A deeper examination of the ECS function in Equation (3) shows that if the energy consumption magnitude is significantly greater than the makespan magnitude, it emphasizes the energy objective over the makespan. According to the energy objective, different task priorities do not change the comparative performance of the algorithms, as the Friedman rankings remain the same. Therefore, we infer that energy optimization is more sensitive to the DVFS configurations than to the tasks' execution order.

Conclusions and Future Work
As far as we know, we present the first study of stochastic lexicographic local searches for precedence-constraint task scheduling on heterogeneous machines. We adapt three local searches from the literature to be stochastic, iterative, and lexicographic bi-objective (makespan and energy). The above produces three new variants: ECS_RMVk_T, ECS_RT_MVk, and rPALS_Lex. The experimental results show rPALS_Lex as the most competitive algorithm for makespan and energy optimization compared with the other local searches in the experimentation.
The relative superiority objective function from the ECS heuristic works slightly better when using the b-level priority of execution of the tasks, as shown by the best makespan performance achieved by ECS_RMVk_T.
The worst strategy for the studied local searches is selecting an individual random task and verifying its possible machine and voltage configurations. Therefore, we recommend the approach in which a random machine and voltage configuration is selected and verified for improvements over the whole set of tasks.
Finally, as the experimental comparison shows, the tasks' execution order is essential for the final performance of the algorithms, with radical place changes in the Friedman rankings when using the b-level and t-level priorities.
As future work, we would like to study the design of new priority task heuristics, which could improve the performance of the proposed local searches for scheduling.