Energy-Efficient Scheduling of Periodic Applications on Safety-Critical Time-Triggered Multiprocessor Systems

Energy optimization for periodic applications running on safety/time-critical time-triggered multiprocessor systems has been studied recently. An interesting feature of the applications on the systems is that some tasks are strictly periodic while others are non-strictly periodic, i


Introduction
Multiprocessor architectures such as the Multi-Processor System-on-Chip (MPSoC) are increasingly regarded as the major solution for embedded cloud computing systems because of their high computing power and parallelism. An MPSoC integrates multiple processors and other functional units in a single package on a single die. Meanwhile, there is an ongoing trend in which diverse emerging safety-critical real-time applications, such as automotive, computer vision, data collection and control applications, run simultaneously on MPSoCs [1]. For these safety-related applications, it is imperative that deadlines be strictly guaranteed. Because of the strong timing requirements and the need for predictability guarantees, real-time cloud computing is a complex problem [2]. To satisfy the timing requirement, task scheduling typically relies on an offline schedule based on the
In this paper, we study the energy-efficient scheduling problem arising from the requirements of safety-related real-time applications deployed on cloud computing embedded platforms. We focus on the static scheduling of multiple periodic applications, consisting of both strictly and non-strictly periodic tasks, on safety/time-critical time-triggered MPSoCs. The goal is energy optimization by employing two powerful techniques: Dynamic Voltage and Frequency Scaling (DVFS) and Power Mode Management (PMM). To reduce energy consumption more effectively, both strictly and non-strictly periodic tasks in time-triggered applications must be handled correctly. This requires an intelligent scheduler that can capture the strict periodicity of specific tasks. Moreover, the problem becomes more challenging when energy minimization that combines DVFS with PMM has to consider the periodicity of specific tasks in time-triggered systems. In addition, the energy-efficient scheduling problem becomes more complicated as the number of running applications grows from one to many.
Our main contributions are summarized as follows:
• We consider the unique feature of periodic applications in time-triggered systems that not all tasks within the applications are strictly periodic. We present a practical task model that accurately characterizes the periodic applications and formulate an energy-efficient scheduling problem based on the model.
• To solve the problem, we present an improved Mixed Integer Linear Programming (MILP) formulation that utilizes the flexibility of non-strictly periodic tasks to reduce unnecessary energy overhead. The MILP method can generate optimal scheduling solutions.
• To overcome the disadvantage of the MILP method when the size of the problem grows, we further develop a heuristic method, named Hybrid List Tabu-Simulated Annealing with Fine-Tune (HLTSA-FT), which integrates list-based energy-efficient scheduling and tabu-simulated annealing with a fine-tune algorithm. The heuristic can obtain high-quality solutions in a reasonable time.
• We conduct experiments on both synthetic and realistic benchmarks. The experimental results demonstrate the effectiveness of our approach.
It is worth mentioning that the static energy-efficient deterministic schedule generated by our proposed methods is defined in a static configuration file. When a practical safety/time-critical partitioned system is designed, the operating system kernel applies this schedule to run each partition in its assigned time slot, and middleware is integrated to ease the interoperability and portability of components and to satisfy requirements regarding cost, timeliness, power consumption and so on [24,25].
The remainder of this paper is organized as follows: Section 2 reviews related work in the literature. Section 3 describes models and defines the studied problem. In Section 4, we give a motivating example to explain our idea. Our approach is presented in Section 5. Experimental results are provided in Section 6. The conclusions are presented in Section 7.

Related Work
Scheduling for energy optimization is a crucial issue in real-time systems [2,4]. Energy-efficient scheduling of DAG-based applications on such systems has been extensively studied. To name a few, Baskiyar et al. combined DVFS with a decisive-path list scheduling algorithm to achieve the two objectives of minimizing finish time and energy consumption [6]. Liu et al. distributed the slack time over tasks on the critical path with DVFS to achieve energy savings [8]. However, these approaches address only a single DAG-based application. Moreover, they consider only dynamic power consumption and ignore static power consumption, which becomes prominent in the deep submicron domain. For energy-harvesting systems, Qiu et al. were devoted to reducing power failures and optimizing computation and energy efficiency [26], and the authors in [27] addressed the scheduling of implicit-deadline periodic tasks on a uniprocessor based on the Earliest Deadline First-As Soon As Possible (EDF-ASAP) algorithm. The works in [28,29] combined DVFS and PMM to minimize energy consumption for scheduling frame-based tasks. However, their approaches can only address independent tasks in a single-processor system. Kanoun et al. proposed a fully self-adaptive energy-efficient online scheduler for general DAG models on multicore DVFS- and PMM-enabled platforms [9]. However, the proposed energy-efficient scheduling solution is designed for soft real-time tasks, where missing deadlines is tolerable.
The aforementioned studies are not applicable to safety-critical applications that require the highest level of safety. Furthermore, in these studies, an application is periodic only in terms of its release time, and each task within the application can start aperiodically. Obviously, such an assumption is untenable for a time-triggered application in safety/time-critical systems. The scheduling of tasks in time-triggered systems has also been reported in [13,14,18-20]. Lukasiewycz et al. obtained a schedule for time-triggered distributed automotive systems with a modular framework that provides a symbolic representation used by an ILP solver [13]. Sagstetter et al. studied the problem of synthesizing schedules for the static time-triggered segment for asynchronous scheduling in current automotive architectures, and proposed an ILP approach to obtain optimal solutions and a greedy heuristic to obtain high-quality solutions [18]. Freier and Chen presented time-triggered scheduling policies for the real-time periodic task model [19]. Gendy introduced techniques to automate the search for a workable schedule and to increase system predictability [20]. Unfortunately, these works focus only on enhancing system performance.
Task scheduling for energy optimization in time-triggered embedded systems has received attention recently. Chen et al. presented ILP formulations and developed two algorithms to address energy-aware task partitioning and processing unit allocation for periodic real-time tasks [15]. However, the work addresses only independent tasks and is not suitable for DAG-based applications. For periodic dependent tasks, Pop et al. proposed a constraint logic programming-based approach to time-triggered scheduling and voltage scaling for low power and fault tolerance [16]. Recently, the state-of-the-art work in [17] introduced a key technique to model the idle intervals of the cores by means of MILP. The study proposed a time-triggered scheduling approach to minimize total system energy for a given set of applications represented as DAGs and a given mapping of the applications. However, these studies all assume that each task and its instances are started in a strictly periodic pattern. In reality, besides the strictly periodic tasks within time-triggered applications, there also exist non-strictly periodic tasks whose instances do not need to be started periodically [21-23]. To the best of our knowledge, Ref. [21] is the first study that tried to derive better system performance by scheduling both strictly and non-strictly periodic tasks in safety-critical time-triggered systems. However, that work focuses only on enhancing schedulability, and energy optimization is not involved. In this paper, we address the energy-efficient scheduling problem for periodic time-triggered applications consisting of both strictly and non-strictly periodic tasks.
Methods for scheduling applications on time-triggered multiprocessor systems are mostly based on mathematical programming techniques [12-18]. Since scheduling in multiprocessor systems is NP-hard, many heuristics have been developed for larger problem sizes. To schedule multiple DAG-based applications on real-time multiprocessor systems, the studies in [6,8,21,30-34] presented a collection of static greedy scheduling heuristics based on list scheduling. List-based scheduling heuristics are widely accepted: they provide effective scheduling solutions, and their performance is comparable with that of other algorithms at a lower time complexity. They efficiently reduce the search space by means of greedy strategies. However, because of their greedy nature, they can only address certain cases efficiently and cannot ensure solution quality for a broad range of problems.
On the other hand, to explore the solution space for a high-quality solution, current practice in many domains such as job shop scheduling [35], autonomous power systems [36], distributed scheduling [37] and energy-efficient scheduling in embedded systems [38-40] favors Tabu Search (TS) and Simulated Annealing (SA) meta-heuristic algorithms. They have shown superiority to one-shot list scheduling heuristics, despite a higher computational cost. However, both TS and SA have advantages and disadvantages. In general, the SA algorithm, which is analogous to the physical process of annealing, is problem-independent. However, it does not keep track of recently visited solutions and needs more iterations to find the best solution. The TS algorithm is more efficient in finding the best solution in a given neighborhood, but it cannot guarantee convergence or avoid cycling [35]. Moreover, these algorithms cannot be directly used to solve our problem, since they ignore the non-strictly periodic tasks in time-triggered applications (whether a task starts strictly periodically or not has a strong influence on the schedule and on the total energy consumption of the whole system).
To the best of our knowledge, no heuristic method for our problem has been reported yet. In this paper, we solve the problem by formulating an MILP model to obtain optimal solutions, and we further develop an efficient heuristic algorithm, since the computation time of the MILP method becomes intolerable as the problem size increases.

Problem Formulation
In this section, we first introduce related models and basic concepts that will be used in the later sections, and then provide the problem formulation. The notations used in this paper and their definitions are listed in Table 1.

Table 1. Notations and their definitions.
Symbol | Description
$v^{m}_{i,p}$ | The p-th instance of task $v_i$ executed on core m
$Cv_{i,j}$ | Communication task ($v^{m}_{i}$ transfers data to $v^{n}_{j}$)
$C_{i,j}$ | Communication time between $v^{m}_{i}$ and $v^{n}_{j}$
$comm(v_i, v_j)$ | Communication size between task $v_i$ and $v_j$
$D_k$ | Deadline of application $g_k$
$P_k$ | Period of application $g_k$
$d_i$ | Deadline

System Model
In this paper, we consider a typical MPSoC architecture [17,41,42], shown in Figure 1. The MPSoC architecture consists of M processing cores {core 1, core 2, ..., core M}. Each core has its own local memory, and all cores perform inter-core communication over a high-bandwidth shared time-triggered non-preemptive bus to access the main memory. The multi-core platform supports L different voltage or frequency (v/f) levels and a set of total power values $\{P_{a1}, P_{a2}, \ldots, P_{aL}\}$ ($P_{a1} > P_{a2} > \ldots > P_{aL}$) corresponding to the v/f levels. The bus controller implements a given bus protocol (e.g., a time-division multiple access protocol) and assigns bus access rights to individual cores. Inter-core communication relies on message passing [30]. The characteristics of the system model are as follows: (1) a DVFS- and PMM-enabled MPSoC; (2) non-preemptive execution; (3) a shared time-triggered bus based on a given protocol; (4) communications are assumed to proceed at the same speed without contention; and (5) each core has an independent I/O unit that allows communication and computation to be performed simultaneously. Note that a real communication cost occurs only in inter-core communication, where dependent tasks are mapped to different cores. When dependent tasks are allocated to the same core, the communication cost is zero, as intra-core communication can be ignored.

Task Model
An application can be modeled by a DAG (also called a task graph) comprising a set of dependent nodes connected by edges. This article assumes a periodic real-time task model in which $G = \{g_1, g_2, \ldots, g_K\}$ is a set of K applications to be executed on the MPSoC. Application $g_k \in G$ is denoted as $g_k = \{V_k, E_k, D_k, P_k\}$, where $V_k$ and $E_k$ are the sets of nodes and edges in $g_k$, respectively, and $D_k$ and $P_k$ are the deadline and period of $g_k$, respectively. The deadline $D_k$ is assumed to be a constrained deadline, i.e., it is less than or equal to the period $P_k$. Tasks in $g_k$ share the period and deadline of $g_k$. We use $H_a$ to denote the least common multiple of the periods of all tasks, which is called the hyper-period. It is well known that scheduling in a hyper-period gives a valid schedule [43].
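The hyper-period definition can be illustrated with a minimal sketch. The function names are hypothetical, and the periods 60, 30 and 60 are used purely as illustrative input:

```python
from functools import reduce
from math import gcd


def hyper_period(periods):
    """Least common multiple of all application periods (H_a in the text)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)


def releases_per_hyper_period(periods):
    """How many times each application is released within one hyper-period."""
    h = hyper_period(periods)
    return [h // p for p in periods]


periods = [60, 30, 60]  # illustrative periods of three applications
print(hyper_period(periods))               # 60
print(releases_per_hyper_period(periods))  # [1, 2, 1]
```

Every task instance released inside one hyper-period must be placed by the static schedule, so the number of instances to schedule grows with the ratio between the hyper-period and each period.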
In a task graph, each node $v_i \in V_k$ denotes a computation task and each edge $e_j \in E_k$ represents a communication task. The computation tasks perform data computation on the processing cores, and the communication tasks assigned to the bus perform data transmission between the cores. Computation and communication tasks can be performed in parallel since the communication operation is non-blocking. The weight on an edge indicates the amount of data transferred between the connected computation tasks. The worst-case execution time (WCET) of a task $v_i$ on core m under v/f level l is denoted by $W^{m}_{i,l}$. This profiling information can be obtained in advance. On the MPSoC platform, we consider multiple time-triggered applications that are released periodically. As not all tasks within the applications start strictly periodically, we analyze the characteristics of the tasks and classify the tasks in an application as follows:
1. Strictly periodic task: the task must start its instances strictly periodically, which means that the start time interval between two successive task instances is fixed. As the strictly periodic task $v_1$ in Figure 2a shows, in addition to the release time and deadline of the application, the start times of the different invocations of the task also need to be periodic.
2. Non-strictly periodic task: the task need not start its instances periodically, i.e., the start time interval between two successive task instances is not fixed. As the non-strictly periodic task $v_2$ in Figure 2b shows, the start times of the different invocations of the task can be aperiodic as long as the deadline of the task is guaranteed.
In this paper, an application is regarded as exactly periodic if all tasks within the application are strictly periodic; otherwise, if it contains at least one non-strictly periodic task, it is regarded as a loosely periodic application.

Energy Model
We assume that the MPSoC supports both DVFS and PMM, and we adopt the energy model of [17,42,44,45] because of its generality and practicality. The total system energy consumption is composed of the energy overhead of communication and computation. We assume that each processing core has three modes (active, idle and sleep) and that the shared bus has two modes (active and idle). Various practical issues, including the time and energy overhead of core mode switching and of inter-core communication, are also considered in the energy model. We apply the inter-task DVFS technique [5,17,41], in which the supply v/f of the core cannot be changed within a single task. The dynamic power consumption $P_d$ of a processing core and the operating frequency $f$ are given by:
$$P_d = C_{ef} \times V_{dd}^{2} \times f$$
$$f = k \times \frac{(V_{dd} - V_t)^{2}}{V_{dd}}$$
where $C_{ef}$ is the effective switching capacitance, $V_{dd}$ is the supply voltage, $k$ is a circuit-dependent constant and $V_t$ is the threshold voltage. The static power consumption, $P_s$, is given by:
$$P_s = I_{leak} \times V_{dd}$$
where $I_{leak}$ denotes the leakage current. Therefore, the total power consumption when the core is active under v/f level l can be computed as:
$$P_{al} = P_d + P_s + P_{on}$$
where $P_{on}$ is the intrinsic power that is needed to keep the core on. Thus, the energy consumption of task $v^{m}_{i}$ executed on core m at v/f level l can be represented as:
$$E^{m}_{i,l} = P_{al} \times W^{m}_{i,l}$$
When a core does not execute any task (idle mode), its power consumption is primarily determined by the idle power. Let $P_{idle}$ and $P_{sleep}$ represent the idle power and the sleep power, respectively. Normally, we have $P_{idle} > P_{sleep}$. Considering the overhead of switching the processing core between active mode and sleep mode, the break-even time $T_{bet}$ is defined as the minimum time interval for which entering sleep mode is more energy-efficient than staying in idle mode, despite the extra time and energy overhead associated with the mode switch between active mode and sleep mode.
In other words, the core should stay in idle mode if the idle interval is shorter than $T_{bet}$; otherwise, the core should enter sleep mode with power consumption $P_{sleep}$. Similar to [17], $T_{bet}$ is calculated from $t_{ms}$ and $E_{ms}$, the time and energy overhead of the core mode switching, respectively. The energy consumed in idle mode ($E_{idle}$) and in sleep mode ($E_{sleep}$) is calculated as follows:
$$E_{idle} = P_{idle} \times t_{idle}, \qquad E_{sleep} = P_{sleep} \times t_{sleep}$$
where $t_{idle}$ and $t_{sleep}$ are the total times spent in idle mode and sleep mode. Therefore, given a static time-triggered schedule S, the total energy consumption of a processing core is the sum of the energy consumed by the tasks it executes, $E_{idle}$, $E_{sleep}$ and the energy overhead of mode switching.
The processing core, the bus, and the shared on-chip memory in the architecture together complete the data transfer between two dependent tasks. Specifically, an inter-core communication is issued when two tasks with a data dependence are mapped to different processing cores, and the shared on-chip memory stores the intermediate data. The processing core can initiate a write operation to the shared on-chip memory by providing an address with control information, which typically requires one bus clock cycle. The communication time overhead (or latency) is the time needed to deliver a message containing multiple words from a source processing core to a target processing core. Owing to the characteristics of the shared bus, only one component (e.g., a processing core) is allowed to actively use the bus at any one time. A communication on the shared bus is non-interruptible, so multiple communications must be serialized. The communication time overhead is proportional to the data transfer size, i.e., $C_{i,j} = comm(v_i, v_j) / B$, where $comm(v_i, v_j)$ is the amount of data transferred between task $v_i$ and task $v_j$, and B is the communication bandwidth [41]. The on-chip memory allocates space to store intermediate data. This space is not released until the target processing core reports the successful data transfer to the bus controller.
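The idle-versus-sleep decision can be sketched as follows. The break-even expression used here is the form commonly found in the DVFS/PMM literature and is an assumption about what [17] specifies. The sketch also assumes that a sleeping core pays $E_{ms}$ for the switch and $P_{sleep}$ for the rest of the interval. The function names and all numeric values are hypothetical (time in ms, power in W, energy in mJ):

```python
def break_even_time(t_ms, e_ms, p_idle, p_sleep):
    """Assumed form: T_bet = max(t_ms, (E_ms - P_sleep * t_ms) / (P_idle - P_sleep))."""
    return max(t_ms, (e_ms - p_sleep * t_ms) / (p_idle - p_sleep))


def interval_energy(interval, t_ms, e_ms, p_idle, p_sleep):
    """Return the chosen mode and the energy spent in one idle interval."""
    if interval < break_even_time(t_ms, e_ms, p_idle, p_sleep):
        # Too short to amortize the mode switch: stay idle.
        return "idle", p_idle * interval
    # Long enough: pay the switch overhead, then sleep for the remainder.
    return "sleep", e_ms + p_sleep * (interval - t_ms)


# Hypothetical parameters, chosen so the arithmetic is exact.
params = dict(t_ms=2.0, e_ms=2.0, p_idle=0.5, p_sleep=0.25)
print(break_even_time(**params))        # 6.0
print(interval_energy(4.0, **params))   # ('idle', 2.0)
print(interval_energy(10.0, **params))  # ('sleep', 4.0)
```

At an interval of exactly 6 ms both modes cost 3 mJ under these assumptions, which is what makes that length the break-even point.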
For a task graph, there is no inter-core communication if both the source and the target node of an edge are mapped to the same core. The inter-core communication energy overhead between tasks $v_i$ and $v_j$ is calculated as $E_{comm}(v_i, v_j) = C_{i,j} \times P_{ba}$, where $P_{ba}$ is the power of the active bus.
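A minimal sketch of the communication cost model follows, assuming the communication time is the data size divided by the bandwidth B and is zero for tasks on the same core. The function names and the numbers are hypothetical:

```python
def comm_time(size, bandwidth, src_core, dst_core):
    """C_ij: zero for intra-core edges, otherwise data size / bandwidth."""
    if src_core == dst_core:
        return 0.0
    return size / bandwidth


def comm_energy(size, bandwidth, p_bus_active, src_core, dst_core):
    """E_comm = C_ij * P_ba, the energy spent while the bus is active."""
    return comm_time(size, bandwidth, src_core, dst_core) * p_bus_active


# Hypothetical edge carrying 8 data units over a bus of bandwidth 4 units/ms.
print(comm_time(8, 4, src_core=1, dst_core=2))         # 2.0
print(comm_energy(8, 4, 0.5, src_core=1, dst_core=2))  # 1.0
print(comm_energy(8, 4, 0.5, src_core=1, dst_core=1))  # 0.0
```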

Problem Statement
Mapping and scheduling in multiprocessor systems have each been proven to be NP-hard. In this paper, we decouple the problem into mapping and scheduling, and we assume that the task mapping can be performed by any of the algorithms in previous works [10,30,31]. The energy-efficient scheduling problem is defined as illustrated in Figure 3. The inputs are the DVFS- and PMM-enabled MPSoC shown in Figure 1, multiple periodic applications consisting of both strictly and non-strictly periodic tasks, the task mapping and the profiling information. The energy-efficient time-triggered scheduler must find a static non-preemptive schedule and a v/f assignment for each task in a hyper-period $H_a$ such that the total system energy consumption $E_{total\_H_a}$ is minimized while the timing constraints are guaranteed.

Motivating Example
In this section, we present a motivating example to show that state-of-the-art energy-efficient scheduling on time-triggered systems may not work well on our problem. Assume that an MPSoC has two cores, CORE1 and CORE2, each with a high frequency level $f_H$ and a low frequency level $f_L$. The total active power of a core under $f_H$ and $f_L$ is denoted by $P_{aH}$ and $P_{aL}$, respectively. Assume that a set of applications (denoted as $g_1$, $g_2$ and $g_3$) and their task mappings on the MPSoC are given as illustrated in Figure 4. $g_1$ is an exactly periodic application in which tasks $v_1$, $v_2$, $v_3$ and $v_4$ are responsible for collecting data from sensors periodically, and $g_2$ is a loosely periodic application in which tasks $v_5$, $v_6$, $v_7$ and $v_8$ are responsible for processing data. The edges $e_1$, $e_4$, $e_6$ and $e_7$ indicate that their connected tasks are mapped to different cores, and the dashed edges $e_2$, $e_3$ and $e_5$ indicate that the corresponding tasks are mapped to the same core. Thus, $comm(v_1, v_3)$, $comm(v_2, v_4)$ and $comm(v_5, v_8)$ are equal to 0. The periods of $g_1$, $g_2$ and $g_3$ are 60, 30 and 60, respectively. Task WCETs and power profiles are shown in Table 2. For simplicity, the time unit is 1 ms, the power unit is 1 W, and the energy unit is 1 mJ.
The hyper-period $H_a$ is 60 if we schedule $g_1$ and $g_2$. In one hyper-period, $g_1$ and $g_2$ are released once and twice, respectively, and so is each task within them. Under the assumption made in previous works [15-17] that the start time interval of any two successive instances of a task must be fixed, the schedule for energy minimization is shown in Figure 5a. In the schedule, the horizontal axis represents time, and the heights of the task blocks represent the frequency level. In this schedule, the start time interval between two consecutive instances of each task in $g_2$ is fixed.
In contrast to the schedule in Figure 5a, the schedule generated by our method in Figure 5b is more energy-efficient: the average power consumption in $H_a$ is 0.784 W. In this schedule, the start time intervals of the task instance pairs $(v^{2}_{5,2}, v^{2}_{5,1})$, $(v^{1}_{6,2}, v^{1}_{6,1})$, $(v^{1}_{7,2}, v^{1}_{7,1})$ and $(v^{2}_{8,2}, v^{2}_{8,1})$ in $g_2$ are not subject to the strict periodicity constraint, as $g_2$ is a loosely periodic application. Owing to the flexibility of these non-strictly periodic tasks, the total core sleep time increases and the energy overhead of mode switching decreases, which yields about 8.2% total energy savings compared with Figure 5a.
Assume now that $g_3$ is an exactly periodic application in which $v_9$ is responsible for data transformation, and consider scheduling the three task graphs $g_1$, $g_2$ and $g_3$. The hyper-period is still 60. We find that scheduling the task graphs under the simplistic assumptions in [15-17] fails, as shown in Figure 5c, because those methods impose overly strict constraints on the task instances within $g_2$. For example, the start times of the two task instances $v^{1}_{6,1}$ and $v^{1}_{6,2}$ are 12 and 42, respectively, so that the start time interval between these two task instances is fixed at 30. As a result, $v_9$ cannot be scheduled within a hyper-period, since the size of $v_6$ is 6 and the size of $v_9$ is 13. However, a feasible schedule actually exists, as shown in Figure 5d. For the non-strictly periodic task $v_6$ in $g_2$, the constraint on the periodic interval between the start time of $v^{1}_{6,1}$ and that of its next instance $v^{1}_{6,2}$ is unnecessary. In this schedule, the two task instances $v^{1}_{6,1}$ and $v^{1}_{6,2}$ start at 7 and 42, respectively, and thus $v_9$ can be scheduled at 17. From the above results, one can observe that previous studies, which do not consider the characteristics of non-strictly periodic tasks, may result in higher energy consumption and even degrade the schedulability of the whole system. For the problem studied in this paper, reducing energy consumption more effectively requires a scheduling approach that is aware of the periodicity of specific tasks and exploits the flexibility of non-strictly periodic tasks.
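The distinction drawn in the example can be checked mechanically. The sketch below uses the start times of $v_6$ quoted above. It assumes that instance p is released at p times the period, that the deadline of $g_2$ equals its period of 30 (the deadline is not stated here), and that the size 6 of $v_6$ is its execution time. The function names are hypothetical:

```python
def is_strictly_periodic(start_times, period):
    """True if successive instance start times are exactly one period apart."""
    return all(b - a == period for a, b in zip(start_times, start_times[1:]))


def meets_deadlines(start_times, exec_time, period, deadline):
    """True if instance p (0-based) starts at or after its release p * period
    and finishes by p * period + deadline."""
    return all(
        p * period <= s and s + exec_time <= p * period + deadline
        for p, s in enumerate(start_times)
    )


# v6 under the strict assumption (Figure 5c) and in the feasible schedule (Figure 5d).
print(is_strictly_periodic([12, 42], 30))  # True
print(is_strictly_periodic([7, 42], 30))   # False
print(meets_deadlines([7, 42], 6, 30, 30))  # True
```

Under these assumptions the second pair of start times is not strictly periodic yet still respects every release and deadline, which is exactly the freedom a non-strictly periodic task offers to the scheduler.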

The Proposed Methods
This section presents the energy-efficient scheduling approach, which jointly applies DVFS and PMM, for multiple periodic applications consisting of strictly and non-strictly periodic tasks. Taking the strictness of the tasks' periodicity into account, we formulate an MILP model to solve the problem and obtain an optimal schedule in which the total system energy consumption is minimal. We then develop a heuristic algorithm for large-scale instances that the MILP formulation cannot solve efficiently.

MILP Method
ILP-based methods have the advantages of reachable optimality and easy access to various solving tools. Given an MPSoC, multiple DAGs and a task mapping, we aim to find an energy-efficient time-triggered schedule and a v/f assignment for all tasks such that the total system energy consumption is minimized under timing constraints. To obtain optimal energy-efficient scheduling solutions for pre-mapped tasks while considering the strictness of the tasks' periodicity, we now develop our MILP formulation for the problem based on the practical models defined in Section 3. We build up the MILP formulation step by step, including v/f selection constraints, deadline constraints, periodicity constraints for the strictly periodic tasks, precedence constraints, non-preemption constraints, and an objective function.
Firstly, we define the following variables:
Then, given the task graphs and task mappings, we formulate the MILP model as follows:
Minimize: $E_{total\_H_a}$
Subject to:
1. Voltage or frequency selection constraints for each task, as we use inter-task DVFS:
2. Deadline constraints ($\chi_m$ denotes the time overhead for DVFS and task switching on core m):
3. Periodicity constraints. According to the strictness of the tasks' periodicity, we separately determine the start times of the tasks and their instances in $H_a$. For the strictly periodic tasks belonging to the time-triggered applications within one hyper-period $H_a$, the periodicity constraints can be represented as follows:
For any non-strictly periodic task and its instances in $H_a$, the periodicity constraint is unnecessary, that is, the interval between the start times of two consecutive instances of the task is no longer fixed at $p_i$.
4. Dependency constraints for computation tasks mapped to the same core (e.g., source task $v^{m}_{i,p}$ and target task $v^{m}_{k,r}$):
5. Dependency constraints for tasks mapped to different cores (e.g., source task $v^{m}_{i,p}$ and target task $v^{n}_{j,q}$):
(a) $Cv^{m;n}_{i,p;j,q}$ can be started only after $v^{m}_{i,p}$ completes:
(b) $v^{n}_{j,q}$ can be started only after $Cv^{m;n}_{i,p;j,q}$ completes:
$$cts^{m;n}_{i,p;j,q} + C_{i,j} + \chi_m \le ts^{n}_{j,q}$$
6. Non-preemption constraints. Any two computation task instances mapped to the same core must not overlap in time, and the same holds for the communication tasks on the bus. They can only be executed sequentially. Assume that $v^{m}_{s,p}$ and $v^{m}_{t,q}$ are two task instances, and that MAX is a constant far greater than $H_a$. To guarantee that either $v^{m}_{t,q}$ runs after $v^{m}_{s,p}$ finishes or vice versa, the non-preemption constraint can be expressed with a binary ordering variable $O^{m}_{s,p;t,q}$ as follows:
$$ts^{m}_{s,p} + \sum_{l=1}^{L} (x^{m}_{s,p,l} \times W^{m}_{s,l}) + \chi_m \le ts^{m}_{t,q} + MAX \times (1 - O^{m}_{s,p;t,q})$$
$$ts^{m}_{t,q} + \sum_{l=1}^{L} (x^{m}_{t,q,l} \times W^{m}_{t,l}) + \chi_m \le ts^{m}_{s,p} + MAX \times O^{m}_{s,p;t,q}$$
The two formulas are also applicable to communication tasks mapped to the bus. The difference between computation and communication tasks is that the execution time of a computation task is variable, while the communication time of a communication task is constant. Since the task mapping is given, we know the actual computation time of each task on its assigned core and the actual communication cost between dependent tasks. In addition, communication tasks can overlap with computation tasks that do not depend on them.
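For orientation, constraints 1 to 5(a) can be sketched in the same notation as the inequalities that appear above. The forms below are assumptions inferred from the verbal descriptions, not the authors' exact equations. In particular, the release time $(p-1) P_k$ of the p-th instance and the reading of $x^{m}_{i,p,l}$ as a binary v/f selection variable are hypothetical:

```latex
% Assumed notation: ts = start time of a computation task instance,
% cts = start time of a communication task, x in {0,1} = v/f level choice.
\begin{align}
  & \sum_{l=1}^{L} x^{m}_{i,p,l} = 1
      && \text{(1) one v/f level per instance} \\
  & (p-1) P_k \le ts^{m}_{i,p}, \quad
    ts^{m}_{i,p} + \sum_{l=1}^{L} x^{m}_{i,p,l} W^{m}_{i,l} + \chi_m \le (p-1) P_k + D_k
      && \text{(2) release and deadline} \\
  & ts^{m}_{i,p+1} - ts^{m}_{i,p} = p_i
      && \text{(3) strictly periodic tasks only} \\
  & ts^{m}_{i,p} + \sum_{l=1}^{L} x^{m}_{i,p,l} W^{m}_{i,l} + \chi_m \le ts^{m}_{k,r}
      && \text{(4) same-core dependency} \\
  & ts^{m}_{i,p} + \sum_{l=1}^{L} x^{m}_{i,p,l} W^{m}_{i,l} + \chi_m \le cts^{m;n}_{i,p;j,q}
      && \text{(5a) communication after source}
\end{align}
```

Constraint (3) is simply omitted for non-strictly periodic tasks, which is where the additional scheduling freedom comes from.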
To formulate the time interval ($int_i$) between any two adjacent tasks on each core m in one hyper-period $H_a$, we use the interval modelling technique in [17]. Readers interested in the detailed modelling steps can refer to [17]. Then, according to the definition of $T_{bet}$, each time interval $int_i$ on the core is assigned a binary variable $d_i$ in the decision array darray[N] by Equation (19), representing whether the core should remain in idle mode ($d_i = 0$) or enter sleep mode ($d_i = 1$). Assuming there are N tasks on the core in $H_a$, the total idle and sleep intervals ($t_{idle}$ and $t_{sleep}$) are given by Equations (20) and (21), and the total energy overhead of mode switching for the core by Equation (22). Note that the step function introduced by $d_i$ in Equation (19) and the multiplication of $int_i$ and the binary variable $d_i$ in Equations (20) and (21) are nonlinear. Such problems can be solved by commercial or open-source ILP solvers after linearization. Solutions to similar problems have been presented in [46]. We now present the linearization process for our problem.
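As a hedged reading of the quantities described above, Equations (19) to (22) plausibly take the following form. The equation numbering follows the references in the text, but the exact expressions and the symbol $E_{sw}$ are assumptions:

```latex
% Assumed forms; whether t_sleep excludes the switching time t_ms is not specified here.
\begin{align}
  d_i &= \begin{cases} 0, & int_i < T_{bet} \\ 1, & int_i \ge T_{bet} \end{cases} \tag{19} \\
  t_{idle} &= \sum_{i=1}^{N} (1 - d_i) \times int_i \tag{20} \\
  t_{sleep} &= \sum_{i=1}^{N} d_i \times int_i \tag{21} \\
  E_{sw} &= \sum_{i=1}^{N} d_i \times E_{ms} \tag{22}
\end{align}
```

The case distinction in (19) is the step function, and the products $d_i \times int_i$ in (20) and (21) are the bilinear terms that require linearization.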
To linearize the product $d_i \times int_i$, we define a new variable $r_i$ such that $r_i = d_i \times int_i$. It is obvious that $int_i \le H_a$, so the product can be replaced by the linear constraints in Equations (23)-(25). The step function introduced by $d_i$ in Equation (19) can then be transformed into the constraint in Equation (26), in which the product $d_i \times int_i$ is linearized by using Equations (23)-(25). Based on these formulations, we can finally obtain an optimal schedule and the minimum overall energy consumption $E_{total\_H_a}$ by solving the MILP model with an ILP solver.
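The standard way to linearize the product of a binary variable and a bounded variable uses three inequalities: $r_i \le int_i$, $r_i \le H_a \times d_i$ and $r_i \ge int_i - H_a \times (1 - d_i)$, together with the bound $r_i \ge 0$. The sketch below assumes that this is what Equations (23)-(25) express, and checks by brute force on a small integer grid that these constraints admit only $r_i = d_i \times int_i$. The function name is hypothetical:

```python
def linearized_values(d, interval, h):
    """All integers r in [0, h] that satisfy the three linear constraints."""
    return [
        r for r in range(h + 1)
        if r <= interval                    # r never exceeds the interval
        and r <= h * d                      # r is forced to 0 when d = 0
        and r >= interval - h * (1 - d)     # r is forced up to interval when d = 1
    ]


H = 6  # small hyper-period used only for the exhaustive check
for d in (0, 1):
    for interval in range(H + 1):
        # The only feasible value is the product itself.
        assert linearized_values(d, interval, H) == [d * interval]
print("linearization matches d * int on the whole grid")
```

The bound $int_i \le H_a$ is what makes $H_a$ a valid big-M constant in the second and third inequalities.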
Limitation of the MILP-based method: Although we can obtain an optimal solution by solving the MILP formulation with modern ILP solvers, searching for the optimal solution of our problem is time-consuming. Specifically, constructing a time interval for each task instance in one hyper-period yields a large number of variables, which dramatically enlarges the exploration space. When the number of tasks to be scheduled is large, the problem may not even be solvable because of memory overflow. To address this, we propose an efficient heuristic algorithm in Section 5.2 that copes with the exponentially growing problem scale.

Heuristic Algorithm
In this section, we develop a heuristic algorithm named Hybrid List Tabu-Simulated Annealing with Fine-Tune (HLTSA-FT). Different from the TS/SA algorithms mentioned in Section 2, the proposed algorithm has the following innovations: (1) HLTSA-FT integrates list scheduling with TS/SA to take advantage of both algorithms and to mitigate their adverse effects. The decomposition and solution process is iteratively guided by the HLTSA-FT algorithm, which employs proper intensification and diversification mechanisms. In HLTSA-FT, SA supplemented with a tabu list can reduce the number of revisits to old solutions and cover a broader range of solutions; (2) list-based scheduling, performed in the List-based and Periodicity-aware Energy-efficient Scheduling (LPES) function, can efficiently obtain feasible solutions for our problem; (3) solutions can be further improved by applying problem-specific heuristic information to guide the optimization process. Specifically, a fine-tune phase, performed in the FT function, makes minor adjustments to the accepted solution to find a better solution more rapidly. Therefore, the total number of iterations can be reduced and the solution quality can be improved.
The details of HLTSA-FT are given in Algorithm 1. Three main steps in the algorithm are:
Initialization. This step (Lines 1-3) first sets appropriate parameters, including the initial temperature $T_0$, the maximum number of iterations LPMAX, the maximum number of consecutive rejections RMAX, the cooling factor $\delta$ and the maximum length of the tabu list TLIST_LEN. Then, the algorithm builds the tabu list TL with length TLIST_LEN and sets the aspiration criterion A. The initial_solution_gen() function generates an initial solution $\lambda_0$ (a v/f assignment for each task instance in $H_a$) as the starting point of the optimization process. Since a good initial solution can accelerate convergence, the function uses a relaxed formulation of the MILP model (e.g., one that neglects the idle and sleep interval formulations). $\lambda_0$ is evaluated and the current optimal energy consumption is denoted as $E_{cur}$. The aspiration criterion accepts a move provided that its total energy is lower than that of the best solution found so far. It helps prevent the search from being trapped at a solution surrounded by tabu neighbors. The tabu list stores recently visited solutions and saves considerable computation time by avoiding revisits.
Iteration. In each iteration (Lines 5-26), the solution_neighbor() function (Line 5) generates a neighboring solution λ_new by applying a small perturbation (a swap move) to the current solution λ_cur. In our context, the neighborhood consists of v/f assignments for the tasks. λ_new is generated in two steps: (i) select two tasks; and (ii) swap their v/f levels. Then, λ_new is checked for feasibility (i.e., whether the constraints mentioned above are met) by the solution_feasible function. If the solution is not in the tabu list or satisfies the aspiration criterion, it is selected; otherwise, a new solution is generated. The selected solution is then translated into an energy-efficient schedule by the LPES function (Line 6). A solution λ_new that consumes less energy is always accepted; an inferior λ_new may still be accepted with probability Pro(∆E, T) = exp(−∆E/T), where ∆E is the increase in energy over the current solution and T is the annealing temperature at the current iteration. This transition probability helps the algorithm escape from local optima. Once accepted, λ_new is put in the tabu list TL, the current solution λ_cur is replaced by λ_new for the next iteration, and the FT phase (Line 16) is performed to fine-tune the accepted λ_new. Otherwise, λ_new is discarded and rjnum is incremented by 1. The algorithm then decreases the temperature and continues to the next iteration.
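To give a feeling for how the transition probability behaves as the temperature falls, the following worked values may help. The numbers ∆E = 0.1, T = 1.0 and T = 0.01 are purely illustrative assumptions and are not taken from the experiments.

```latex
\[
\mathrm{Pro}(\Delta E, T) = \exp\!\left(-\frac{\Delta E}{T}\right), \qquad
\mathrm{Pro}(0.1,\ 1.0) = e^{-0.1} \approx 0.905, \qquad
\mathrm{Pro}(0.1,\ 0.01) = e^{-10} \approx 4.5 \times 10^{-5}.
\]
```

Under these assumed values, the same inferior move is accepted almost always early in the search and almost never once the temperature has cooled, which is the usual simulated-annealing shift from diversification to intensification.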

Algorithm 1: The HLTSA-FT Heuristic Algorithm
Input: DAGs, task mappings and profiles, power profiles
Output: An energy-efficient task schedule S, v/f assignments
1 Set appropriate value of T_0, LPMAX, RMAX, δ, TLIST_LEN;

Stopping Criteria. The search procedure stops when the number of iterations reaches LPMAX or the variable rjnum reaches RMAX. The variable rjnum stores the current number of consecutive rejections; once it reaches RMAX, no superior solution is assumed to exist in the neighborhood and the search is considered to have reached a near-optimal solution.
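The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the callables `energy` and `feasible` are hypothetical stand-ins for the LPES evaluation and the solution_feasible check, the default parameter values are arbitrary, the FT phase is omitted, and a tabu hit is simply counted as a rejection instead of triggering regeneration.

```python
import math
import random
from collections import deque


def hltsa_sketch(initial, energy, feasible, t0=10.0, delta=0.95,
                 lpmax=500, rmax=50, tlist_len=20, seed=0):
    """Tabu-assisted simulated annealing over v/f assignments (illustrative).

    initial  -- list of v/f level indices, one per task instance
    energy   -- hypothetical callable standing in for the LPES evaluation
    feasible -- hypothetical callable standing in for solution_feasible
    """
    rng = random.Random(seed)
    cur, e_cur = list(initial), energy(initial)
    best, e_best = list(cur), e_cur
    tabu = deque([tuple(cur)], maxlen=tlist_len)  # short-term memory
    temp, rjnum = t0, 0

    for _ in range(lpmax):
        if rjnum >= rmax:  # too many consecutive rejections: stop
            break
        # Swap move: exchange the v/f levels of two task instances.
        i, j = rng.sample(range(len(cur)), 2)
        new = list(cur)
        new[i], new[j] = new[j], new[i]
        accepted = False
        if feasible(new):
            e_new = energy(new)
            # Aspiration: a tabu solution is allowed only if it beats the best.
            if tuple(new) not in tabu or e_new < e_best:
                d_e = e_new - e_cur
                # Better moves always pass; worse ones pass with exp(-dE/T).
                if d_e < 0 or rng.random() < math.exp(-d_e / temp):
                    accepted = True
        if accepted:
            cur, e_cur = new, e_new
            tabu.append(tuple(new))
            rjnum = 0
            if e_cur < e_best:
                best, e_best = list(cur), e_cur
        else:
            rjnum += 1
        temp *= delta  # geometric cooling with factor delta
    return best, e_best
```

A call such as `hltsa_sketch([0, 1, 2, 3], my_energy, lambda s: True)` returns the best assignment seen and its energy; because only swap moves are used, the result is always a permutation of the initial level assignment.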
In the next two subsections, we give a detailed description of the LPES and FT.

LPES
To efficiently obtain a feasible schedule that reduces energy, the scheduling for our problem needs to address two aspects. First, the priority assignment (i.e., the execution order of tasks) must satisfy the corresponding constraints in the schedule (including deadline and precedence constraints for each task graph, and periodicity constraints for the strictly periodic tasks) and maximize the total interval available for energy management. Second, the intervals need to be allocated efficiently to reduce energy consumption. In this paper, we apply the List-based and Periodicity-aware Energy-efficient Scheduling (LPES) method. The first aspect is addressed through bottom level (b-level) based priority assignment. The second aspect is addressed through a modified, simple MILP model whose number of variables is only linear in the number of tasks.
List scheduling is a class of scheduling heuristics in which ready tasks are assigned priorities and ordered by descending priority. A ready task is a task whose predecessors have all finished executing. At each step, the ready task with the highest priority is selected for scheduling. If more than one task has the same priority, ties are broken by a strategy such as random selection. Priority assignment based on the b-level has been adopted in energy-aware multiprocessor scheduling. The b-level of a task is defined as the length of the longest path from the beginning of the task to the bottom of the task graph. As we focus on multi-DAGs in our problem, we define the b-level of task v m i,p and its next instance v m i,p+1 within H_a as:

We calculate the b-level values of all tasks and their instances, and sort them into a list in descending order. The higher the b-level value, the higher the priority of the task. An example is shown below.

Example 1. Consider the case given in the form of two task graphs g_1 and g_2 in Figure 4. The b-levels of the tasks in H_a are shown in Table 3. Thus, the execution orders of tasks on CORE1 and CORE2 are denoted as {v 1 7

Based on the given priority and v/f assignment for each task, a schedule with PMM should be generated to reduce total energy consumption. The time intervals can be modeled directly as follows. Assume that there are N task instances on a core m in a hyper-period H_a, and that these tasks are stored in a task list T_1, T_2, . . . , T_i, . . . , T_N (1 ≤ i ≤ N) ordered by descending priority. As illustrated by the tasks in the first hyper-period in Figure 6, the time interval between any two adjacent tasks T_i and T_{i+1} (1 ≤ i ≤ N − 1) can be directly calculated as:

int_i = st(T_{i+1}) − ft(T_i),

where st(T_i) and ft(T_i) denote the start time and finish time of task T_i, respectively. As we focus on task scheduling in one hyper-period and task execution is repeated in each hyper-period, there are N time intervals. The last time interval int_N, between task T_N and task T_1, wraps around the hyper-period boundary and is calculated analogously.

In the time interval modeling in Section 5.1, a large number of intermediate integer variables are used to check the timing information of every task instance in order to determine the closest task instance for a given task instance. Compared with the time interval modeling in [17], the direct modeling used here greatly reduces the number of constraints, decision variables (e.g., x m i,p,l and O m s,p;t,q), and intermediate variables (e.g., A m s,p;t,q, B m s,p;t,q, and O m s,p;t,q − B m s,p;t,q in [17]). After obtaining each idle time int_i, we use the ILP solver to obtain an energy-efficient schedule.
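The two ingredients of LPES described above, b-level ordering and direct computation of idle intervals, can be illustrated with a short sketch. It rests on several assumptions: it uses the classic single-graph b-level (task execution times only, no communication costs) and not the paper's multi-DAG definition, ties are broken by task name for reproducibility where the text suggests random selection, and the wrap-around gap is taken to be H_a − ft(T_N) + st(T_1). All function names are hypothetical.

```python
from functools import lru_cache


def b_levels(succ, wcet):
    """Return {task: b-level}, the longest execution-time path to a sink.

    succ -- {task: iterable of successor tasks}
    wcet -- {task: execution time}; its keys define the task set
    """
    @lru_cache(maxsize=None)
    def level(task):
        # Own execution time plus the longest path through any successor.
        return wcet[task] + max((level(s) for s in succ.get(task, ())),
                                default=0)
    return {task: level(task) for task in wcet}


def priority_list(succ, wcet):
    """Tasks in descending b-level order; ties broken by name (assumption)."""
    levels = b_levels(succ, wcet)
    return sorted(wcet, key=lambda t: (-levels[t], t))


def idle_intervals(schedule, hyper_period):
    """Idle gaps on one core for a schedule repeated every hyper-period.

    schedule -- list of (start, finish) pairs in execution order
    Returns N values: N-1 gaps between adjacent tasks plus the wrap-around
    gap from the last finish to the first start of the next hyper-period.
    """
    gaps = [schedule[i + 1][0] - schedule[i][1]
            for i in range(len(schedule) - 1)]
    gaps.append(hyper_period - schedule[-1][1] + schedule[0][0])
    return gaps
```

With positive execution times, a predecessor always has a larger b-level than its successors, so the descending order produced by `priority_list` respects precedence. The number of intervals grows linearly with the number of task instances, in line with the linear variable count mentioned in the text.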

FT
To find solutions that further reduce energy consumption, a fine-grained adjustment of the neighborhood range is performed for the accepted solution (Line 16 in Algorithm 1). We now present the details of the FT phase. The core idea of FT is to increase the potential energy savings by tuning priorities in a way that still satisfies the constraints of the task graphs, rather than adjusting priorities blindly or randomly. In this study, the priorities of strictly periodic tasks remain unchanged, since the strict periodicity of their start times limits the possibility of adjustment within a hyper-period. The non-strictly periodic task instances in H_a do not have to start periodically; thus, they leave room for execution-order adjustment. In addition, tasks that have the same b-level value (tie-breaking tasks) can also have their priorities adjusted. We focus on these tasks, which may have better schedule flexibility, and make full use of that flexibility to achieve more energy savings. The pseudo-code of FT is listed in Algorithm 2.
In Algorithm 2, the priority assignments of tasks on each core are first recorded according to the accepted solution λ_new. Then, FT keeps strictly periodic tasks unchanged. For tie-breaking and non-strictly periodic tasks, it adjusts their priorities using the priority_adj() function and records the results in the possible priority assignment array pos_priority[]. Next, the algorithm performs LPES for each element in pos_priority[]. In each iteration, a feasible solution λ_fine (checked for feasibility by the solution_feasible function) that reduces the energy consumption is stored. Finally, the tasks are adjusted iteratively until no further improvement can be achieved. Note that FT can be used directly in the optimization process to find an optimal solution quickly if the initial solution is good. The FT scheme is illustrated through the following example.

Example 2.
In Example 1, the initial execution order of tasks on CORE1 and CORE2 are {v respectively. Among them, task v 1 2,1 and task v 1 6,2 are tie-breaking tasks, and the tasks belonging to application g_2 are non-strictly periodic. Thus, they can be adjusted (swapped) as long as the precedence constraints are guaranteed. The corresponding schedule after FT can be seen in Figure 5b; the execution orders of tasks on CORE1 and CORE2 are {v 1 7,1 The improvement in power consumption of the system after FT is, therefore, 8.2%.
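The fine-tune idea can be sketched as a greedy local refinement. The sketch below is an assumption-laden illustration, not Algorithm 2 itself: the behavior of priority_adj() is not specified in the text, so adjacent swaps between movable tasks are used in its place, and `cost` is a hypothetical callable standing in for an LPES energy evaluation.

```python
def fine_tune(order, movable, preds, cost):
    """Greedy adjacent-swap refinement of an execution order (illustrative).

    order   -- list of task names in current execution order on one core
    movable -- set of tasks allowed to move (tie-breaking or non-strictly
               periodic); all other tasks keep their position
    preds   -- {task: set of predecessor tasks}; precedence is preserved
    cost    -- hypothetical callable standing in for an LPES evaluation
    """
    def respects_precedence(seq):
        pos = {t: k for k, t in enumerate(seq)}
        return all(pos[p] < pos[t]
                   for t in seq for p in preds.get(t, ()) if p in pos)

    best, best_cost = list(order), cost(order)
    improved = True
    while improved:  # repeat until a full pass yields no improvement
        improved = False
        for k in range(len(best) - 1):
            a, b = best[k], best[k + 1]
            if a not in movable or b not in movable:
                continue  # strictly periodic tasks are left untouched
            cand = best[:k] + [b, a] + best[k + 2:]
            if respects_precedence(cand):
                c = cost(cand)
                if c < best_cost:  # keep only strictly better orders
                    best, best_cost, improved = cand, c, True
    return best, best_cost
```

Because a candidate is kept only when it strictly lowers the cost, the loop terminates, mirroring the "adjust until no improvement can be achieved" behavior described for FT.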

Experiment Evaluation
This section presents the experimental setup and case studies. To evaluate the efficiency of our proposed approaches, the experiments are performed on a 3.60 GHz 4-core PC with 4 GB of memory running Windows 7. The same 70 nm technology power parameters of the processor are used as in [17,42,44,45]. The code is written in C, and we use the IBM ILOG CPLEX 12.5.0.0 solver to solve the MILP formulations. In each case, CPLEX is given a time limit of 10 h.

Experiment Setup
Our experiments include 12 applications represented by task graphs TG1-TG12. TG1-TG3 are based on industrial, automotive, and consumer applications [47]. TG4-TG6 are three applications from the 'Standard Task Graph' set [48], which are based on real applications, namely, a robotic control application, the FPPPP SPEC benchmark and a sparse matrix solver. In addition, we use the general randomized task-graph generator TGFF [49] to create six different periodic applications (TG7-TG12) with typical industrial characteristics. These task graphs are generated from the original example input files (e.g., kbasic, kseries-parallel and robtst) that come with the software package. We then consider nine combinations of the task graphs TG1-TG12 (i.e., five relatively small benchmarks, SG1-SG5, and four large benchmarks, LG1-LG4). Each benchmark has 2-5 task graphs with different topologies (such as chain, in-tree, out-tree, and fork-join), different critical path lengths and different numbers of dependent tasks. The periods of the task graphs are distributed randomly in [10,2000] ms. For the whole set of tasks in each benchmark, we define a parameter α ∈ [0, 1] that reflects the proportion of strictly periodic tasks. In other words, all tasks are strictly periodic if α is equal to 1, and all tasks are non-strictly periodic if α is equal to 0.
We consider a 4-core architecture for our experiments. The power model is based on a typical 70 nm technology processor, which has been applied in [17,41,42,44,45]. The accuracy of the processor power model has been verified by SPICE simulation. For fairness of comparison, the core power parameters, the voltage levels and the energy overhead of a processor mode switch are taken from [17]. As shown in Table 4, the processor can operate at five voltage levels in the range of 0.65 V to 0.85 V with 50 mV steps, and the corresponding frequencies vary from 1.01 GHz to 2.10 GHz. The dynamic power P_d and static power P_s at each v/f level are calculated according to the energy model in Section 3.3 and the technology constants (e.g., C_ef, k, and V_t) from [42,44,45]. The time overheads of a processor mode switch (t_ms) and a voltage/frequency switch are 10 ms and 600 µs, respectively [50]. For the mapping step, we use the task assignment algorithm in [31] to assign each task to a core of the MPSoC.

Experiment Results
This section presents the evaluation of our improved MILP method in Section 5.1 and the heuristic algorithm in Section 5.2. The number of tasks and edges of each benchmark is shown in the first column of Table 5. We evaluate and compare our improved MILP method (denoted as IMILP) with the existing scheduling method (denoted as SMILP), in which the periodic constraint must be strictly followed by the start times of all tasks [17]. Table 5 shows the average power consumption in one hyper-period (i.e., the total energy E_total_H_a divided by H_a) under the different MILP-based methods. The results are obtained in three cases, with the factor α varying from 1/4 to 3/4 in steps of 1/4.
From Table 5, one can see that IMILP achieves greater energy savings than SMILP. Compared with SMILP, IMILP reduces power consumption for the small benchmarks SG1-SG5 by 8.38%, 14.38% and 19.86% for α = 3/4, 1/2 and 1/4, respectively; on average, power consumption is reduced by 14.21%. The results demonstrate that the simplistic assumption in the previous SMILP method, where each task and its instances must start strictly periodically, can lead to an increase in energy consumption. Moreover, column "SMILP" under "Power Consumption (W)" shows that the power consumption under SMILP remains unchanged for all values of α. In contrast, the results under IMILP (columns 3-5 and 6-8) show that the smaller the value of α, the more power consumption can be reduced. This is because IMILP captures the periodicity of specific tasks belonging to their applications and handles strictly and non-strictly periodic tasks correctly. To exploit energy savings, IMILP effectively utilizes the scheduling flexibility of non-strictly periodic tasks, which do not need to start periodically.
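As a quick consistency check, and assuming the reported average is the plain arithmetic mean of the three α cases (the text does not state how it was computed), the three reductions reproduce the quoted figure:

```latex
\[
\frac{8.38\% + 14.38\% + 19.86\%}{3} = \frac{42.62\%}{3} \approx 14.21\%.
\]
```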

Evaluation of the HLTSA-FT
We first compare the HLTSA-FT heuristic with the MILP-based methods. The average power consumption under different α is listed in the last three columns of Table 5. Compared with SMILP, HLTSA-FT reduces power consumption for the small benchmarks SG1-SG5 by 7.92%, 13.88% and 19.48% for α = 3/4, 1/2 and 1/4, respectively; on average, power consumption is reduced by 13.76%. For the five test cases, the average (minimum) deviation of HLTSA-FT from IMILP is only 3.2% (1.9%). The result demonstrates that HLTSA-FT can find near-optimal solutions and that its performance is close to that of IMILP for SG1-SG5. Although the MILP method can obtain optimal results, its computation time grows exponentially with the size of the benchmarks, as shown in columns 2-4 of Table 6. The sign 'TL' in Tables 5 and 6 indicates that the MILP methods cannot generate an optimal solution for LG1-LG4 within the time limit (10 h in our experiment). This confirms that the ILP solver fails to find optimal solutions for large instances. In contrast, HLTSA-FT always generates feasible solutions efficiently for the large benchmarks LG1-LG4. Thus, HLTSA-FT provides a practical way for designers to search for energy-efficient schedules when the computation time of the MILP methods is intolerable.

We then compare HLTSA-FT with existing heuristic algorithms [38-40]. As mentioned in Section 2, SA-based algorithms have been widely applied to achieve near-optimal solutions for low-power scheduling. For a fair comparison, we modify and implement the energy-efficient scheduling methods for our problem under three configurations: LSA, HLTSA, and HLTSA-FT. The LSA heuristic applies the list-based SA algorithm but uses neither the tabu list nor the fine-tune phase. The HLTSA heuristic augments LSA with the tabu list but has no fine-tune phase.
We evaluate and compare our HLTSA-FT algorithm with these heuristics in terms of two performance metrics: (1) the solution quality and (2) the computation time of searching process.
One can see that HLTSA and HLTSA-FT outperform LSA in solution quality. The average power consumption of HLTSA-FT and of the other SA-based algorithms is compared in Figures 7 and 8 for the small and large benchmarks, respectively. In Figure 7, for the small benchmarks SG1-SG5, HLTSA and HLTSA-FT reduce average power consumption by 4.41% and 13.31%, respectively, compared with LSA. In Figure 8, for the large benchmarks LG1-LG4, HLTSA and HLTSA-FT reduce average power consumption by 6.09% and 19.27%, respectively, compared with LSA. This is because HLTSA and HLTSA-FT use a short-term memory of recently visited solutions, known as the tabu list, within SA to escape from local optima. The search is kept from returning to previously visited solutions, and the performance of SA is enhanced significantly with the help of the tabu list.

Moreover, HLTSA-FT improves the solution quality over HLTSA. In Figure 7, for the small benchmarks SG1-SG5, HLTSA-FT reduces average power consumption by 8.79%, 8.91% and 10.11% for α = 3/4, 1/2 and 1/4, respectively, compared with HLTSA. In Figure 8, for the large benchmarks LG1-LG4, HLTSA-FT reduces average power consumption by 9.37%, 13.32% and 19.61% for α = 3/4, 1/2 and 1/4, respectively, compared with HLTSA. The reason is that the FT phase significantly improves performance. HLTSA without the FT phase searches for lower-energy solutions blindly and randomly, and many of the candidates are abandoned because they violate precedence or deadline constraints. HLTSA-FT actively explores nearby solutions and steers the search toward potentially energy-efficient schedules by adjusting the execution order of tasks whenever the precedence and deadline constraints remain satisfied.
Columns 5-13 of Table 6 present the average computation time of the different SA-based methods for various α. The results are obtained over 10 runs. An interesting observation from the table is that the computation time increases as α decreases. This is because, as α decreases, the number of constraints imposed by strictly periodic tasks decreases and the search space of the problem becomes larger. This demonstrates that our problem requires an effective heuristic algorithm to reduce complexity as the input size grows.
To summarize, the experimental results presented above show that, as the problem scale becomes larger, the proposed HLTSA-FT heuristic achieves a good trade-off between solution quality and solution generation time compared with the IMILP, LSA and HLTSA methods. The algorithm is a scalable heuristic whose configuration parameters users can adjust according to the specific input. For further improvement, HLTSA-FT can obtain higher-quality solutions by increasing the number of optimization iterations or by being executed multiple times within an acceptable time.

Conclusions
This paper has investigated the problem of scheduling a set of periodic applications for energy optimization on safety/time-critical time-triggered systems. In the applications, besides strictly periodic tasks, there also exist non-strictly periodic tasks in which different invocations of a task can start aperiodically. We present a practical task model to characterize the strictness of the task's periodicity, and formulate a novel scheduling problem for energy optimization based on the model. To address the problem, we first propose an improved MILP model to obtain energy-efficient scheduling. Although the MILP method can generate optimal solutions, its solution computation time grows exponentially with the number of inputs. Therefore, we further develop an HLTSA-FT algorithm to reduce complexity and efficiently obtain a high-quality solution within a reasonable time. Extensive evaluations on both synthetic and realistic benchmarks have demonstrated the effectiveness of our improved MILP method and the HLTSA-FT algorithm, compared with the existing studies.
Some issues remain for our future work. In this paper, we assume that task mappings are given as a fixed input. For higher energy efficiency, mapping and scheduling on time-triggered multiprocessor systems need to be integrated, since they are inter-dependent. We are currently working on this problem. Furthermore, we intend to study how to integrate our approaches with online scheduling methods on realistic safety/time-critical multiprocessor systems to reduce system-wide energy consumption.
Author Contributions: X.J. conceived and developed the ideas behind the research, performed the experiments and wrote the paper under the supervision of K.H. Authors K.H., X.Z., R.Y., K.W. and D.X. provided guidance and key suggestions. K.H. and X.Y. supervised the research and finalized the paper.