Energy-Efficient Real-Time Scheduling Using DPM on Mobile Sensors with Uniform Multi-Cores

In wireless sensor networks (WSNs), sensor nodes are deployed to collect and analyze data. These nodes run on batteries with limited energy to allow easy deployment at low cost, so battery capacity directly determines the lifetime of the sensor nodes. Efficient energy management is therefore important for extending node lifetime. Most efforts to improve power efficiency in tiny sensor nodes have focused mainly on reducing the power consumed during data transmission. However, the recent emergence of sensor nodes equipped with multi-cores demands attention to the problem of reducing power consumption in the cores themselves. In this paper, we propose an energy-efficient scheduling method for sensor nodes with uniform multi-cores. We extend T-Ler plane based scheduling, a global optimal scheduling scheme for uniform multi-cores and multi-processors, to support power management through dynamic power management (DPM). The proposed approach selects processors for scheduling and maps tasks to processors so that DPM can be utilized efficiently. Experiments show the effectiveness of the proposed approach compared to other existing methods.


Introduction
WSNs consist of a number of mobile sensor nodes that are tiny, multi-functional, and low-power. Table 1 lists mobile sensing platforms with various sensors. Such nodes are widely used in various applications to collect and process data, such as many types of physical and environmental information. Recently, sensor nodes in WSNs have evolved toward multimedia streaming and image processing, and sensor nodes with multi-processors have emerged in response to these high performance demands. mPlatform, a multi-processor sensor node platform capable of parallel processing for computationally intensive signal processing, was proposed by Lymberopoulos et al. [1]. These platforms operate on limited batteries, as shown in Table 1, and the use of multi-cores in a sensor node makes energy consumption an even more serious concern. Power management in sensor nodes is of critical importance for several reasons: limited battery energy and the need to ensure longevity [2][3][4], meeting performance requirements [2,5,6], inefficiency arising from over-provisioning resources [2], power challenges posed by CMOS scaling [2,7], and enabling green computing [2]. Recent advances in CMOS technology have improved the density and speed of on-chip transistors, but these trends limit the fraction of a chip that can run at maximum speed within a limited power budget; if left unaddressed, such power challenges may end transistor performance scaling [8,9]. Battery-operated embedded systems are sensitive to high power consumption, which leads to heating and reduced battery lifetime. Thus, this paper makes the following contributions:

• At the beginning of each T-Ler plane, select the processors operating at a low frequency and minimize the processing capacity as much as possible.
• At the beginning of each T-Ler plane, classify the processors and tasks into processor sets and task sets, respectively, to reduce the complexity of scheduling and the fragmentation of idle time.
• At each event in the T-Ler plane, utilize constrained migration to reduce the complexity of scheduling and the fragmentation of idle time.
The first extension reduces the power loss caused by uniform multi-processors, which consist of processors with different processing capacities. The previous approach [20], described in Section 2, focuses solely on minimizing the number of processors and is therefore not suitable for uniform multi-processors, where processors must be selected considering both the processing capacity and the frequency of each processor. The second extension classifies processors and tasks for limited scheduling, where tasks in a task set are scheduled only on processors in the corresponding processor set. The third extension adjusts the sets at each event and assigns tasks to the processors using this limited scheduling. These extensions prevent unnecessary task migration and enable idle time to be collected on particular processors.
We organize this paper as follows. In Section 2, we introduce related work, including previous approaches based on the T-L plane targeting uniform multi-processors. In Section 3, we propose mechanisms to select processors and allocate tasks for energy-efficient scheduling on uniform multi-processors, extending events defined for identical multi-processors to uniform multi-processors. In Section 4, we evaluate the proposed algorithms experimentally against other algorithms. Lastly, we present conclusions and future work in Section 5.

Power Management Techniques
Due to advancements in semiconductor process technologies, high-end processors integrating ever more transistors have become available, and real-time embedded systems have been increasingly adopting them. In addition, to improve performance, real-time embedded systems are also adopting multi-processors. However, this significantly increases processor power consumption. The power consumption of a CMOS chip is as follows [21]:

P = P_static + P_dynamic, (1)

where P_static is the static power consumption, calculated as the sum of the leakage power and the short-circuit power, and P_dynamic is the dynamic power consumed by charging and discharging the output capacitance during processing. It is not easy to reduce the static power consumption, which depends on various parameters of the semiconductor process; therefore, we focus on reducing the dynamic power consumption. Dynamic power is defined as:

P_dynamic = αCV²f, (2)

where f is the frequency, α is the switching activity factor, V is the supply voltage, and C is the capacitive load. DVFS adjusts the supply voltage and frequency of a CMOS chip by utilizing the slack time that occurs when scheduling tasks. DPM, on the other hand, reduces energy consumption by switching to a low-power state when slack time occurs. However, if the slack time is not guaranteed to exceed the break-even time, the energy overhead caused by the state transition results in a net loss. The break-even time BET_sleep is determined by Equation (3) [22]:

BET_sleep = max( t_sw, (E_sw − P_sleep · t_sw) / (P_idle − P_sleep) ). (3)
The transition energy overhead and the recovery time are denoted by E_sw and t_sw, respectively; P_idle denotes the idle power and P_sleep the sleep power. The break-even time must be considered when developing a scheduling algorithm that uses the sleep mode while still guaranteeing real-time responsiveness.
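The break-even decision above can be sketched as follows. The numeric values in the usage are illustrative, not measurements from any platform; the derivation in the comment follows the energy-balance argument behind the break-even time.

```python
def break_even_time(e_sw, t_sw, p_idle, p_sleep):
    """Minimum idle interval for which entering sleep mode saves energy.

    Sleeping for an interval t pays off when
        p_idle * t >= e_sw + p_sleep * (t - t_sw),
    i.e. t >= (e_sw - p_sleep * t_sw) / (p_idle - p_sleep),
    and the interval can never be shorter than the transition time itself.
    """
    bet = (e_sw - p_sleep * t_sw) / (p_idle - p_sleep)
    return max(t_sw, bet)


def should_sleep(idle_time, e_sw, t_sw, p_idle, p_sleep):
    """DPM decision: sleep only if the idle interval exceeds the break-even time."""
    return idle_time >= break_even_time(e_sw, t_sw, p_idle, p_sleep)
```

For example, with E_sw = 10, t_sw = 2, P_idle = 3, and P_sleep = 1 (arbitrary units), the break-even time is 4, so a 5-unit idle interval justifies sleeping while a 3-unit interval does not.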

Global Scheduling Approaches on Multi-Processors
Scheduling disciplines can be categorized by the complexity of their priority mechanisms and the degree of job migration they allow. Considering how task priorities are determined, Carpenter et al. [23] categorized the schemes into static, dynamic but fixed within a job, and fully dynamic.

• Static: A single fixed priority is applied to all jobs of each task in the system, e.g., Rate Monotonic (RM) scheduling.
• Dynamic but fixed within a job: Different priorities may be assigned to the jobs of a task, but each job keeps a fixed priority over time, e.g., Earliest Deadline First (EDF) scheduling.
• Fully dynamic: A single job may be assigned different priorities at different times, e.g., Least Laxity First (LLF) scheduling.
Depending on the degree of job migration, Carpenter et al. [23] categorized the migration criteria into no migration, restricted migration, and unrestricted migration.

• No migration: The set of tasks in the system is partitioned into subsets, one per available processor, and a scheduler schedules each subset on its unique processor. The jobs of a task in a subset are executed only on the corresponding processor.
• Restricted migration: Each job of a task must be executed entirely on a single processor, but different jobs of the same task may run on different processors. Thus, migration among processors is allowed only at job boundaries, not within a job.
• Unrestricted migration: A job is allowed to migrate among processors at any time during its lifetime.
Note that our proposed scheduling algorithm supports fully dynamic priorities and unrestricted migration. Various global scheduling algorithms for multi-processors have been studied. In global scheduling, all eligible jobs waiting for execution reside in a single priority-ordered queue shared by all processors in the system; the global scheduler dispatches the highest-priority job from this queue. Most of the early studies on global scheduling extended scheduling algorithms known to be optimal on a single processor, such as RM and EDF, to multi-processors. However, these extensions can waste resource utilization. The fluid scheduling model with a notion of fairness, in which each task always executes at a fixed rate, emerged to overcome this limitation [24]. Figure 1 compares the fluid scheduling concept with practical scheduling; as shown, there is a gap between the two. Several algorithms extend the fluid scheduling model to achieve optimality on multi-processors. Proportionate fair (P-fair) scheduling produces a feasible schedule for periodic tasks on multi-processors and has shown considerable promise in multi-processor scheduling [25]. However, an extensive number of migrations and preemptions is needed to follow the fluid schedule, and much effort has been made to overcome this problem in global optimal scheduling. Thereafter, the Deadline Partitioning fair (DP-Fair) and DP-Wrap algorithms were proposed, exhibiting better performance with respect to preemption [26]. The way these scheduling algorithms allocate tasks to processors is not suitable for uniform multi-processors. Cho et al. [27] proposed Largest Local Remaining Execution-time First (LLREF), a global optimal scheduling algorithm based on a T-L plane abstraction.
Funk and Meka [28] proposed U-LLREF, a T-L plane based scheduling algorithm that extends LLREF to uniform parallel machines. In U-LLREF, the uniform multi-processor platform provides an additional condition for determining event-c. Chen et al. [29] proposed Precaution Cut Greedy (PCG), a T-L plane based scheduling algorithm for uniform multi-processors. PCG uses a modified T-L plane, the T-Ler plane. Figure 2 shows how PCG schedules tasks in the first T-Ler plane. When an event-c occurs, τ_3 is assigned to p_2 until the end of the T-Ler plane. Thus, in PCG, a task monopolizes a single processor, thereby preventing unnecessary task migration.
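As a rough illustration of the token dynamics underlying these T-L/T-Ler plane algorithms (a simplified model, not PCG itself; the function names and the use of the largest capacity c_max in the event-c test are our assumptions), a token's state in a plane ending at time t_f can be tracked as follows:

```python
def local_utilization(l_i, t, t_f):
    """Local remaining-execution rate of a token at time t in a plane ending at t_f."""
    return l_i / (t_f - t)


def advance(l_i, dt, capacity=None):
    """A scheduled token moves diagonally, consuming execution at the allocated
    processor's capacity; an unscheduled token moves horizontally (unchanged)."""
    return l_i if capacity is None else max(0.0, l_i - capacity * dt)


def event_b(l_i):
    """Event-b: the token reaches the bottom (no remaining local execution)."""
    return l_i == 0.0


def event_c(l_i, t, t_f, c_max):
    """Event-c: the token's laxity is exhausted -- it must now run continuously
    on a processor of capacity c_max to finish by the end of the plane."""
    return local_utilization(l_i, t, t_f) >= c_max
```

For instance, a token with local remaining execution 0.6 run for 0.5 time units on a 0.8-capacity processor drops to 0.2, while an unscheduled token keeps its value and its local utilization grows as the plane boundary approaches.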

T-L Plane Based Energy-Efficient Global Optimal Scheduling Approaches
Energy-efficient scheduling based on the T-L plane for uniform parallel machines has been proposed in response to the demand for energy efficiency. Uniform RT-SVFS [30] reduces energy consumption by scaling the frequencies of all processors at a constant rate. Scaling the height of the T-L plane, as shown in Figure 3, enables scheduling at the changed frequency; α_k represents the normalized frequency of the processor. In addition, energy-efficient T-L plane based scheduling algorithms for unrelated parallel machines have emerged. Independent RT-SVFS [30] determines the frequency of each processor by static per-processor scaling; it was proposed to overcome the heavy-task bottlenecks that can occur with uniform frequency scaling. The Growing Minimum Frequency (GMF) algorithm [31], a state-of-the-art T-L plane based non-uniform frequency scaling scheme for saving energy on VFS-capable embedded multi-processors, controls the frequencies of multi-processors using U-LLREF and can determine the global optimal frequency. RT-DVFS [32] dynamically adjusts the frequency of each processor at each scheduling event. It is difficult to combine DPM with T-L plane based scheduling algorithms because of the idle-time fragmentation they cause. In addition, since scheduling is performed using all processors in the system, considerable energy overhead due to unnecessary state transitions occurs when DPM is applied. TL-DPM [19] mitigates the idle-time fragmentation problem by using a new event to retrieve tokens that would otherwise be executed in the next plane. However, since only the tokens of the next plane are targeted, room remains for further reducing idle-time fragmentation. Kim et al. [20] proposed a generalized method that executes tokens scheduled for later planes in the current plane to solve this problem.
To reduce the number of state transitions, scheduling is performed using only the minimum number of processors.

Feasibility Conditions
Theorems 1 and 2 state the conditions that must be met to obtain schedules satisfying the timing constraints when uniform multi-processors are used to schedule a given task set.

Theorem 1 (Horvath et al. [33]). The level algorithm constructs a minimal-length schedule for a set of independent tasks with service requirements e_1 ≥ e_2 ≥ ... ≥ e_n on a uniform processing system with capacities c_1 ≥ c_2 ≥ ... ≥ c_m. The schedule length is given by

ω = max( max_{1≤k<m} (∑_{i=1}^{k} e_i) / (∑_{i=1}^{k} c_i), (∑_{i=1}^{n} e_i) / (∑_{i=1}^{m} c_i) ). (4)

Theorem 2 (Funk et al. [34]). Consider a set τ = {τ_1, ..., τ_n} of periodic tasks indexed according to non-increasing utilization (i.e., u_i ≥ u_{i+1} for all i, 1 ≤ i < n). Let π denote a system of m ≤ n uniform processors with capacities c_1 ≥ c_2 ≥ ... ≥ c_m. The periodic task system τ can be scheduled to meet all deadlines on the uniform multi-processor platform π if and only if the following constraints hold:

∑_{i=1}^{n} u_i ≤ ∑_{i=1}^{m} c_i, (5)
U_k ≤ C_k for all k = 1, 2, ..., m − 1, (6)

where U_k = ∑_{i=1}^{k} u_i and C_k = ∑_{i=1}^{k} c_i.
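Both theorems reduce to prefix-sum comparisons over tasks and capacities sorted in non-increasing order, so they can be checked directly. A small sketch (helper names are ours):

```python
def feasible_on_uniform(utils, caps):
    """Exact feasibility test in the style of Theorem 2: every prefix of the
    k heaviest tasks must fit on the k fastest processors, and the total
    utilization must fit in the total capacity."""
    u = sorted(utils, reverse=True)
    c = sorted(caps, reverse=True)
    if sum(u) > sum(c):
        return False
    return all(sum(u[:k]) <= sum(c[:k]) for k in range(1, len(c)))


def level_schedule_length(execs, caps):
    """Schedule length of the level algorithm in the style of Theorem 1:
    the maximum over the prefix ratios and the overall ratio."""
    e = sorted(execs, reverse=True)
    c = sorted(caps, reverse=True)
    m = len(c)
    bounds = [sum(e[:k]) / sum(c[:k]) for k in range(1, m)]
    bounds.append(sum(e) / sum(c))
    return max(bounds)
```

For example, utilizations {0.7, 0.3} fail on capacities {0.5, 0.5} even though the totals match, because the heaviest task does not fit on the fastest processor.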
Selecting processors for scheduling tasks at the beginning of each T-L plane divides into the case where tasks can be allocated to processors whose capacities exactly match their utilizations, and the case where they cannot. Table 2 shows some examples of processor selections for scheduling the task set shown in Table 3. In a CMOS chip, power consumption is determined by the operating frequency and the supply voltage; the relationship between them in a processor is given by Equation (7).

Simple Case: Exact Match
According to the relationship between the supply voltage and the operating frequency of a processor, shown in Equation (7), a processor operating at a higher frequency requires a higher supply voltage than one operating at a lower frequency:

f = k · (V − V_th)^β / V, (7)

where V_th is the threshold voltage of the transistors, β is a measure of the velocity saturation in CMOS transistors, and k is a constant of proportionality. Therefore, as shown in Table 2, a processor with a higher supply voltage has a higher capacity.

Table 2. An example of the available processor sets.

The processor sets S_1, S_2, and S_3 in Table 2 satisfy Theorems 1 and 2 presented above. Since the processing capacity of each of S_1, S_2, and S_3 equals the total utilization of the task set, there is no idle time when the task set is scheduled. However, since the number and capacities of the processors differ across the processor sets, the power consumed by S_1, S_2, and S_3 differs as well. The energy consumption for scheduling the task set of Table 3 on S_1, S_2, and S_3 is shown in Table 4. The lowest power consumption is observed on S_2, where each task is independently assigned to a processor whose capacity equals the task's utilization. If the total capacity of a processor set equals the total utilization of a task set, there is no idle time because all processors are always executing tasks; the power consumption of each processor then depends only on the workload it processes, and high-capacity processors handle more of that workload. Equation (8) gives the energy E_e(V_k) needed to process a workload e_i on a processor whose operating frequency and supply voltage are f_k and V_k, respectively:

E_e(V_k) = αCV_k² e_i, (8)

where V_k and f_k denote the supply voltage and frequency of the k-th processor, respectively. Lemma 1 shows the power consumption characteristics of processor sets whose total capacity equals the total utilization of a task set.

Table 4. Dynamic power consumption of some feasible processor sets.

Processor set: S_1, S_2, S_3
Dynamic power consumption: 2.46αC, 1.94αC, 2.68αC

Lemma 1. If U_total = c_n = c_i + c_j, then when scheduling a task set with utilization U_total on the two processor sets S_1 = {c_n} and S_2 = {c_i, c_j}, the power consumption satisfies αCV_n² e_n > αCV_i² e_i + αCV_j² e_j.
Proof of Lemma 1. According to Equation (8), the power consumption measures of S_1 and S_2 are αCV_n² e_n and αCV_i² e_i + αCV_j² e_j, respectively. Since c_n = U_total and c_i + c_j = U_total, there is no idle time when the tasks are scheduled. In addition, since c_n > c_i and c_n > c_j, Equation (7) gives V_n > V_i and V_n > V_j; with e_n = e_i + e_j, it follows that αCV_n² e_n > αCV_i² e_i + αCV_j² e_j. □

According to Lemma 1, selecting the 0.8-capacity processor to schedule the 0.6- and 0.2-utilization tasks of the task set in Table 3 results in higher power consumption than selecting the 0.6- and 0.2-capacity processors for those tasks. Therefore, assigning each task to a processor whose capacity equals its utilization is the most energy-efficient choice when there are enough processors. Among the processors satisfying c_j ≥ u_i, the processor whose capacity equals u_i shows the lowest power consumption for executing the task with utilization u_i. Lemma 2 captures this characteristic.

Lemma 2. When a task with utilization u_i is executed on two processors under the condition c_n > c_j = u_i, their power consumption for processing the allocated workload during the task period satisfies E_e(V_n) > E_e(V_j).
Proof of Lemma 2. When two processors with capacities c_n and c_j perform the workload e_i during the period p_i, their power consumption measures are E_e(V_n) and E_e(V_j), respectively. If c_n > c_j, then V_n > V_j by Equation (7); hence E_e(V_n) > E_e(V_j) by Equation (8). □
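The inequalities of Lemmas 1 and 2 can be checked numerically under the voltage-frequency model of Equation (7). The sketch below assumes β = 2 and a hypothetical threshold voltage and lumped αC; the specific numbers mirror the 0.8 vs. 0.6 + 0.2 example above but are otherwise illustrative.

```python
import math

ALPHA_C = 1.0   # lumped alpha * C, hypothetical
V_TH = 0.3      # threshold voltage, hypothetical


def voltage_for_capacity(f):
    """Invert f = (V - V_th)^2 / V (the alpha-power delay model with
    beta = 2), taking the larger root of the resulting quadratic."""
    b = 2 * V_TH + f
    return (b + math.sqrt(b * b - 4 * V_TH * V_TH)) / 2


def energy(capacity, workload):
    """Dynamic energy alpha * C * V^2 * e to process workload e, in the
    form of Equation (8)."""
    v = voltage_for_capacity(capacity)
    return ALPHA_C * v * v * workload


# Lemma 1: one 0.8-capacity processor running the whole workload costs
# more than a 0.6 and a 0.2 processor sharing it proportionally.
assert energy(0.8, 0.6 + 0.2) > energy(0.6, 0.6) + energy(0.2, 0.2)

# Lemma 2: the faster processor costs more energy for the same workload.
assert energy(0.8, 0.5) > energy(0.6, 0.5)
```

The assertions hold because the voltage, and hence V², grows monotonically with capacity under this model.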
When a task set with total execution time ∑_{i=1}^{n} e_i is scheduled on n processors of different capacities, the power consumption required to process the allocated workloads on the n processors is given by Equation (9), where e_1, e_2, ..., e_n denote the workloads assigned to the respective processors:

E_total = ∑_{i=1}^{n} αCV_i² e_i. (9)
If the total capacity of the n processors is greater than the total utilization of the task set to be scheduled, there will be idle time during task scheduling, and the power consumed during idle time must be taken into account when measuring the processors' power consumption. The power consumption of the n processors is then given by Equation (10), where α_i denotes the power consumption of the i-th processor during its idle time t_idle,i:

E_total = ∑_{i=1}^{n} (αCV_i² e_i + α_i · t_idle,i). (10)
Lemma 3 and Theorem 3 give the power consumption required for scheduling a task set on a set of n processors with different capacities.

Lemma 3. The lowest power is consumed when the task set is scheduled on a processor set S_1 whose total capacity satisfies ∑_{∀τ_i∈τ} u_i = ∑_{∀p_i∈S_1} c_i.

Proof of Lemma 3. If ∑_{∀τ_i∈τ} u_i = ∑_{∀p_i∈S_1} c_i, scheduling involves no idle time, so the processor power consumption is ∑_{∀p_i∈S_1} E_e(V_i). If ∑_{∀τ_i∈τ} u_i < ∑_{∀p_i∈S_1} c_i, scheduling involves some idle time, so by Equation (10) the power consumption additionally includes the idle-time term ∑ α_i t_idle,i. Hence the lowest power consumption is observed when the total capacity equals the total utilization. □

Theorem 3. Independently assigning each task in the task set τ to a processor whose capacity equals the task's utilization u_i yields the lowest power consumption for scheduling the task set.
Proof of Theorem 3. This follows directly from Lemmas 1-3. □
Therefore, selecting processors whose capacity is equal to the utilization of each task shows the lowest power consumption for scheduling a set of tasks.

Generalized Solution
When tasks cannot be assigned to processors whose capacities match their utilizations, a processor set available for scheduling must be selected from the limited processors. Table 5 shows the characteristics of the processors used for task scheduling, and Table 6 shows the processor sets selected from the processors in Table 5 for scheduling the task set shown in Table 7.

Table 6. Selecting processors for scheduling a task set.

Since the processor sets S_1, S_2, S_3, and S_6 shown in Table 6 satisfy Theorems 1 and 2, they can be used for task scheduling. However, since these processor sets are configured differently, the idle time during task scheduling and the differences in their supply voltages lead to different power consumption. Two considerations therefore guide the selection of energy-efficient processors: minimizing idle time and minimizing the supply voltages of the selected processors. Selecting a processor set for task scheduling that accounts for all of the problems presented above is an NP-hard problem; therefore, in this paper, we propose a heuristic method for selecting an energy-efficient processor set. In the proposed method, if the size of the current plane is smaller than C_sleep, a processor in active mode is added to the processor set for task scheduling, because it cannot be switched to sleep mode at the end of the previous plane. If the preferentially selected processors are not sufficient for scheduling the given task set, additional processors are selected, considering tasks in order of local utilization from highest to lowest. Whether an additional processor is selected depends on the difference between the total local utilization of the tasks in τ_ready at the start time t_0 of each plane, ∑_{τ_j∈τ_ready} r_j(t_0), and the total capacity ∑_{p_j∈P_selected} c_j of the processors in P_selected. The selected processors are moved to P_selected. The following describes how to select processors.

•
If 0 ≤ ∑ τ j ∈τ ready r j (t j ) − ∑ p j ∈P selected c j < r i (t 0 ), the processor with the smallest capacity is selected for scheduling in the given processor set, , the previously selected processor is used for scheduling without selecting an additional processor. P all is the set of all the processors in the system. P selected is the set of the selected processors for task scheduling. Algorithm 1 shows how to select processors for scheduling at the beginning of each plane. The function getMinimumCapacityProcessor(availableCapacity, τ, P temp ) takes the available capacity (availableCapacity) of the previously selected processor into account to return the lowest capacity processor for scheduling the task set τ from the given processor set P temp . The function add() adds elements to the set, and the function erase() removes elements from the set. The processors in P sleep indicate the processor in the sleep state in the plane. It is necessary to ensure a break-even time longer than the idle time in order to use DPM techniques for switching the state of a processor. To ensure the idle timeis long enough to enter the sleep mode, the idle time in the plane should be generated as much as possible on a single processor. To prevent unnecessary power consumption, a task is assigned to the selected processor whose capacity is the lowest for scheduling the task. For this reason, in the proposed method, the processors in the selected processor set are classified into the following categories: processors that can be used to the maximum extent in the plane and processors that can be used exclusively by a single task in the plane. That is, the processors in P selected are classified into the following categories: P f ixed , P max , and P slack . The processors in P f ixed represent a set of processors exclusively used by a single task. P max is the set of processors used to the maximum extent in the plane. 
P_slack is the set of processors that may incur idle time in the plane during task scheduling. The tasks to be executed on the classified processor sets are divided into the corresponding categories τ_fixed, τ_max, and τ_slack; tasks assigned to one processor set cannot be moved to another. The following describes how to classify the processor sets.
Algorithm 1 Processor selection at the beginning of a T-L plane
1: Input: P_all, P_sleep, τ_all, psize
2: Output: P_selected, τ_ready
3: psize - size of the T-L plane
4: P_all - the set of processors in the system
5: P_sleep - the set of processors in sleep mode
6: P_selected - the set of processors selected for scheduling tasks
7: P_temp - a temporary set of processors
8: τ_all - the set of all tasks in the system
9: τ_ready - the set of ready tasks
10: τ - temporary variable for tasks
11: p - temporary variable for processors
12: availableCapacity - temporary variable for available capacity
13: for ∀p ∈ P_all − P_sleep do
14:   if psize < p.C_sleep then
15:     add(p, P_temp);
16:   end if
17: end for
18: for ∀τ ∈ τ_all do
19:   if τ.e > 0 then
20:     add(τ, τ_ready);
21:   end if
22: end for
23: repeat
24:   τ = getFirstLocalUtilizationTask(τ_ready);
25:   if availableCapacity ≥ τ.r(t_0) then
27:     continue;
28:   else
29:     p = getMinimumCapacityProcessor(availableCapacity, τ, P_temp);
30:     if p is null then
31:       p = getMinimumCapacityProcessor(availableCapacity, τ, P_sleep);
32:       if p.c > p_1.c then
33:         erase(p, P_sleep)
34:       end if
42: until τ is not null
43: return P_selected, τ_ready

1. To select a processor for scheduling a task τ_i in τ_ready when the difference between the total local utilization of the tasks in τ_slack at t_0 and the total capacity of the processors in P_slack is greater than zero, i.e., ∑_{τ_j∈τ_slack} r_j(t_0) − ∑_{p_j∈P_slack} c_j > 0:
• If ∑_{τ_j∈τ_slack} r_j(t_0) − ∑_{p_j∈P_slack} c_j ≥ r_i(t_0), the task is assigned to a previously selected processor without selecting an additional processor. The assigned task is moved from τ_ready to τ_slack.
• If ∑_{τ_j∈τ_slack} r_j(t_0) − ∑_{p_j∈P_slack} c_j < r_i(t_0), an additional processor must be selected for the task as described below. The assigned task is moved from τ_ready to τ_slack.

2. If ∑_{p_j∈P_slack} c_j − ∑_{τ_j∈τ_slack} r_j(t_0) = 0, all processors and tasks in P_slack and τ_slack are moved to P_max and τ_max, respectively.

• The processor with the lowest capacity able to schedule a task τ_i is selected from the processor set {p_j | c_j ≥ r_i(t_0), p_j ∈ P_selected}. If the capacity of the selected processor equals the local utilization r_i(t_0) of the task τ_i, the processor and the task are moved to P_fixed and τ_fixed, respectively. Otherwise, they are moved to P_slack and τ_slack, respectively.
Algorithm 2 shows how the processor set P_selected is classified into the categories P_fixed, P_max, and P_slack. The task with the highest local utilization is considered first when classifying the processor set and the task set. The function getFirstLocalUtilizationTask(τ_ready) returns the task with the highest local utilization in τ_ready. The function getMinimumCapacityProcessor(availableCapacity, τ, P_selected) takes availableCapacity into account and returns the lowest-capacity processor in P_selected able to schedule the task τ. If the capacity of the returned processor equals the local utilization of the task, the processor and the task are moved to P_fixed and τ_fixed, respectively; otherwise, they are moved to P_slack and τ_slack. If availableCapacity is 0, the processors in P_slack and the tasks in τ_slack are first moved to P_max and τ_max.
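The classification step can be sketched as follows. This is an illustrative simplification of Algorithm 2, not its exact pseudocode: tasks and processors are reduced to plain local-utilization and capacity numbers, and the rule that a task may ride on the slack group's spare capacity is our reading of the selection text.

```python
def classify(local_utils, selected_caps):
    """Split selected processors into fixed / max / slack groups.

    Tasks are taken in decreasing local utilization. A processor whose
    capacity exactly equals a task's local utilization is dedicated to it
    (fixed); otherwise the task joins the slack group, reusing the group's
    spare capacity when possible, and the slack group is flushed into the
    max group once its spare capacity is exhausted."""
    fixed = []                        # (task, dedicated processor) pairs
    max_caps, max_utils = [], []      # fully utilized group
    slack_caps, slack_utils = [], []  # group that may accumulate idle time
    pool = sorted(selected_caps)      # ascending: cheapest processor first
    for r in sorted(local_utils, reverse=True):
        spare = sum(slack_caps) - sum(slack_utils)
        if slack_caps and spare >= r:
            slack_utils.append(r)     # fits in the group's spare capacity
            continue
        # smallest processor that, with the spare capacity, hosts the task
        c = next((c for c in pool if c + spare >= r), None)
        if c is None:
            raise ValueError("selected processors cannot host the task set")
        pool.remove(c)
        if c == r:
            fixed.append((r, c))      # task monopolizes this processor
        else:
            if spare == 0 and slack_caps:
                max_caps.extend(slack_caps)
                max_utils.extend(slack_utils)
                slack_caps, slack_utils = [], []
            slack_caps.append(c)
            slack_utils.append(r)
    return fixed, (max_caps, max_utils), (slack_caps, slack_utils)
```

For example, tasks {0.6, 0.2} on processors {0.6, 0.2} are both dedicated (fixed), while tasks {0.5, 0.2} on a single 0.8-capacity processor end up together in the slack group, concentrating the plane's idle time on that one processor.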

Scheduling Strategy
Chen et al. [29] suggested two methods for scheduling on uniform multi-processors. However, these scheduling methods do not take into account event-t, event-s, and event-r presented above. In this section, we propose a new T-L plane based scheduling method in which event-t, event-s, and event-r are used to reduce the power consumption of uniform multi-processors. When the tasks in τ_fixed, τ_max, and τ_slack are scheduled on the processor sets P_fixed, P_max, and P_slack, tasks cannot be moved from one processor set to another, so that no idle time is generated on the processors in P_fixed and P_max. The remainder of this section describes the movement of elements between task sets and processor sets and the processor assignment performed when a rescheduling event occurs. Since event-t as previously defined targets identical multi-processors, it is not suitable for uniform multi-processors and is redefined in Definition 1.

Definition 1.
An event-t in uniform multi-processors occurs at time t_t if the following conditions are met.
Algorithm 3 shows the process of assigning tasks to processors when a rescheduling event occurs. All tasks in the set τ_active are moved to the set τ_ready, and τ_active is then emptied; the function eraseAll(τ_active) removes all elements of τ_active. Tasks are assigned to processors in each processor set in the following order: P_slack, P_max, and then P_fixed. The function getMaximumLocalUtilizationTask(p.c, τ_fixed, τ_ready) returns the task with the highest local utilization in τ_fixed and τ_ready that can be executed on the processor with capacity p.c. The function getFirstLocalUtilizationTask(τ_fixed, τ_ready) returns the task with the highest local utilization in τ_fixed and τ_ready. The function allocateTaskToProcessor(τ, p) assigns the task τ to the processor p.
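The assignment order just described can be sketched as follows. This is a simplified model (tasks are local-utilization numbers, processors are capacities grouped by category); the fallback to the overall highest-utilization task mirrors the getFirstLocalUtilizationTask step.

```python
def assign_tasks(procs, tasks):
    """Assignment at a rescheduling event: processors are visited in the
    order P_slack, P_max, P_fixed, and each takes the highest-local-
    utilization ready task it can host, falling back to the overall
    highest-utilization task if none fits."""
    schedule = []                         # (processor capacity, task) pairs
    ready = sorted(tasks, reverse=True)   # highest local utilization first
    for cap in procs["slack"] + procs["max"] + procs["fixed"]:
        if not ready:
            break
        # first (i.e. largest) task that fits, else the heaviest task
        fit = next((r for r in ready if r <= cap), ready[0])
        ready.remove(fit)
        schedule.append((cap, fit))
    return schedule, ready
```

For instance, with slack = {0.5}, max = {0.8}, fixed = {0.3} and tasks {0.3, 0.7, 0.4}, the 0.5-capacity slack processor takes 0.4 (the largest task that fits), the 0.8-capacity processor takes 0.7, and the fixed 0.3-capacity processor takes 0.3.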
Algorithm 2 Classification of selected processors for scheduling
1: Input: P_selected, τ_ready
2: Output: P_fixed, P_max, P_slack, τ_fixed, τ_max, τ_slack
3: P_fixed - the set of processors fixed to a single task
4: P_max - the set of processors having maximum utilization
5: P_slack - the set of processors that may have slack time
6: τ_fixed - the set of tasks fixed to a processor in P_fixed
7: τ_max - the set of tasks scheduled on P_max
8: τ_slack - the set of tasks scheduled on P_slack
9: τ_1 - temporary variable for tasks
10: τ_2 - temporary variable for tasks
11: p_1 - temporary variable for processors
12: p_2 - temporary variable for processors
13: repeat
14:   τ_1 = getFirstLocalUtilizationTask(τ_ready);
15:   availableCapacity = ∑_{p_i∈P_slack} p_i.c − ∑_{τ_i∈τ_slack} τ_i.r(t_0);
16:   p_1 = getMinimumCapacityProcessor(availableCapacity, τ_1, P_selected);
17:   if p_1.c = τ_1.r(t_0) then
18:     add(p_1, P_fixed);
19:     add(τ_1, τ_fixed);
20:   else if availableCapacity = 0 then
21:     for ∀τ_2 ∈ τ_slack do
22:       add(τ_2, τ_max);
23:       erase(τ_2, τ_slack);
24:     end for
25:     for ∀p_2 ∈ P_slack do
26:       add(p_2, P_max);
27:       erase(p_2, P_slack);
28:     end for
29:     add(τ_1, τ_slack);
30:     add(p_1, P_slack);
31:   else
32:     add(τ_1, τ_slack);
33:     add(p_1, P_slack);
34:   end if
35:   erase(p_1, P_selected);
36: until τ_1 is not null
37: return P_fixed, P_max, P_slack, τ_fixed, τ_max, τ_slack

Algorithm 4 shows the movement of elements between processor sets and task sets. When an event-b occurs, all tasks that triggered the event-b are moved to τ_done and removed from τ_active; the function getEventTasks() returns all tasks that triggered the event-b. When an event-c or an event-f occurs, all tasks that triggered the event are moved to τ_fixed, and the processors that triggered the event are moved to P_fixed. The function getProcessor(τ.r(t_0), P_max) returns the processor with capacity τ.r(t_0) in P_max.
When an event-t occurs, the processors that can be switched to sleep mode are moved to P_sleep and removed from P_slack. When an event-s or an event-r occurs, all tasks that triggered the event are moved to τ_done and removed from τ_active. The function reallocateProcessorTime() assigns the available processing time to a task with remaining execution time in τ_done; the assigned task is moved to τ_ready.

Algorithm 3 Assignment of tasks to processors at rescheduling
1: Input: P_fixed, P_max, P_slack, τ_fixed, τ_max, τ_slack
2: Output: τ_fixed, τ_max, τ_slack
3: for ∀τ ∈ τ_active do
4:   add(τ, τ_ready);
5: end for
6: eraseAll(τ_active);
7: for ∀p ∈ P_slack do
8:   τ = getMaximumLocalUtilizationTask(p.c, τ_slack, τ_ready);
9:   if τ is null then
10:     τ = getFirstLocalUtilizationTask(τ_slack, τ_ready);
11:   end if
12:   allocateTaskToProcessor(τ, p);
13:   erase(τ, τ_ready);
14:   add(τ, τ_active);
15: end for
16: for ∀p ∈ P_max do
17:   τ = getMaximumLocalUtilizationTask(p.c, τ_max, τ_ready);
18:   if τ is null then
19:     τ = getFirstLocalUtilizationTask(τ_max, τ_ready);
20:   end if
21:   allocateTaskToProcessor(τ, p);
22:   erase(τ, τ_ready);
23:   add(τ, τ_active);
24: end for
25: for ∀p ∈ P_fixed do
26:   τ = getMaximumLocalUtilizationTask(p.c, τ_fixed, τ_ready);
27:   allocateTaskToProcessor(τ, p);
28:   erase(τ, τ_ready);
29:   add(τ, τ_active);
30: end for
31: return τ_fixed, τ_max, τ_slack

Figure 4 shows the schedule produced in the first plane by the proposed method when scheduling the tasks of Table 8 on the processors listed in Table 9. Algorithm 2 is used to categorize the processor sets and the ready tasks selected by Algorithm 1 at t_0. Task τ_5, which triggered an event-c at t_1, and the processor p_3, whose capacity equals the local utilization of τ_5, are moved to τ_fixed and P_fixed, respectively. At the same time, the processor p_4 is moved to P_sleep by an event-t. Task τ_1, which triggered an event-b at t_2, is moved to τ_done.
Task τ_3, which triggered an event-b at t_3, is moved to τ_done. At the same time, task τ_3, which triggered an event-c, and the processor p_1, whose capacity equals the local utilization of τ_3, are moved to τ_fixed and P_fixed, respectively. Table 10 shows the elements added to the processor and task sets by Algorithm 4 at each event in the first plane. Tasks are assigned to processors by Algorithm 3. As shown in Figure 4, tasks assigned to processors move diagonally along the slope of the processor capacity, while tasks not assigned to processors move horizontally.
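The per-processor selection step of Algorithm 3 can be sketched in the simulator's language, Ruby. This is a hypothetical reading of getMaximumLocalUtilizationTask and getFirstLocalUtilizationTask, which the listing does not define: the first picks, among tasks in both the category set and the ready set, the task with the largest local utilization that still fits the processor capacity c; the second is the fallback that simply takes the first common task.

```ruby
# Hypothetical sketch of the selection helpers of Algorithm 3;
# Task and its fields are illustrative, not taken from the paper.
Task = Struct.new(:name, :local_utilization)

# getMaximumLocalUtilizationTask: among tasks in both the category set
# and the ready set, return the task with the largest local utilization
# not exceeding the processor capacity c, or nil if none fits.
def max_local_utilization_task(c, category, ready)
  (category & ready).select { |t| t.local_utilization <= c }
                    .max_by(&:local_utilization)
end

# getFirstLocalUtilizationTask: fallback used when no task fits;
# simply take the first ready task of the category.
def first_local_utilization_task(category, ready)
  (category & ready).first
end
```

Under this reading, a slack processor of capacity 0.8 facing ready slack tasks with local utilizations 0.4 and 0.9 receives the 0.4 task; when no task fits, the first common task is used instead, mirroring lines 8-11 of the listing.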

Energy Efficiency on Uniform Multi-Processors
In this section, the performance of the proposed algorithm is compared with the major power management algorithms developed previously. For the experiments, we implemented a simulator in Ruby (version 2.4.1) running on Windows 10. Figure 5 illustrates the architecture of the simulator. The simulation reports the energy consumed by task executions as well as the energy overheads associated with processor state transitions.
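The accounting that separates these two quantities can be sketched as follows. The state names, power draws, and transition overhead are illustrative placeholders, not the Cortex-A7 figures used in the experiments; a processor's schedule is assumed to be given as a list of [state, duration] intervals, with every boundary between adjacent intervals counted as one state transition.

```ruby
# Minimal sketch of the simulator's energy accounting (all values
# are illustrative, not the Cortex-A7 parameters from Tables 11/12).
POWER_MW = { run: 100.0, idle: 10.0, sleep: 1.0 }  # power draw per state
TRANSITION_OVERHEAD_MJ = 0.5                       # energy per state switch

# trace: list of [state, duration] intervals for one processor.
# Returns execution energy, transition overhead, and their sum.
def total_energy(trace)
  exec_energy = trace.sum { |state, dur| POWER_MW[state] * dur }
  transitions = [trace.length - 1, 0].max
  overhead    = transitions * TRANSITION_OVERHEAD_MJ
  { execution: exec_energy, transition: overhead, total: exec_energy + overhead }
end
```

For example, a trace of two units in run, one in idle, and three in sleep yields 213.0 units of execution energy plus two transition overheads of 0.5 each.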

Experiment Environment
The characteristics of the Cortex-A7 core in Marvell's MV78230, a multi-core ARMv7 system-on-chip processor, are used to set the processor parameters of the simulator. This core supports dynamic frequency scaling and dynamic power-down options. Tables 11 and 12 show that the Cortex-A7 supports six frequency levels and five processor states. The run thermal state is used for CPU stress tests, and the deep idle and sleep modes consume the same amount of CPU energy; we therefore consider only the run typical, idle, and sleep modes of Table 12 in our experiment. WolfBot [16], a distributed mobile sensing platform, uses ARMv7-based Cortex processors. To confirm the scalability of the proposed algorithm, we vary the number of available processors from 8 to 32. We then use the Emberson procedure to construct 100 task sets for each processor count. The total utilization of each task set is 8, and each task has a utilization between 0.01 and 0.99. The period of each task is uniformly distributed between 10 and 150, and each run is simulated for 1000 time units. Table 13 shows the platform type and power management technique of each algorithm to be simulated. A platform is called "non-uniform" when the frequency of each processor can be adjusted independently, and "uniform" when all frequencies must be scaled at a constant ratio. Each processor among uniform multi-processors may still operate at a different speed, so a job's execution time depends on the processor to which it is allocated; platforms in which execution times vary arbitrarily per task-processor pair are instead called "unrelated". Figure 6 shows the power efficiency obtained by simulating the five algorithms listed in Table 13 while varying the number of available processors and the number of tasks.
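The task-set construction above can be approximated as follows. This is a simplified stand-in rather than the actual Emberson procedure (which samples utilizations with Randfixedsum): here a UUniFast-style split of the total utilization is retried until every task falls in the required 0.01-0.99 range, and periods are drawn uniformly from 10-150. All helper names are ours, not the paper's.

```ruby
# Simplified, hypothetical stand-in for the Emberson task-set generator:
# n tasks whose utilizations sum to u_total, each within 0.01..0.99,
# with integer periods drawn uniformly from 10..150.
def generate_task_set(n, u_total, rng = Random.new(42))
  utils = nil
  loop do
    utils = []
    remaining = u_total
    (n - 1).times do |i|
      # UUniFast-style split of the remaining utilization.
      nxt = remaining * rng.rand**(1.0 / (n - 1 - i))
      utils << remaining - nxt
      remaining = nxt
    end
    utils << remaining
    # Retry until the per-task utilization bound holds.
    break if utils.all? { |u| u.between?(0.01, 0.99) }
  end
  utils.map do |u|
    period = rng.rand(10..150)
    { utilization: u, period: period, wcet: u * period }
  end
end
```

A real reproduction of the experiment would swap in Randfixedsum, which samples the utilization vector uniformly from the simplex instead of via sequential splits.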
We implement our proposed algorithm as well as the following: PCG, the original uniform scheduling algorithm without any power management [29]; Uniform-DPM, our proposed scheduling algorithm for DPM-enabled uniform multi-processors; GMF [31]; Independent RT-SVFS [30]; and Uniform RT-SVFS [30]. The x-axis of Figure 6 represents the number of available processors, and the y-axis represents the normalized power consumption (NPC): the power consumed by PCG is taken as the reference, and the power consumption of each algorithm is expressed relative to it. The subfigures of Figure 6 show the results when the number of tasks composing a task set is 12, 16, 20, and 24, respectively. All of the simulated algorithms are globally optimal; thus, since the total utilization of the task set is fixed at 8, every algorithm consumes 100% of the reference energy when scheduling on eight processors. As shown in Figure 6, the power efficiency of the GMF and RT-SVFS algorithms changes with the number of tasks, while the proposed algorithm, Uniform-DPM, consumes the same amount of power because it always generates the same total idle time. In addition, when many processors are available, the proposed algorithm shows high power efficiency by preventing unnecessary processor activation, idle-time fragmentation, and frequent processor state transitions. GMF and Independent RT-SVFS have similar power efficiencies because both determine the frequency of each processor independently; however, GMF finds a globally optimal solution in the search space while Independent RT-SVFS does not, so GMF performs better, as shown in Figure 6. Uniform RT-SVFS adjusts the frequencies of all processors by a single ratio, so when the number of tasks is small its energy efficiency suffers: the work can be concentrated on a few processors, and the common frequency cannot be lowered.
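Under this normalization, computing NPC is a one-step division against the PCG baseline. The hash keys below are illustrative labels for the simulated algorithms, and the energy values are made up for the example.

```ruby
# Normalized power consumption (NPC): each algorithm's measured energy
# expressed as a percentage of the PCG baseline, which always scores 100%.
def normalized_power_consumption(energy_by_alg, baseline = :pcg)
  ref = energy_by_alg.fetch(baseline)
  energy_by_alg.transform_values { |e| 100.0 * e / ref }
end
```

For instance, an algorithm that consumed 90 energy units against a PCG baseline of 200 would plot at an NPC of 45%.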
However, as the number of tasks increases, the work can be divided among multiple processors and processed simultaneously, which allows the processor frequency to be reduced. Tables 14 and 15 summarize the energy-efficiency characteristics of the proposed algorithm: Table 14 shows that Uniform-DPM maintains constant energy efficiency regardless of the number of tasks, and Table 15 shows that its energy efficiency increases as the number of processors increases.

Conclusions and Future Works
The lifetime of a WSN is closely related to the management of sensor nodes operating on limited energy. In this paper, we have proposed a power management method for sensor nodes supporting DPM-enabled uniform multi-processors. In the proposed approach, both the selection of the processors that process a task set and the assignment of tasks to the selected processors are designed for energy efficiency. In addition, we implemented a simulator to measure the power consumption of various scheduling algorithms. The experimental results show that the proposed algorithm scales better with the number of available processors than DVFS-based approaches. Currently, our algorithms handle periodic tasks with implicit deadlines. In future work, we plan to extend them to handle sporadic tasks with time constraints. We are also interested in combining the DVFS and DPM approaches under the T-L plane abstraction. In addition, studies on the trade-off between power usage and computational complexity, as well as performance evaluations in overloaded situations, are interesting directions for future research.