Dynamic Priority Real-Time Scheduling on Power Asymmetric Multicore Processors

Abstract: The use of real-time systems is growing rapidly, which makes power efficiency a main challenge for system designers. Power asymmetric multicore processors provide a power-efficient platform for building complex real-time systems. The utilization of this platform can be further enhanced by adopting proficient scheduling policies. Unfortunately, research on real-time scheduling of power asymmetric multicore processors is in its infancy. In this research, we address this problem and add new results. We propose a dynamic-priority semi-partitioned algorithm named Earliest-Deadline First with C=D Task Splitting (EDFwC=D-TS) for scheduling real-time applications on power asymmetric multicore processors. EDFwC=D-TS outperforms its counterparts in terms of system utilization. The simulation results show that EDFwC=D-TS schedules up to 67% more tasks with heavy workloads. Furthermore, it improves processor utilization by up to 11% and on average uses 14% fewer cores to schedule the given workload.


Introduction
The use of real-time systems has grown rapidly due to their diverse application areas, ranging from simple household electronics to fully automated industrial control systems [1,2]. These systems are characterized by temporal constraints, and the fulfillment of these constraints is considered as necessary as executing the tasks correctly. Temporal correctness can be effectively achieved by designing efficient task scheduling policies [1,3]. A real-time scheduler decides the execution order of time-dependent tasks. Its essential goal is to schedule tasks so that they meet their timing constraints. Proficient scheduling approaches not only improve system utilization but can also be integrated with power management techniques to attain high power efficiency. Dynamic voltage and frequency scaling (DVFS) [4] and memory shut-down [5] are examples of such techniques. On DVFS-enabled processors, the supply voltage and clock frequency are dynamically adjusted depending upon the current workload. In this way, the system consumes less power when the workload is low. Similarly, in the memory shut-down technique, unused memory is dynamically shut down, which also reduces energy consumption. Both DVFS and memory shut-down can be effectively integrated with real-time scheduling to achieve energy efficiency.
Symmetry 2021, 13
In recent times, real-time applications have grown extensively in complexity. Multicore processors are considered more favorable for implementing such complex and processing-intensive applications due to their proficiency in terms of energy consumption and heat generation [6]. Multicore processors are fundamentally categorized as homogeneous, heterogeneous, or power asymmetric (also known as uniform or single-ISA heterogeneous) [6].
In homogeneous multicore processors, all of the cores have similar functional and processing capabilities, while a heterogeneous multicore processor may contain cores with different functional and processing capabilities [7]. On the other hand, the processing cores in a power asymmetric multicore processor are similar in functional capabilities but may differ in their processing capabilities [7-9]. Power asymmetric processors are viewed as better in terms of energy consumption than their counterparts [8,9].
Research on single-processor real-time scheduling is considered mature, yet there is still ample room for research on multiprocessor/multicore scheduling [10,11]. The existing multicore real-time scheduling approaches are categorized as partitioned, global, or semi-partitioned [11,12]. In partitioned scheduling, the given workload is divided into m subsets (where m is the number of cores) such that each subset k is feasible on the corresponding core k. During execution, each subset is executed on its assigned core. This task-to-core binding is permanent and no task can migrate during execution [11,12]. On the other hand, in global scheduling, all of the tasks are placed in a single prioritized queue and the scheduler assigns them to cores according to their priorities. Therefore, tasks can migrate from one core to another during execution [11,12]. Generally, global scheduling is considered superior to partitioned scheduling in terms of system schedulability but suffers from high runtime overheads. Semi-partitioned scheduling is presented as a compromise between pure partitioned and global scheduling in order to reduce the runtime overheads associated with global scheduling and to improve the performance of partitioned scheduling. Semi-partitioned scheduling extends partitioned scheduling by allowing a small number of tasks to migrate, which results in improved system utilization [11,12].
Although a lot of work has been done on multicore scheduling, none of the existing approaches achieves optimal performance. The best-known utilization bound for both global and partitioned scheduling algorithms is 50%, while semi-partitioned scheduling improves it up to 65% [11,13]. Furthermore, most of the results are based on homogeneous multicore processors, whereas research on power asymmetric multicore scheduling is still in its infancy. Power consumption has become a main challenge for future embedded system designs; therefore, power-efficient power asymmetric multicore processors need to be considered while addressing the real-time scheduling problem [11]. In this paper, we consider the dynamic-priority real-time scheduling of power asymmetric multicore processors and propose a semi-partitioned scheduling algorithm named Earliest-Deadline First with C=D Task Splitting (EDFwC=D-TS). The EDFwC=D-TS algorithm allocates tasks to cores in decreasing order of their utilizations, while the cores are sorted in descending order of their processing power. It utilizes the C=D heuristic to split tasks. The simulation results show that EDFwC=D-TS outperforms its counterparts and provides better system utilization.
The rest of the paper is organized as follows. Section 2 presents the existing work that is closely related to the addressed problem. System and task models are given in Section 3. In Section 4, the EDFwC=D-TS algorithm is presented. Experimental evaluation of the proposed work is presented in Section 5. Section 6 presents the evaluation of the EDFwC=D-TS algorithm, while our work is concluded in Section 7.

Related Work
In this section, we present the existing work that is pertinent to the addressed problem. Since semi-partitioned scheduling utilizes uniprocessor schedulability analysis while assigning tasks to cores, we first discuss some significant results on uniprocessor dynamic-priority real-time scheduling. Subsequently, we discuss the most significant existing literature on dynamic-priority semi-partitioned real-time scheduling and power asymmetric multiprocessor scheduling.

Uniprocessor Scheduling
In 1973, Liu and Layland published the pioneering and most influential work in real-time scheduling theory [14]. Under the dynamic-priority category, they proposed an optimal algorithm named Earliest-Deadline First (EDF) [14]. EDF assigns the highest priority to the task that has the earliest absolute deadline. Liu and Layland proved that EDF achieves 100% system utilization, i.e., any task-set $\Gamma$ can be feasibly scheduled on a single-processor system using the EDF algorithm if $U(\Gamma) \le 1$. Baruah et al. [15] derived an exact schedulability test known as Processor Demand Analysis (PDA) for sporadic tasks with arbitrary relative deadlines. According to PDA, a task-set $\Gamma$ is EDF schedulable if:

$$\forall t > 0: \quad h(t) \le t \qquad (1)$$

where $h(t)$ is the function that computes the maximum CPU time required by all jobs which have both their arrival times and their deadlines within an interval of length $t$. It is given by Equation (2):

$$h(t) = \sum_{j=1}^{n} \max\left\{0,\; 1 + \left\lfloor \frac{t - D_j}{P_j} \right\rfloor \right\} C_j \qquad (2)$$

where $P_j$ is the period, $D_j$ is the relative deadline, and $C_j$ is the worst-case execution time of task $j$. Since the value of $t$ may be very large, it may take a lot of time to determine the feasibility of a task-set using PDA. In [15], Baruah et al. determined an upper bound $L_a$ on the value of $t$, given as follows:

$$L_a = \max\left\{ D_1, \ldots, D_n,\; \max_{1 \le j \le n} (P_j - D_j)\, \frac{U}{1 - U} \right\} \qquad (3)$$

In Equation (3), $U$ represents the system utilization factor of the given workload. Therefore, the feasibility condition for a task-set under PDA is given by Equation (4):

$$\forall t < L_a: \quad h(t) \le t \qquad (4)$$

Ripoll et al. [16] further reduced the upper bound on the maximum value of $t$, as given by Equation (5). Ripoll et al. [16] and Spuri [17] derived a recursive function to determine the upper bound ($L_b$) on the value of $t$, given by Equation (6):

$$S^{q+1} = \sum_{j=1}^{n} \left\lceil \frac{S^{q}}{P_j} \right\rceil C_j \qquad (6)$$

The initial value $S^{0}$ is set equal to $\sum_{j=1}^{n} C_j$. The recurrence $S^{q+1}$ is solved until it gives the same value in two consecutive iterations. Zhang and Burns [18] presented the Quick-convergence Processor-demand Analysis (QPA) to efficiently determine the feasibility of task-sets. QPA reduces the calculation effort exponentially. It starts by selecting the upper bound ($L$) and lower bound ($d_{min}$) on the value of $t$, where $L = \min(L_a, L_b)$ and $d_{min}$ is the minimum relative deadline among the tasks $\tau_i \in \Gamma$. Next, starting from $t = L$, QPA computes $h(t)$, and if it is found to be less than $t$ then $t$ is replaced with $h(t)$. This process continues until $t$ reaches $d_{min}$ or $h(t)$ is found to be greater than $t$. Zhang et al. further extended their work to perform sensitivity analysis on EDF scheduling [19].
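The demand function and the QPA iteration described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: tasks are hypothetical integer `(C, P, D)` tuples with synchronous releases, the busy-period length is used as the upper bound $L$, and all function names are illustrative rather than taken from [18].

```python
from fractions import Fraction


def demand(tasks, t):
    """h(t): execution demand of all jobs with release and deadline inside [0, t]."""
    return sum(max(0, 1 + (t - d) // p) * c for c, p, d in tasks)


def busy_period(tasks):
    """Fixed point of w = sum(ceil(w / P_j) * C_j), started at sum(C_j)."""
    w = sum(c for c, _, _ in tasks)
    while True:
        nxt = sum(-(-w // p) * c for c, p, _ in tasks)  # integer ceiling
        if nxt == w:
            return w
        w = nxt


def last_deadline_before(tasks, t):
    """Largest absolute deadline strictly smaller than t (0 if there is none)."""
    best = 0
    for _, p, d in tasks:
        if t > d:
            dl = ((t - d) // p) * p + d
            if dl == t:
                dl -= p
            best = max(best, dl)
    return best


def qpa_feasible(tasks):
    """QPA-style EDF feasibility test for integer (C, P, D) tasks."""
    if sum(Fraction(c, p) for c, p, _ in tasks) > 1:
        return False
    d_min = min(d for _, _, d in tasks)
    t = last_deadline_before(tasks, busy_period(tasks))
    while demand(tasks, t) <= t and demand(tasks, t) > d_min:
        h = demand(tasks, t)
        # Jump down to h(t); if h(t) == t, step to the previous deadline.
        t = h if h < t else last_deadline_before(tasks, t)
    return demand(tasks, t) <= d_min
```

For instance, `qpa_feasible([(1, 4, 2), (2, 6, 3), (3, 12, 8)])` walks t through only a handful of points instead of testing every deadline below the bound, which is the source of the savings attributed to QPA.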

Semi-Partitioned Multi-Processor Scheduling
Anderson et al. introduced semi-partitioned scheduling in [20]. They introduced the notion of task-splitting to improve system utilization and proposed the EDF-fm algorithm for scheduling recurrent soft real-time tasks on multiprocessor systems. Under the hard dynamic-priority semi-partitioned category, EDF with task splitting and K processors in a Group (EKG) is a well-known algorithm [21]. EKG classifies the tasks as heavy or light. Each of the heavy tasks is assigned to a separate processor, while the light tasks are sequentially allocated to the remaining processors.
In [22], Kato et al. presented Ehd2-SIP. It assigns tasks to processors sequentially, starting from the first processor. If the utilization of the current task $\tau_i$ is less than or equal to the remaining capacity of the current processor $P_m$, i.e., $U(\tau_i) \le U_b - U(P_m)$, then $\tau_i$ is assigned to $P_m$. Otherwise, it is split into two portions $\tau'_i$ and $\tau''_i$, where $\tau'_i$ is assigned to $P_m$ while $\tau''_i$ is assigned to $P_{m+1}$. Kato et al. presented the Earliest Deadline Deferrable Portion (EDDP) algorithm in [23]. EDDP first classifies the tasks as light or heavy. A task $\tau_i$ is considered as heavy if: All of the remaining tasks are considered light tasks. Heavy tasks are assigned to dedicated processors, while the light tasks are sequentially assigned to the remaining processors. The EDDP algorithm achieves a utilization bound of 65%. Kato et al. further extended their results on semi-partitioned scheduling and proposed the Earliest Deadline and Highest Priority Split (EDHS) algorithm [24]. The EDHS algorithm assigns a global, highest static priority to migratory tasks, while the priorities of fixed tasks are assigned using the EDF algorithm. The Earliest Deadline First with Window-constrained Migration (EDF-WM) algorithm, presented in [25], aims at improving system schedulability with reduced context-switching cost. The EDF-WM algorithm assigns tasks to processors on a first-fit basis. When a task is not feasible on any single processor, it is split across more than one processor.
Burns et al. addressed the dynamic-priority semi-partitioned scheduling of periodic tasks on identical processors and proposed the C=D heuristic for task-splitting [26]. When a task $\tau_i$ is required to be split into sub-tasks $\tau'_i$ and $\tau''_i$, the deadline of $\tau'_i$ is set equal to its worst-case execution time. In this way, $\tau'_i$ always has the highest priority on its assigned core. This reduces the task-splitting penalty.
In [27], Anderson et al. extended the EDF-fm algorithm presented in [20] and proposed the Earliest Deadline First with Optimal Semi-partitioned (EDF-os) scheduling algorithm. The EDF-fm algorithm restricts migrating tasks to have utilization less than 0.5. It assigns high priority to migrating tasks, while the priorities of fixed tasks are assigned in EDF manner.

Power Asymmetric Multiprocessor Scheduling
The problem of power asymmetric multiprocessor scheduling was first addressed in [28]. In [28], Baruah studied the dynamic-priority scheduling of periodic real-time tasks on power asymmetric multicore processors with an integer boundary constraint on task preemptions and showed that the general problem is intractable.
In [29], Funk et al. provided an online algorithm based on global EDF scheduling. They derived a sufficient condition to determine the EDF feasibility of a set of tasks on a power asymmetric processor, provided that this task set is known to be feasible on some different power asymmetric processor. This algorithm suffers from high runtime cost due to task migrations.
In [30], the fixed-priority real-time scheduling of periodic tasks on power asymmetric processors is considered and a sufficient test to determine the feasibility of a set of tasks is derived. This test works well for task sets with low utilization but fails to determine the schedulability of high-utilization task sets. Andersson et al. addressed the real-time scheduling problem of sporadic tasks and proposed a partitioned dynamic-priority algorithm named EDF-DU-IS-FF [31]. The EDF-DU-IS-FF algorithm partitions the task set based on the processor capacity; at runtime, the partitioned tasks are executed in EDF fashion. The same authors discussed the fixed-priority scheduling of sporadic tasks in [32] and proposed the RM-DU-IS-FF algorithm. This algorithm first partitions the task set using the Liu and Layland (L.L) bound and then executes the tasks in RM fashion. It fails to fully utilize the processor capacity because a sufficient test is used during the partitioning stage, and as a result it does not perform well at higher system utilization levels.
In [33], Cucu et al. showed that, under global fixed-priority scheduling, any schedule of asynchronous periodic task sets that is feasible on a power asymmetric processor becomes periodic after a specific moment in time. They determined that point and provided a feasibility interval for such systems. In [34], a sufficient test for global EDF scheduling of sporadic task systems on a power asymmetric multicore processor is presented. The scheduling of soft real-time periodic tasks on power asymmetric multicore processors is studied in [35]. The authors argued that there are deficiencies in the Linux system for supporting real-time periodic tasks. They discussed how to provide better performance for the soft workload in the presence of a hard workload using deferrable servers.
Chen et al. presented the online scheduling algorithms PG and PCG for scheduling periodic tasks on a power asymmetric multicore processor [36]. These algorithms assign the tasks with the largest remaining execution time to the fastest processor. The PG and PCG algorithms incur high runtime cost in the form of context switches and task migrations. In [37], the A-S algorithm is proposed as an improvement over the PCG algorithm to reduce the runtime costs. The A-S algorithm reduces task preemptions and task migrations by up to 90% and 87%, respectively. Risat et al. discussed the RM-based global scheduling of periodic implicit-deadline tasks in [38]. They provided a set of schedulability conditions based on easily computable task-set parameters for providing better utilization while maintaining feasibility. They showed that their conditions provide better performance than their counterparts. Jung et al. studied the scheduling of harmonic real-time tasks in [39]. They proposed an RM-based partitioned approach, which first partitions the task set based on processor capacity using the harmonic bound and then splits the remaining tasks if required.

System and Task Models
We consider a power asymmetric multicore processor having m cores. The processing power of these cores is defined by the set $S = \{S_1, S_2, \ldots, S_m\}$, where $S_i$ is the processing power of the ith core and, for any pair of cores $i, j$, either $S_i = S_j$ or $S_i \neq S_j$. The total processing power of the system can be calculated by $S = \sum_{i=1}^{m} S_i$.
We use the standard real-time task model to characterize the workload of the system. The system workload is represented by the set $\Gamma$, which consists of n real-time periodic tasks. Each task $\tau_i$ is characterized by its worst-case execution requirement ($C_i$), i.e., the number of CPU cycles required by $\tau_i$ to complete its execution in the worst case; its minimum inter-arrival time ($P_i$), i.e., its period; and its relative deadline ($D_i$). The time instant at which the first job of a task is released is known as its phase and is denoted by $\varphi_i$. We assume that all of the tasks in $\Gamma$ follow the implicit-deadline model, i.e., $P_i = D_i\ \forall \tau_i \in \Gamma$. In addition, we assume that all of the tasks are synchronous, i.e., $\varphi_i = 0\ \forall \tau_i \in \Gamma$.
Furthermore, we consider another task parameter, the additional offset. The additional offset of a task $\tau_i$, denoted by $\delta_i$, is the amount of time for which $\tau_i$ remains blocked and is not considered for scheduling, i.e., $\tau_i$ remains blocked during the time interval from $r_i$ to $r_i + \delta_i$ (where $r_i$ is the release time of $\tau_i$). Initially, the additional offset of all of the tasks is zero, i.e., $\delta_i = 0\ \forall \tau_i \in \Gamma$. Thus, we can define a task $\tau_i$ by a 4-tuple $(\delta_i, C_i, P_i, D_i)$. Typical examples of this kind of workload include sensory data acquisition systems, air traffic control systems, environment monitoring systems, etc. These systems have to periodically execute certain tasks, and the completion of those tasks within a specified time is mandatory. Any delay in the completion of these tasks may have catastrophic consequences.
Due to the difference in processing speed of different cores, a task takes a different amount of time on different cores to complete its execution. The time a task $\tau_i \in \Gamma$ takes to complete its execution on core $j \in S$ can be calculated by Equation (7):

$$T(i, j) = \frac{C_i}{S_j} \qquad (7)$$

The fraction of the processor time required by a task to complete its execution is known as its system utilization factor. The system utilization factor of a task $\tau_i \in \Gamma$ on core $j \in S$ is calculated using Equation (8):

$$U(i, j) = \frac{T(i, j)}{P_i} = \frac{C_i}{S_j P_i} \qquad (8)$$

The system utilization factor of a task $\tau_i \in \Gamma$ on the processor S is given by Equation (9):

$$U(\tau_i) = \frac{C_i}{P_i \sum_{j=1}^{m} S_j} \qquad (9)$$

Now, the total system utilization factor of the task set $\Gamma$ is given by Equation (10):

$$U(\Gamma) = \sum_{i=1}^{n} U(\tau_i) \qquad (10)$$

A task set $\Gamma$ can only be considered for scheduling if $U(\Gamma) \le 1$; any task set with $U(\Gamma) > 1$ can never be feasible with any scheduling algorithm. The notations used are given in Table 1.

Table 1 (notation: description)
P_i: Period of the task i
S: Set defining the power asymmetric multicore processor
S_i: Processing power of the core i
T(i, j): Time required by task i to complete its execution on core j
U(i, j): System utilization factor of the task i on core j
Θ(n): Value of the L.L bound for n tasks
δ_i: Additional offset of task i
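As a small worked illustration of the execution-time and utilization definitions above, the following Python sketch computes them with exact rational arithmetic. It assumes that $C_i$ is expressed in cycles, that core speeds are in cycles per time unit, and that the processor-level utilization is normalized by the total processing power; the function names are illustrative only.

```python
from fractions import Fraction


def exec_time(cycles, speed):
    """Time a task with the given cycle demand needs on a core of the given speed."""
    return Fraction(cycles) / Fraction(speed)


def core_utilization(cycles, period, speed):
    """Fraction of one core's time consumed by the task."""
    return exec_time(cycles, speed) / period


def system_utilization(tasks, speeds):
    """Utilization of integer (C, P) tasks relative to the summed core speeds (assumed normalization)."""
    total_speed = sum(Fraction(s) for s in speeds)
    return sum(Fraction(c, p) for c, p in tasks) / total_speed


# A task needing 6 cycles every 12 time units on a core of speed 2:
print(exec_time(6, 2), core_utilization(6, 12, 2))
```

The same task occupies half of a core of speed 1 but only a quarter of a core of speed 2, which is why the allocation order of cores matters in the algorithm that follows.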

Dynamic Priority Semi-Partitioned Scheduling of Real-Time Tasks on Power-Asymmetric Multicore Processors
In this section, we present our dynamic-priority algorithm named EDFwC=D-TS (EDF scheduling with C=D task-splitting) for scheduling real-time tasks on power-asymmetric multicore processors. EDFwC=D-TS works in two phases: task-allocation and scheduling. In the task-allocation phase, the given workload is distributed among the processor cores, while in the scheduling phase, tasks are scheduled using the EDF algorithm on each core. In the following sections, we discuss the EDFwC=D-TS scheduling algorithm in detail.
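The per-core scheduling phase can be illustrated with a minimal Python sketch of the EDF selection rule. It assumes a hypothetical job record (a dictionary with `release`, `offset`, `deadline`, and `remaining` fields) and treats a job as ready only once its additional offset has elapsed; it is an illustration, not the authors' implementation.

```python
def pick_edf(jobs, now):
    """Return the ready job with the earliest absolute deadline, or None.

    A job is ready when it still has work left and its release time plus
    its additional offset has been reached.
    """
    ready = [
        j for j in jobs
        if j["remaining"] > 0 and j["release"] + j["offset"] <= now
    ]
    return min(ready, key=lambda j: j["deadline"], default=None)


jobs = [
    {"name": "a", "release": 0, "offset": 0, "deadline": 10, "remaining": 2},
    {"name": "b", "release": 0, "offset": 3, "deadline": 5, "remaining": 1},
]
print(pick_edf(jobs, 0)["name"], pick_edf(jobs, 3)["name"])
```

At time 0 job `b` is still blocked by its offset, so `a` runs even though its deadline is later; from time 3 onwards `b` takes precedence.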

Task-Allocation in EDFwC=D-TS Scheduling
In this section, we discuss the task-allocation phase under EDFwC=D-TS scheduling. Task-allocation deals with the mapping of tasks to cores. In task-allocation, a given set of tasks $\Gamma$ is divided into at most M subsets (M is the number of cores in S) such that each subset $\Gamma_i$ is EDF-feasible on the corresponding core $S_i$, i.e., $U(\Gamma_i) \le 1$. Task-allocation in EDFwC=D-TS assigns tasks to a core with the aim of fully utilizing its capacity. When the capacity of a core would be exceeded by the assignment of a task, a task is split over multiple cores using the C=D heuristic.
The task-allocation scheme under EDFwC=D-TS scheduling is given by Algorithm 1. In Algorithm 1, it is assumed that the tasks in $\Gamma$ are sorted in descending order of their utilization, i.e., $U_i > U_j\ \forall i < j$. Furthermore, it is also assumed that the processor cores are sorted in descending order of their speed, i.e., $S_i > S_j\ \forall i < j$.
Algorithm 1 starts by assigning tasks to the first core, i.e., the fastest core. It first calculates the system utilization of the next task $\tau_i$ on the current core m, i.e., $U(\tau_i, m)$, and then adds it to the total system utilization of the tasks which are already assigned to core m, i.e., $U(\Gamma_m)$; here $\Gamma_m$ represents the set of tasks which are already assigned to core m (Lines 3-4). Now, if after adding $U(\tau_i, m)$ the total system utilization $U(\Gamma_m)$ remains less than the EDF bound, which is 1, then $\tau_i$ is removed from $\Gamma$ and added to $\Gamma_m$ (Lines 5-7). Similarly, $\tau_i$ is also removed from $\Gamma$ and assigned to $\Gamma_m$ when the addition of $U(\tau_i, m)$ to $U(\Gamma_m)$ makes it equal to 1. Furthermore, $U(\Gamma_m) = 1$ means that the capacity of core m is fully utilized and therefore the next core should be considered for the allocation of the remaining tasks. For this, the index of the current core, i.e., m, is increased by 1 (Lines 8-11).
When the addition of $U(\tau_i, m)$ to $U(\Gamma_m)$ results in $U(\Gamma_m)$ becoming greater than 1, i.e., the core's capacity is exceeded, Algorithm 1 examines the remaining unassigned tasks in $\Gamma$ to find a task that can be accommodated on core m, i.e., a task $\tau_i$ with $U(\tau_i, m) \le 1 - U(\Gamma_m)$ (Lines 12-27). If no such unassigned task exists in $\Gamma$ then the last task in $\Gamma$, i.e., $\tau_{|\Gamma|}$, is added to $\Gamma_m$ and task-splitting is performed (Lines 28-31). This means that a task $\tau_i$ from $\Gamma_m$ is selected for splitting into two subtasks $\tau'_i$ and $\tau''_i$ such that $\tau'_i$ is assigned to core m while $\tau''_i$ is assigned to some other core k, where k > m. Task-splitting under EDFwC=D-TS is discussed in Section 4.2. In the end, Algorithm 1 decides the feasibility of the task-set (Lines 34-38). This decision is based on the number of cores used. If the number of used cores is greater than the total number of cores, i.e., M, then the task-set is declared not feasible.
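The non-splitting part of the allocation described above can be sketched in Python as follows. This is a simplified illustration, not Algorithm 1 itself: it assumes hypothetical `(name, C, P)` task tuples with integer parameters, visits cores fastest first, scans the unassigned tasks in decreasing utilization, and merely reports the tasks that did not fit anywhere instead of invoking task-splitting.

```python
from fractions import Fraction


def allocate(tasks, speeds):
    """Assign whole tasks to cores; returns (per-core task names, leftover names).

    tasks  -- list of (name, C, P) with integer C (cycles) and P
    speeds -- core speeds, sorted fastest first
    """
    pending = sorted(tasks, key=lambda t: Fraction(t[1], t[2]), reverse=True)
    cores = []
    for speed in speeds:
        assigned, load = [], Fraction(0)
        for task in list(pending):           # scan in decreasing utilization
            name, c, p = task
            u = Fraction(c, p) / Fraction(speed)
            if load + u <= 1:                # EDF bound on a single core
                assigned.append(name)
                load += u
                pending.remove(task)
        cores.append(assigned)
    # Tasks left in `pending` are the candidates that would require splitting.
    return cores, [name for name, _, _ in pending]


tasks = [("A", 6, 10), ("B", 4, 10), ("C", 3, 10), ("D", 2, 10)]
print(allocate(tasks, [1, Fraction(1, 2)]))
```

In this example the fast core is filled exactly by A and B, and the half-speed core is filled exactly by C and D, so no task needs to be split.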

Splitting Tasks in EDFwC=D-TS Scheduling
In this section, we discuss task-splitting under EDFwC=D-TS scheduling. Task-splitting chooses a task $\tau_i \in \Gamma_m$ to be split into two subtasks $\tau'_i$ and $\tau''_i$ such that $\tau'_i$ is feasible on core m, i.e., $U(\Gamma_m) + U(\tau'_i) - U(\tau_i) \le 1$. $\tau'_i$ is allocated to core m, while $\tau''_i$ is allocated to the slowest core where it is feasible. For this purpose, the feasibility of $\tau''_i$ is first tested on $S_M$, i.e., the core having the lowest processing power. If the system utilization of $\tau''_i$ exceeds the residual capacity of $S_M$ then the next core, i.e., $S_{M-1}$, is considered. This process continues until $\tau''_i$ is feasibly assigned to some core. This indexing of cores is important to reduce the task-splitting penalty (as shown in Theorem 4).
To choose the task to split, each task $\tau_i \in \Gamma_m$ is split, one after the other in increasing order of deadline, using the C=D heuristic, and its feasibility is determined using the QPA algorithm until a feasible task is found. This process is performed in the following steps:

• A task $\tau_i: (0, C_i, P_i, D_i) \in \Gamma_m$ that has the minimum deadline among the tasks in $\Gamma_m$ and has not previously been tested is selected.
• In $\Gamma_m$, $\tau_i$ is replaced by $\tau'_i$ and its feasibility is determined using the QPA algorithm.
• If $\Gamma_m$ is feasible then it is allocated to core m; otherwise the next task is tested in the same way.
• If no task in $\Gamma_m$ is found feasible then, for each task $\tau_i$, the worst-case execution time of $\tau'_i$ is reduced and tested again.
• This process continues until a feasible task is found or the worst-case execution time of $\tau'_i$ for each task becomes equal to zero.

Algorithm 1 Task-Allocation under the EDFwC=D-TS Algorithm
Input: (i) Set of n real-time implicit-deadline periodic tasks $\Gamma = \{\tau_1, \tau_2, \ldots, \tau_n\}$ where tasks are sorted in descending order of their utilization, i.e., $U_i > U_j\ \forall i < j$. (ii) Single-ISA heterogeneous multicore processor $S = \{S_1, S_2, \ldots, S_M\}$ having M processing cores, where the processing cores are sorted in descending order of their processing speed, i.e., $S_i > S_j\ \forall i < j$.
Output: Assigns tasks to cores; and returns success if $\Gamma$ is feasible on S
...
End If
27: End For
28: ...
return ("Allocation is Successful")
36: Else
37: return ("Allocation Failed")
38: End If
The task-splitting process used in EDFwC=D-TS is given in Algorithm 2. Since the total system utilization of the tasks assigned to core m is greater than its capacity, i.e., $U(\Gamma_m) > 1$, Algorithm 2 first computes the amount by which $U(\Gamma_m)$ exceeds the core's capacity (Line 2). Furthermore, because the suitability of tasks for splitting is determined in ascending order of their deadlines, the tasks in $\Gamma_m$ are sorted in ascending order of their deadlines (Line 3). In the next step, for each task, the fraction of its worst-case execution requirement ($C_i$) that can be allocated to core m, given that this task is chosen for splitting, is computed (Lines 4-6).
Once these basic computations are made, Algorithm 2 splits each task $\tau_i \in \Gamma_m$ into two subtasks, $\tau'_i$ and $\tau''_i$, one by one and determines their feasibility using the QPA algorithm (Lines 9-22). To support the varying processing power of cores, the processor demand function is modified as given in Equation (11).
When the splitting of some task $\tau_i$ is found feasible on core m, $\tau_i$ is replaced with $\tau'_i$ in $\Gamma_m$ and $\Gamma_m$ is allocated to core m. If none of the tasks is found feasible for splitting then the worst-case execution requirement ($C'_i$) of the first subtask of each task is reduced using the recursive function presented in [16] (Lines 23-30). However, to support the varying processing speed of cores, some basic modifications are required, as given in Equation (12). For the reduced $C'_i$ of each task, the feasibility of task-splitting is determined again until some feasible task-splitting is found. If the reduced $C'_i$ reaches 0 for all tasks then task-splitting fails. In this case, the minimum-utilization task $\tau_i \in \Gamma_m$ is removed from $\Gamma_m$ and added to the set of unassigned tasks $\Gamma$ (Lines 32-34). If the task-splitting is successful, i.e., a task $\tau_i \in \Gamma_m$ is split into two subtasks $\tau'_i$ and $\tau''_i$, then $\tau''_i$, defined by $(C'_i/S_m,\ C_i - C'_i,\ D_i - C'_i/S_m,\ P_i)$, is assigned to the slowest core where it is feasible (Lines 35-46). Here, it can be seen that the additional offset of $\tau''_i$ is increased from 0 to $C'_i/S_m$. This ensures that $\tau'_i$ and $\tau''_i$ never execute simultaneously. Finally, Algorithm 2 returns $\Gamma_m$ (Line 47).
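The construction of the two subtasks and the slowest-core-first placement of the second subtask can be sketched in Python as follows. This is a hedged illustration rather than Algorithm 2: the amount `c_first` given to the first subtask is assumed to be already known, the placement check uses only the utilization against the residual capacity (not a full QPA test), and all names are hypothetical.

```python
from fractions import Fraction


def split_c_equals_d(c, p, d, c_first, s_m):
    """Split task (C, P, D) so the first piece has deadline equal to its run time on core m.

    Returns two (offset, C, D, P) tuples, in the order used in the text.
    """
    run_first = Fraction(c_first) / Fraction(s_m)   # time C'_i needs on core m
    first = (0, c_first, run_first, p)
    second = (run_first, c - c_first, d - run_first, p)
    return first, second


def place_second(second, speeds, loads, m):
    """Try cores from the slowest (last index) down to m + 1; return the index or None."""
    _, c2, _, p = second
    for k in range(len(speeds) - 1, m, -1):
        u = Fraction(c2, p) / Fraction(speeds[k])
        if loads[k] + u <= 1:
            return k
    return None


first, second = split_c_equals_d(6, 10, 10, 2, 2)
speeds = [Fraction(2), Fraction(1), Fraction(1, 2)]
loads = [Fraction(1), Fraction(1, 2), Fraction(1, 2)]
print(first, second, place_second(second, speeds, loads, 0))
```

Here the second subtask does not fit on the slowest core (index 2), so the search moves one step toward faster cores and settles on index 1.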

Analysis of Task-Splitting in EDFwC=D-TS Scheduling
Usually, partitioned scheduling fails to fully utilize the processor capacity. Consider the allocation of tasks to core m. Suppose $\Gamma_m$ is the set of tasks that are already assigned to core m. Further assume that none of the unassigned tasks can be assigned entirely to core m; then $1 - U(\Gamma_m)$ of the capacity of core m will remain unused. Similarly, the total wasted capacity on the processor S, denoted by $\lambda(S)$, can be calculated as given below:

$$\lambda(S) = \sum_{m=1}^{M} \left(1 - U(\Gamma_m)\right)$$

Semi-partitioned scheduling aims at using this wasted processor capacity to improve system schedulability. In EDFwC=D-TS scheduling, during the allocation of tasks to core m, when $\tau_i$ is split the gained advantage can be written as:

Algorithm 2 (task-splitting; only the following fragments of the listing are recoverable)
...
sort_increasing(Γ_m, Deadline) // sort tasks in Γ_m in increasing order of deadline
...
End For
7: While flag == false Do
8: flag_zero = false
...
If C'_i = 0 Then
13: flag_zero = true
14: End If
...
21: End If
22: End For
23: If result = feasible AND flag_zero == true Then
...
End For
27: ElseIf flag_zero == false Then
28: flag = true
29: End ElseIf
30: End If
31: End While
32: If result = feasible Then
...
Assign_flag = false
...
Assign_flag = true
...
44: End If
45: End While
...
If Assign_flag = false Then
return "Task set is not feasible"
End If
End If
47: return Γ_m

Task-splitting has a non-zero penalty on the system. It is incurred in the form of increased system utilization of the split task. Suppose a task $\tau_i \in \Gamma_m$ is split into two subtasks $\tau'_i: (0, C'_i, C'_i/S_m, P_i)$ and $\tau''_i: (C'_i/S_m,\ C_i - C'_i,\ D_i - C'_i/S_m,\ P_i)$; then the combined utilization of the split tasks is always higher than that of $\tau_i$. From now onwards we call this the task-splitting penalty, and it is given by Inequality (13):

$$U(\tau'_i) + U(\tau''_i) > U(\tau_i) \qquad (13)$$

If we replace $U(\tau'_i)$, $U(\tau''_i)$, and $U(\tau_i)$ with their actual values then the task-splitting penalty ($\rho(\tau_i)$) can be written as given in Equation (14): Now, if the advantage of a task split always remains greater than its overhead then we can say that task-splitting benefits the system. This condition is given by Inequality (15): In the following, we prove that task-splitting always benefits the system.
Theorem 1. The advantage gained due to task-splitting in EDFwC=D-TS scheduling is always greater than its overhead on the system.
Proof of Theorem 1. We assume that tasks are currently being assigned to core m. Furthermore, we assume that $\tau_i$ is selected to be split into two subtasks, $\tau'_i$ and $\tau''_i$, and that $\tau'_i$ is assigned to core m. To prove Theorem 1, we have to show that: As in the implicit-deadline task model $P_i = D_i$, Inequality (16) can be written as: As , therefore by replacing a smaller value with a larger value: Therefore, it is proved that the advantage gained due to task-splitting always remains greater than its overhead on the system.
Now, we show that task-splitting under EDFwC=D-TS satisfies the necessary task-splitting condition, i.e., the split tasks can never execute simultaneously.
Theorem 2. Given a task $\tau_i$ split into two subtasks $\tau'_i: (0, C'_i, C'_i/S_m, P_i)$ and $\tau''_i: (C'_i/S_m,\ C_i - C'_i,\ D_i - C'_i/S_m,\ P_i)$, if $\tau'_i$ is assigned to core m where m < M and $\tau''_i$ is assigned to some other core k where m < k ≤ M, then $\tau'_i$ and $\tau''_i$ can never execute simultaneously.
Proof of Theorem 2. Since task-splitting under EDFwC=D-TS assigns an additional offset equal to $C'_i/S_m$ to $\tau''_i$, $\tau''_i$ remains in the blocked state for $C'_i/S_m$ time units after its release. To show that $\tau'_i$ and $\tau''_i$ can never execute simultaneously, we have to prove that $\tau'_i$ is always completed before the additional offset of $\tau''_i$ elapses, i.e., the worst-case response time of $\tau'_i$ always remains less than or equal to the additional offset of $\tau''_i$. Writing $R(\tau'_i)$ for the worst-case response time of $\tau'_i$, this can be written as:

$$R(\tau'_i) \le \frac{C'_i}{S_m} \qquad (18)$$

The worst-case response time of a task is equal to the sum of its execution time and the interference from higher-priority tasks before its completion. Since $\tau'_i$ has the highest priority on core m, its worst-case response time always remains equal to its worst-case execution time, i.e.,

$$R(\tau'_i) = \frac{C'_i}{S_m} \qquad (19)$$

By comparing Equations (18) and (19), it is clear that the necessary task-splitting condition is satisfied in EDFwC=D-TS scheduling.
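The non-overlap argument can be checked numerically with a short Python sketch. It assumes, as in the proof, that the first subtask runs immediately at the highest priority on core m, and it uses illustrative names and an `(start, end)` window representation that are not part of the original algorithm.

```python
from fractions import Fraction


def execution_windows(release, c_first, s_m, deadline):
    """Windows in which each subtask of one job may execute.

    The first subtask occupies [release, release + C'/S_m]; the second is
    blocked by its additional offset until release + C'/S_m and must finish
    by release + deadline.
    """
    offset = Fraction(c_first) / Fraction(s_m)
    first_window = (release, release + offset)
    second_window = (release + offset, release + deadline)
    return first_window, second_window


first_window, second_window = execution_windows(20, 2, 2, 10)
print(first_window, second_window)
```

Because the second window starts exactly where the first one ends, the two subtasks of the same job cannot run at the same time, whichever cores they are placed on.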
In Theorem 2, we have shown that the additional offset $\delta_i$ assigned to the split task $\tau''_i$ ensures that $\tau'_i$ and $\tau''_i$ never execute simultaneously, since $\tau''_i$ remains blocked during the $\delta_i$ time interval. Therefore, it is required to ensure that this blocking does not hurt its deadline. We prove this property in Theorem 3.
Theorem 3. Given a task $\tau_i$ split into two subtasks $\tau'_i: (0, C'_i, C'_i/S_m, P_i)$ and $\tau''_i: (C'_i/S_m,\ C_i - C'_i,\ D_i - C'_i/S_m,\ P_i)$, if $\tau'_i$ is assigned to core m where m < M and $\tau''_i$ is assigned to some other core k where m < k ≤ M, then the assignment of the additional offset ($\delta_i$) to $\tau''_i$ does not affect its schedulability.
Proof of Theorem 3. We assume that a task $\tau_i$ is split into two subtasks, $\tau'_i: (0, C'_i, C'_i/S_m, P_i)$ and $\tau''_i: (C'_i/S_m,\ C_i - C'_i,\ D_i - C'_i/S_m,\ P_i)$. Further assume that $\tau'_i$ is allocated to core m while $\tau''_i$ is allocated to some other core k where m < k ≤ M. Since $\tau''_i$ is feasible on core k, therefore: Now, to show that $\tau''_i$ remains schedulable after the assignment of the additional offset ($\delta_i$) we have to prove that: By comparing Inequalities (20) and (21), it is proved that the assignment of the additional offset to $\tau''_i$ does not affect its deadline.
After proving that the task-splitting under EDFwC=D-TS maintains operational accuracy, now we show that the indexing of processor cores used in EDFwC=D-TS task-splitting reduces the task-splitting penalty.
Theorem 4. Given a Single-ISA heterogeneous multicore processor S having M cores, the indexing of cores in descending order of their processing speed reduces the task-splitting penalty.
Proof of Theorem 4. Suppose the task $\tau_i$ is split into two subtasks $\tau'_i : (0, C'_i, C'_i/S_m, P_i)$ and $\tau''_i : (C'_i/S_m, C_i - C'_i, D_i - C'_i/S_m, P_i)$ using Algorithm 2. Further assume that $\tau'_i$ is assigned to core $m$ where $m < M$ and $\tau''_i$ is assigned to some other core $k$ where $m < k \le M$. From (14), the task-splitting penalty $\rho(\tau_i)$ is the time wasted on core $k$ due to task-splitting (22).

Now, assume that $S_k$ is the processing speed of the $k$th core when the processor cores are sorted in descending order of their processing speeds, while $S^*_k$ is its processing speed when the processor cores are sorted in ascending order of their processing speeds. Then:

$$S^*_k \ge S_k$$

The penalty $\rho(\tau_i)$ when the processor cores are sorted in descending order is given by (23), and the penalty when they are sorted in ascending order is given by (24). Since $S^*_k \ge S_k$, comparing (23) and (24) proves that the task-splitting penalty is lower when the processor cores are sorted in descending order of their processing speed.
Task-splitting under EDFwC=D-TS assigns $\tau''_i$ to the slowest core on which it is feasible. Using Theorem 4, we can show that this approach helps to minimize the task-splitting overhead. We prove this in Theorem 5.
Theorem 5. Given a task $\tau_i$ split into two subtasks $\tau'_i : (0, C'_i, C'_i/S_m, P_i)$ and $\tau''_i : (C'_i/S_m, C_i - C'_i, D_i - C'_i/S_m, P_i)$ using Algorithm 2, if $\tau'_i$ is assigned to core $m$ where $m < M$ and $\tau''_i$ is assigned to some other core $k$ where $m < k \le M$, then the task-splitting penalty is minimized if $k = M$, provided that the processor cores are indexed in descending order of their processing speed.

Proof of Theorem 5. Assume that the task $\tau_i$ is split into two subtasks $\tau'_i$ and $\tau''_i$ using Algorithm 2, as stated in Theorem 5. Further assume that $\tau'_i$ is assigned to core $m$ where $m < M$ and $\tau''_i$ is assigned to some other core $k$ where $m < k \le M$. From (14), the time wasted on core $k$ due to task-splitting is given by (25). As the processor cores are sorted in descending order of their processing speed:

$$\min_{m < k \le M} S_k = S_M \tag{26}$$

From (13) we obtain (27). By comparing (25) and (27), it is clear that $\rho(\tau_i)$ increases as $S_k$ increases. Furthermore, (26) shows that $S_k$ decreases as $k \to M$ and is minimum when $k = M$. Therefore, it can be concluded that $\rho(\tau_i)$ decreases as $k \to M$ and is minimum when $k = M$.

The EDFwC=D-TS Scheduling Algorithm
Having discussed the task allocation, we now present the EDFwC=D-TS scheduling algorithm. In EDFwC=D-TS scheduling, given by Algorithm 3, the workload is first distributed among the processor cores using Algorithm 1 (Lines 1-5). On each core $j$, initial task priorities are assigned using the EDF algorithm (Lines 6-7). At time $t = 0$, the highest-priority task is executed on each core (Lines 9-10). Whenever a task is completed, the highest-priority ready task is executed next (Lines 11-12). Similarly, when a task is released, its absolute deadline is compared with the absolute deadline of the currently executing task, and the task with the earliest absolute deadline is selected for execution (Lines 13-16).
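The per-core dispatching rule described above can be illustrated with a small discrete-time simulation. The sketch below is written in Python and is not taken from the authors' Java simulator; the function name `edf_simulate`, the integer time ticks, and the tie-break by task index are assumptions made for illustration, and WCETs are assumed to be already scaled by the speed of the core.

```python
import heapq


def edf_simulate(tasks, horizon):
    """Simulate preemptive EDF on one core over `horizon` integer ticks.

    tasks: list of (wcet, relative_deadline, period), all positive integers,
           released synchronously at t = 0 (hypothetical input format).
    Returns (timeline, missed): timeline[t] is the index of the task run in
    tick t (None if idle); missed is True if any job overran its deadline.
    """
    ready = []      # min-heap of [absolute_deadline, task_index, remaining]
    timeline = []
    missed = False
    for t in range(horizon):
        # Release a new job of every task whose period boundary is t.
        for tid, (wcet, deadline, period) in enumerate(tasks):
            if t % period == 0:
                heapq.heappush(ready, [t + deadline, tid, wcet])
        # The heap top has the earliest deadline; if it has already passed
        # while work remains, a deadline was missed.
        if ready and ready[0][0] <= t:
            missed = True
        if ready:
            job = ready[0]          # earliest absolute deadline runs
            job[2] -= 1             # execute for one tick
            timeline.append(job[1])
            if job[2] == 0:
                heapq.heappop(ready)
        else:
            timeline.append(None)
    # Jobs still pending whose deadline falls within the horizon are misses.
    if any(job[0] <= horizon for job in ready):
        missed = True
    return timeline, missed
```

Because the ready queue is re-examined at every tick, a newly released job with an earlier absolute deadline preempts the running one without any extra logic.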

Working Example
In this section, to illustrate the working of the EDFwC=D-TS algorithm, we apply it to an example task-set. We assume a power asymmetric multicore processor having 3 cores, defined by the set S = {2.0 GHz, 1.5 GHz, 1.0 GHz}. The workload consists of 10 synchronous periodic implicit-deadline real-time tasks given in Table 2.

As the EDFwC=D-TS algorithm assumes that processor cores are sorted in descending order of their speeds, tasks are first allocated to the core having the highest processing speed, i.e., 2.0 GHz. Initially, the utilization of core 1 is set to 0, i.e., $U_1 = 0$. The system utilization factor of $\tau_1$ on core 1 is 0.3333. After allocating $\tau_1$ to core 1, its utilization becomes 0.3333 ($U_1 = 0 + 0.3333$). Since $U_1 < 1$, $\tau_1$ is feasibly allocated to core 1. Next, $\tau_2$ is considered for allocation on core 1. Its system utilization factor on core 1 is 0.3. After allocating $\tau_2$ to core 1, $U_1$ becomes 0.6333 ($U_1 = 0.3333 + 0.3 = 0.6333$). Since $U_1$ is still less than the capacity of core 1, i.e., 1, $\tau_2$ is also feasible on core 1. Subsequently, $\tau_3$ is assigned to core 1. The system utilization factor of $\tau_3$ on core 1 is 0.25, and after allocating $\tau_3$ to core 1, $U_1$ grows to 0.8833. The next task considered for allocation to core 1 is $\tau_4$. The system utilization factor of $\tau_4$ on core 1 is 0.25. If $\tau_4$ were allocated to core 1, its utilization would become 1.1333. Since $U_1$ would exceed the capacity of core 1, $\tau_4$ is not feasible on core 1. The next task, $\tau_5$, is then considered for allocation to core 1. The utilization factor of $\tau_5$ on core 1 is 0.225, and if this task were allocated to core 1, $U_1$ would become 1.1083. Therefore, $\tau_5$ is also not feasible on core 1. We continue to test the feasibility of the remaining tasks. Since none of the remaining tasks, i.e., $\tau_6$ to $\tau_{10}$, is feasible on core 1, $\tau_{10}$, which has the lowest utilization factor, is added to $\Gamma_1$ and task-splitting is performed.
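The running totals in this walk-through can be reproduced with a few lines of Python. This is only a sketch of the utilization test for a single core: the function name `fill_core` is hypothetical, the utilization factors are the ones quoted in the text for $\tau_1$ to $\tau_5$, and $\tau_6$ to $\tau_{10}$ are left out because their values are not stated here.

```python
def fill_core(utilizations, capacity=1.0):
    """Offer tasks to one core in order; accept a task only if the core's
    total utilization stays within the EDF bound (capacity)."""
    load = 0.0
    assigned, rejected = [], []
    for name, u in utilizations:
        if load + u <= capacity:
            load += u
            assigned.append(name)
        else:
            rejected.append(name)   # would push the core above its capacity
    return assigned, rejected, load


# Utilization factors on core 1, as quoted in the worked example.
core1 = [("tau1", 0.3333), ("tau2", 0.3), ("tau3", 0.25),
         ("tau4", 0.25), ("tau5", 0.225)]
assigned, rejected, load = fill_core(core1)
```

With these inputs the first three tasks are accepted and the core load settles at about 0.8833, while $\tau_4$ and $\tau_5$ are rejected because they would raise it to 1.1333 and 1.1083 respectively.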
Task-splitting under EDFwC=D-TS scheduling (given by Algorithm 2) selects the task from $\Gamma_1$ that is most suitable for splitting, i.e., the one that maximizes $U_1$ without hurting the system feasibility. The suitability of tasks for splitting is determined in increasing order of their deadlines. Since $\tau_{10}$ has the least deadline among the tasks in $\Gamma_1$, its suitability for splitting is determined first. For this, $\tau_{10}$ is split into two subtasks, $\tau'_{10}$ and $\tau''_{10}$, such that the utilization of core 1 becomes 1 once $\tau_{10}$ is replaced by $\tau'_{10}$. The subtasks of $\tau_{10}$ are $\tau'_{10} : (0, 0.932272 \times 10^5, 0.466136, 4)$ and $\tau''_{10} : (0.466136, 0.067728 \times 10^5, 3.533864, 4)$. Now, $\tau_{10}$ is replaced with $\tau'_{10}$ in $\Gamma_1$ and the feasibility of $\Gamma_1$ is determined using the QPA algorithm.
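The parameters of the two subtasks follow mechanically from the subtask definition (offset, WCET, deadline, period) once the size of the first piece is known. The sketch below only derives those tuples; it does not reproduce how Algorithm 2 chooses the first piece. The function name `split_task` is hypothetical, the common scale factor on the WCET values is dropped, the total WCET of 1.0 is inferred as the sum of the two quoted sub-WCETs, and the deadline of 4 follows from the implicit-deadline assumption.

```python
def split_task(wcet, deadline, period, first_wcet, speed_m):
    """Build the two subtasks of a C=D split.

    The first subtask runs on core m with deadline equal to its execution
    time there; the second is released after that time has elapsed and
    inherits the remaining work and the remaining deadline.
    Each subtask is (offset, wcet, relative_deadline, period).
    """
    offset = first_wcet / speed_m               # time the first piece needs on core m
    first = (0.0, first_wcet, offset, period)
    second = (offset, wcet - first_wcet, deadline - offset, period)
    return first, second


# Values quoted for tau_10 on the 2.0 GHz core (scale factor omitted).
first, second = split_task(1.0, 4.0, 4.0, 0.932272, 2.0)
```

For these inputs the first subtask has a deadline of 0.466136, and the second has offset 0.466136, WCET 0.067728 and deadline 3.533864, matching the tuples in the text.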
The QPA algorithm determines the feasibility of $\Gamma_1$ in the following steps:

• The value of $L$ is calculated using Equation (3): $L = 59.99204$.

• The value of $d_{\min}$ is set to 0.466136.

• The initial value of $t$ is assigned using $t = \max\{d_i \mid d_i < L\}$, which gives $t = 56.466136$.

• The value of $h(t)$ is calculated for each value of $t$ (calculations are given in Table 3).

Since at $t = 4.432272$ the value of $h(t)$ is 0.466136, i.e., $h(t) \le d_{\min}$, it follows that $\Gamma_1 = \{\tau_1, \tau_2, \tau_3, \tau'_{10}\}$ is feasible on core 1. Now, $\tau''_{10}$ is assigned to the slowest core where it is feasible. Since core 3 is the slowest core and currently no task is assigned to it, $U_3 = 0$. The system utilization factor of $\tau''_{10}$ on core 3 is 0.016. Therefore, it is feasible on core 3. Similarly, the remaining tasks are allocated to cores 2 and 3. The final task allocation is given in Table 4.
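The QPA iteration used in this feasibility check can be sketched as follows. This is a generic Python rendering of Quick Processor-demand Analysis for synchronous tasks with zero offset, not the authors' code: the function names are hypothetical, the bound $L$ is passed in as a parameter because Equation (3) is not reproduced in this section, and each task is assumed to be a tuple (execution time on the core, relative deadline, period).

```python
import math


def demand(tasks, t):
    """Processor demand h(t): work of all jobs with deadline at or before t."""
    total = 0.0
    for c, d, p in tasks:
        if t >= d:
            total += (math.floor((t - d) / p) + 1) * c
    return total


def last_deadline_before(tasks, t):
    """Largest absolute deadline strictly smaller than t, or None."""
    best = None
    for _, d, p in tasks:
        if t > d:
            cand = d + math.floor((t - d) / p) * p
            if cand >= t:           # landed exactly on t: step back one period
                cand -= p
            if best is None or cand > best:
                best = cand
    return best


def qpa_feasible(tasks, L):
    """Return True if the task set passes the QPA test within bound L."""
    d_min = min(d for _, d, _ in tasks)
    t = last_deadline_before(tasks, L)
    if t is None:                   # no deadline falls before L
        return True
    h = demand(tasks, t)
    while h <= t and h > d_min:
        # Jump back to h(t) when it is smaller than t, otherwise move to the
        # previous absolute deadline.
        t = h if h < t else last_deadline_before(tasks, t)
        h = demand(tasks, t)
    return h <= d_min
```

The loop stops either because the demand exceeds the available time (infeasible) or because the demand has fallen to $d_{\min}$ or below (feasible), which is the situation reached at $t = 4.432272$ in the example.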
On core 1, at $t = 1.9666136$ the running job is completed and the next task is started. The WCET of this task on core 1 is 2; therefore, it is completed at $t = 3.9666136$. The task that becomes ready next is the only ready task at $t = 3.9666136$, so its first job is started. At $t = 4$, the second job of another task is released. Since this job has an earlier absolute deadline, i.e., 8, than the currently executing task, the executing task is preempted and the newly released job is executed. The execution of tasks continues in a similar way. The Gantt chart showing the execution of tasks on core 1 during the first hyper-period is given in Figure 1. The Gantt chart showing the execution of tasks on core 2 is given in Figure 2, while Figure 3 shows the execution of tasks on core 3.

Experimental Evaluation
In this section, we evaluate the effectiveness of the EDFwC=D-TS algorithm through simulations on randomly generated synthetic task-sets. We have developed our simulator in Java. The simulator generates synthetic task-sets using the UUniFast algorithm [40] and then performs the required analysis on these task-sets. Details of the experimental set-up and the performed analysis are given in the subsequent sections. We have generated a total of $10^5$ task-sets for each experiment. Each task-set contained 8 to 64 tasks, i.e., $|\Gamma| \in [8, 64]$. The values of the task parameters are set in such a way that the total system utilization factor remains between 0.90 and 1.0, i.e., $U \in [0.90, 1.0]$. To execute these tasks, we have considered a power asymmetric multicore processor. First, we executed the tasks on a 2-core processor, and then the experiments were repeated on 4-core, 6-core, and 8-core processors. The task-set and system parameters are summarized in Table 5.
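For readers unfamiliar with UUniFast, the following is a minimal Python sketch of the utilization-generation step. It is not the authors' Java implementation; the function name `uunifast` and the optional `rng` argument are choices made here for illustration. It draws `n` task utilizations that sum to a given total.

```python
import random


def uunifast(n, total_utilization, rng=None):
    """Draw n task utilizations that sum to total_utilization (UUniFast)."""
    rng = rng or random.Random()
    utilizations = []
    remaining = total_utilization
    for i in range(1, n):
        # Keep a random fraction of the remaining utilization for later tasks.
        next_remaining = remaining * rng.random() ** (1.0 / (n - i))
        utilizations.append(remaining - next_remaining)
        remaining = next_remaining
    utilizations.append(remaining)      # the last task takes what is left
    return utilizations


# Example: 8 tasks sharing a total utilization of 0.95, with a fixed seed.
sample = uunifast(8, 0.95, random.Random(1))
```

Periods and WCETs would then be derived from these utilizations according to the parameter ranges in Table 5, which are not reproduced in this sketch.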

Comparing Algorithms
The performance of the EDFwC=D-TS algorithm is compared with the following algorithms:

• EDF-Partitioned: assigns tasks to a core up to its full capacity on a First-Fit basis using the EDF utilization bound, i.e., $U \le 1$.

• EDF-DU-IS-FF: refers to the algorithm presented in [31], which assigns tasks to cores on a First-Fit basis using the EDF utilization bound, assuming that cores are sorted in order of increasing speed while tasks are arranged in order of decreasing utilization.
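The ordering rule described for EDF-DU-IS-FF can be sketched in Python as below. This follows only the one-sentence description given above and should not be read as the implementation from [31]; the function name, the (WCET, period) task format with WCET expressed at unit speed, and the utilization of a task on a core computed as WCET / (speed × period) are assumptions.

```python
def first_fit_increasing_speed(tasks, speeds):
    """First-fit partitioning with tasks in decreasing utilization order and
    cores in increasing speed order, bounded by U <= 1 per core.

    tasks:  list of (wcet_at_unit_speed, period)
    speeds: list of core speeds
    Returns (assignment, loads), or None if some task fits on no core.
    """
    core_order = sorted(range(len(speeds)), key=lambda j: speeds[j])
    task_order = sorted(range(len(tasks)),
                        key=lambda i: tasks[i][0] / tasks[i][1],
                        reverse=True)
    loads = [0.0] * len(speeds)
    assignment = {}
    for i in task_order:
        wcet, period = tasks[i]
        for j in core_order:
            u = wcet / (speeds[j] * period)     # utilization on core j
            if loads[j] + u <= 1.0:
                loads[j] += u
                assignment[i] = j
                break
        else:
            return None                          # task i fits nowhere
    return assignment, loads


result = first_fit_increasing_speed([(2, 4), (1, 4), (1, 2)], [2.0, 1.0])
```

In this small example the slower core (index 1) is filled to exactly 1.0 by the two heaviest tasks before the faster core receives the remaining one.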

Metrics Used for Comparison
The performance of the above-mentioned algorithms is compared on the basis of the following metrics:

• Processor utilization: the ability of an algorithm to utilize the capacity of the available processor cores.

• Number of cores used: the number of cores used to feasibly schedule the provided workload.

• Schedulability: the ability of an algorithm to feasibly schedule the workload using the available processing cores.

Simulation Results
This section presents the experiments conducted to evaluate the performance of EDFwC=D-TS and its counterparts, along with the obtained results.

Processor Utilization
In this experiment, we have measured the ability of the aforementioned algorithms to utilize the processor capacity. First, we evaluated these algorithms on a 2-core processor defined by S = {1.01 GHz, 3.1 GHz}. We generated 25,000 task-sets as per the above-mentioned parameters. The total system utilization of these task-sets is kept between 0.90 and 1.0, i.e., $U(\Gamma) \in [0.90, 1.0]\ \forall\, \Gamma$ as per S. Each task-set consisted of 4 to 16 tasks. We determined the schedulability of each task-set on S using each of the algorithms. If $\Gamma$ was not found schedulable under a certain algorithm, we used an extra core with a processing speed of 1.53 GHz to make it schedulable. In the end, we calculated the average system utilization of the work assigned to each core. For the purpose of comparison, we calculated the average processor utilization under each algorithm. We repeated the same experiment for 4-core, 6-core, and 8-core processors. The obtained results are shown in Figure 4.
Figure 4 shows that the EDFwC=D-TS algorithm utilized 99% of the processor capacity on the 2-core processor, while the EDF-DU-IS-FF and EDF-Partitioned algorithms utilized 93% and 91% of the processor capacity respectively. This shows that the EDFwC=D-TS algorithm utilized up to 6% and 8% more processor capacity than the EDF-DU-IS-FF and EDF-Partitioned algorithms respectively. A similar dominance of EDFwC=D-TS over its counterparts is evident on 4-core, 6-core, and 8-core processors, where it utilized up to 9% more processor capacity.

Number of Cores Used
In this experiment, we have compared the EDFwC=D-TS, EDF-DU-IS-FF, and EDF-Partitioned algorithms on the basis of the number of cores required by each algorithm to feasibly schedule the given workload. First, we generated 25,000 task-sets having system utilization 1.0, i.e., $U(\Gamma) = 1.0\ \forall\, \Gamma$, on a 2-core processor defined by S = {1.01 GHz, 3.1 GHz}. Due to the heavy workload, none of the algorithms is guaranteed to feasibly schedule all of the task-sets using the processing cores provided by S. Therefore, we assume that S has an extra core with a speed of 1.53 GHz that is used only if the task-set is not schedulable with two cores.
Considering this set-up, we have recorded the number of cores required by each algorithm to feasibly schedule a task-set on S. To make a comparison, we calculated the average number of cores used by each algorithm, i.e., the sum of the cores used over all task-sets divided by the total number of task-sets. The same experiment is repeated for 4-core, 6-core, and 8-core processors. The obtained results are shown in Figure 5. In Figure 5, the number of processing cores used to define the workload is given on the x-axis, while the average number of cores used by each algorithm is given on the y-axis. An algorithm is considered better if the average number of cores used by it remains close to the total number of cores used to define the workload. For 2 cores, EDFwC=D-TS used on average 2.10 cores, while the EDF-DU-IS-FF and EDF-Partitioned algorithms used 2.34 and 2.42 cores respectively. This shows that EDFwC=D-TS uses 14% fewer cores than EDF-DU-IS-FF and 19% fewer cores than the EDF-Partitioned algorithm. Similarly, EDFwC=D-TS outperforms its counterparts on 4-core, 6-core, and 8-core processors and uses up to 13%, 8%, and 9% fewer cores than the EDF-DU-IS-FF algorithm respectively, while it uses 16%, 9%, and 11% fewer cores than the EDF-Partitioned algorithm.
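The averaging step can be written down in a couple of lines. The Python sketch below uses hypothetical function names; the per-task-set core counts in the first call are made-up illustrative inputs, while 2.10, 2.34 and 2.42 are the averages quoted above. Expressing a saving relative to the baseline's own average is one common convention, and whether every percentage in the text was computed in this way is an assumption.

```python
def average_cores_used(cores_per_task_set):
    """Mean number of cores needed over all generated task-sets."""
    return sum(cores_per_task_set) / len(cores_per_task_set)


def relative_reduction(baseline, candidate):
    """Percentage by which `candidate` is lower than `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


# Illustrative input: three task-sets fit on 2 cores, one needs the extra core.
example_average = average_cores_used([2, 2, 3, 2])

# Averages quoted in the text for the 2-core workload.
vs_partitioned = relative_reduction(2.42, 2.10)
vs_du_is_ff = relative_reduction(2.34, 2.10)
```

Under this convention, 2.10 cores against 2.42 and 2.34 corresponds to a reduction of roughly 13.2% and 10.3% respectively.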

Schedulability
Schedulability is a major metric used to compare the performance of real-time scheduling algorithms. It is defined as the ability of an algorithm to schedule task-sets feasibly. In this experiment, we have compared EDFwC=D-TS, EDF-DU-IS-FF, and EDF-Partitioned from the schedulability perspective. To begin with, we assess the performance of these algorithms on a 4-core power asymmetric multicore processor defined by the set S = {1.01 GHz, 1.53 GHz, 2.1 GHz, 3.1 GHz}. We generated $10^5$ synthetic task-sets. Each task-set contained 16 to 32 tasks. Task parameters were set as per the criteria defined in Table 5. The system utilization of each task-set was kept between 0.90 and 1.0. We determined the feasibility of each task-set on S and recorded the results. For comparison, we counted the total task-sets at each system utilization level and determined the percentage of feasible task-sets under each algorithm. The obtained results are given in Figure 6.
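The per-level percentages plotted in the schedulability figures amount to a grouped count. A minimal Python sketch of that bookkeeping is shown below; the function name, the (utilization level, feasible flag) record format, and the sample records are assumptions for illustration and are not data from the experiments.

```python
from collections import defaultdict


def feasible_percentage(results):
    """Percentage of feasible task-sets per utilization level.

    results: iterable of (utilization_level, is_feasible) pairs, one per
             generated task-set under a given algorithm.
    """
    total = defaultdict(int)
    feasible = defaultdict(int)
    for level, ok in results:
        total[level] += 1
        if ok:
            feasible[level] += 1
    return {level: 100.0 * feasible[level] / total[level]
            for level in sorted(total)}


# Made-up records: four task-sets at level 0.94 and two at level 0.96.
records = [(0.94, True), (0.94, True), (0.94, False), (0.94, True),
           (0.96, False), (0.96, True)]
percentages = feasible_percentage(records)
```

Running the same function once per algorithm on its own feasibility results gives the curves that are compared against each other.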
In Figure 6, the x-axis represents the system utilization while the y-axis represents the percentage of feasible task-sets. EDF-Partitioned scheduling performs well up to 93% system utilization, but its performance declines at an increasing rate as the system utilization gets higher. At the 94-95% system utilization level it feasibly schedules 78% of the task-sets, while at the 96-97%, 98-99%, and 100% utilization levels the success rate drops to 19%, 0%, and 0% respectively. In contrast, the EDF-DU-IS-FF algorithm schedules 100% of the task-sets with utilization of 93% or less, while at higher system utilization levels (94-95%, 96-97%, 98-99%, and 100%) its performance declines gradually and it achieves success ratios of 86%, 27%, 3%, and 0%. Under the same experimental set-up, the superior performance of the EDFwC=D-TS algorithm is evident. It achieves a 100% success ratio up to 95% system utilization, and at the 96-97%, 98-99%, and 100% system utilization levels it feasibly schedules 78%, 43%, and 6% of the task-sets.

We repeated the same experiment for a 6-core processor. For this experiment, the system and workload specifications are set as given in Table 5. The obtained results are shown in Figure 7. All of the algorithms performed well at lower system utilization levels. For workloads with higher system utilization, EDFwC=D-TS dominates its counterparts. At 94-95% system utilization, it schedules 29% and 33% more task-sets than the EDF-DU-IS-FF and EDF-Partitioned algorithms respectively, while at the 96-97%, 98-99%, and 100% system utilization levels it schedules 51%, 29%, and 5% more task-sets than EDF-DU-IS-FF and 57%, 29%, and 5% more task-sets than EDF-Partitioned.
We further verified the dominance of EDFwC=D-TS by repeating the same experiment on an 8-core processor (the system and workload configuration is given in Table 5). We generated $10^5$ task-sets with $|\Gamma| \in [32, 64]$ following the task parameters given in Table 5. The system utilization for these task-sets was kept between 0.90 and 1.0. The obtained results are given in Figure 8. Figure 8 shows that EDFwC=D-TS outclasses its counterparts in terms of schedulability at higher system utilization levels. The performance gain achieved is (38%, 46%, 13%, 2%) and (42%, 50%, 13%, 2%) at (94-95%, 96-97%, 98-99%, 100%) system utilization against the EDF-DU-IS-FF and EDF-Partitioned algorithms respectively. We have summarized the results in Tables 6 and 7.
Table 6. Average number of cores used/Average processor utilization.

Discussion of Experimental Results
In this section, we discuss the performance of the EDFwC=D-TS algorithm.
Average Processor Utilization: As mentioned earlier, partitioned scheduling usually fails to fully utilize the available processor capacity, while semi-partitioned scheduling improves the processor utilization. The EDFwC=D-TS algorithm expands the space for choosing the task to split. In this way, the task selected for splitting is the one that maximizes the processor utilization. Furthermore, EDFwC=D-TS allocates the second portion of the split task to the slowest core where it is schedulable. We have shown that this reduces the overhead associated with existing semi-partitioned scheduling techniques. As a result, EDFwC=D-TS improves the processor utilization. We have conducted experiments to validate the superior performance of the EDFwC=D-TS algorithm in terms of processor utilization.
In the first experiment, we evaluated the EDF-Partitioned, EDF-DU-IS-FF, and EDFwC=D-TS algorithms in terms of their ability to utilize the processor capacity. The obtained results are given in Figure 4. EDFwC=D-TS achieves better processor utilization than its counterparts due to its better task-assignment strategy. On a 2-core processor, it uses on average 8% and 6% more processor capacity than the EDF-Partitioned and EDF-DU-IS-FF algorithms respectively. Similarly, on 4-core, 6-core, and 8-core processors it achieves 10%, 5%, and 10% better processor utilization than EDF-Partitioned and 6%, 11%, and 8% better processor utilization than EDF-DU-IS-FF. This shows that EDFwC=D-TS dominates its counterparts in terms of processor utilization.
Average Number of Cores Used: Since the EDFwC=D-TS algorithm utilizes the processor capacity better than its counterparts, it is able to schedule the given workload using fewer cores. While assigning tasks to the $m$th core using partitioned scheduling, if the utilization of the remaining tasks is larger than the residual capacity of the $m$th core, at least one more core is required to schedule the tasks. Instead, in EDFwC=D-TS, if the residual capacity it utilizes through task-splitting is sufficient for the remaining tasks, no other core is required. For this reason, EDFwC=D-TS usually requires fewer cores than its counterparts to feasibly schedule the workload. We have conducted experiments to verify the ability of the EDFwC=D-TS algorithm to schedule the given workload using fewer cores. The obtained results are given in Figure 5. For a heavy workload defined using two cores, EDFwC=D-TS schedules the workload with two cores in most cases, while in a few cases it requires another core. As a result, it uses 2.10 cores on average. In contrast, EDF-Partitioned and EDF-DU-IS-FF use 2.42 and 2.34 cores respectively to schedule the same workload. This shows that EDFwC=D-TS uses 13.22% fewer cores than EDF-Partitioned and 10.25% fewer cores than EDF-DU-IS-FF. Similarly, for the workloads defined using 4-core, 6-core, and 8-core processors, EDFwC=D-TS uses 4.45%, 6.19%, and 7.8% fewer cores than EDF-DU-IS-FF and 8.59%, 7.39%, and 10.19% fewer cores than EDF-Partitioned, respectively.
Schedulability: Better utilization of the available processor capacity also results in better schedulability. When the given workload is not feasible using partitioned scheduling, i.e., some of the tasks cannot be assigned to any core, EDFwC=D-TS may still produce a feasible schedule if the residual capacity it utilizes is more than the system utilization of the unassigned tasks. EDFwC=D-TS is therefore more likely to achieve better schedulability than its counterparts. We have evaluated the EDF-Partitioned, EDF-DU-IS-FF, and EDFwC=D-TS algorithms through simulations to measure their ability to feasibly schedule the given workload. On 4-core processors, EDFwC=D-TS dominates its counterparts for heavy workloads and schedules up to 51% and 67% more task-sets than the EDF-DU-IS-FF and EDF-Partitioned algorithms respectively for workloads having 96-97% system utilization. However, for workloads with low system utilization, the performance of all algorithms is comparable (see Figure 6 for detailed results). A similar trend is observed when the simulations are performed on 6-core and 8-core processors (results are given in Figures 7 and 8 respectively). EDFwC=D-TS schedules 57% and 50% more task-sets than EDF-Partitioned at 96-97% system utilization on 6-core and 8-core processors, while this performance gain reaches up to 51% and 46% against EDF-DU-IS-FF.

Conclusions and Future Work
This research explores the dynamic-priority semi-partitioned scheduling of power asymmetric multicore processors and presents a novel algorithm, EDFwC=D-TS. The EDFwC=D-TS algorithm introduces a two-round task-allocation policy. During task allocation, a subset of tasks is first assigned to a core and then, in the second round, task-splitting is performed in such a way that core utilization is maximized. The empirical analysis verifies the dominance of the EDFwC=D-TS algorithm over its counterparts. The simulation results reveal that it schedules up to 67% more task-sets at higher system utilization. Furthermore, EDFwC=D-TS improves the processor utilization by up to 11%, while it also reduces the number of cores required to feasibly schedule the given workload by up to 14%.
In this work, we have evaluated the EDFwC=D-TS algorithm through simulations. In future, we aim to validate the effectiveness of EDFwC=D-TS in real environments. Furthermore, we will integrate the DVFS and memory shut-down approaches with EDFwC=D-TS scheduling to achieve energy efficiency. Additionally, we will study the efficacy of the proposed work for parallel task models.

Algorithm 2
Split-task process under EDFwC=D-TS scheduling
Input: (i) Processing core with processing power $S_m$, which is currently being considered for task allocation. (ii) Set of tasks $\Gamma_m$, where $\Gamma_m \subseteq \Gamma$ contains the tasks that are assigned to core $m$ by Algorithm 1. (iii) System utilization of $\Gamma_m$, i.e., $U(\Gamma_m)$.
Output: Selects a task $\tau_i \in \Gamma_m$ and splits it into two subtasks.

$$\text{Average number of cores used} = \frac{\sum_{i=1}^{n} (\text{number of cores used})_i}{n}, \qquad n = \text{total number of task-sets}$$

Table 1. Used notations and their meanings.

Table 3. Value of h(t) against each value of t.

Table 5. Task and system parameters for simulations.

Table 7. Percentage of feasible task-sets.