Energy-Efficient Task Partitioning for Real-Time Scheduling on Multi-Core Platforms

Abstract: Multi-core processors have become the prevalent computing engines for recent embedded real-time systems. Efficient task partitioning plays a significant role in real-time computing, achieving higher performance while sustaining system correctness and predictability and meeting all hard deadlines. This paper deals with the problem of energy-aware static partitioning of periodic, dependent real-time tasks on a homogeneous multi-core platform. Concurrent access to shared resources by multiple tasks running on different cores induces higher blocking times, which increase the worst-case execution time (WCET) of tasks and can cause hard deadlines to be missed, resulting in system failure. The proposed blocking-aware-based partitioning (BABP) algorithm aims to reduce overall energy consumption while avoiding deadline violations. Compared to existing partitioning strategies, the proposed technique achieves greater energy savings. A series of experiments tests the capabilities of the suggested algorithm against popular partitioning heuristics, comparing the most used bin-packing algorithms with the proposed algorithm in terms of energy consumption and system schedulability. Experimental results demonstrate that the designed algorithm outperforms the Worst Fit Decreasing (WFD), Best Fit Decreasing (BFD), and Similarity-Based Partitioning (SBP) bin-packing algorithms, reduces the energy consumption of the overall system, and improves schedulability.


Introduction
Embedded systems have become omnipresent, with the number of mobile devices alone now nearly reaching the world population. Embedded systems implementations include, for example, home appliances, pacemakers, cell phones, satellites, energy generation and distribution, industrial automation, and many other kinds of systems. Managing their energy consumption has become extremely challenging. Embedded systems strongly affect the design and development constraints of their surrounding systems, and vice versa. Some embedded systems interact with the physical environment and must ensure that a certain action is carried out successfully and terminated within a determined time frame. Eminent examples of such devices are airbags in cars, medical pacemakers, and autopilots in airplanes; these are called real-time embedded systems.
Multi-core processors are now the standard architecture for recent real-time embedded systems. To achieve both efficiency and speed, CPU architectures have evolved into multi-core processor units in which two or more processor cores are used to perform a task. Multi-core technology provides better response times when running massive applications, improved power management, and faster execution times. Multi-core processors are specially designed to run tasks in parallel, and parallelism can be exploited at two levels in multi-core systems. The proposed algorithm partitions a collection of real-time tasks on a non-ideal DVS processor of a multi-core architecture. Among DVFS methods, the BABP uses a Two-Speed Strategy (TSS)-based approach known as the Dual-Speed (DS) algorithm [10], which initially carries out tasks at a low speed and shifts to a high speed immediately when tasks are blocked.
Partitioned Earliest-Deadline-First (P-EDF) [3] is used as the dynamic-priority task-scheduling strategy for each processing core of the multi-core system. To handle dependent real-time tasks, the BABP algorithm uses the Multiprocessor Stack Resource Policy (MSRP) [11] to synchronize the access of tasks to shared resources. Under MSRP, a bounded blocking time is ensured for tasks accessing global resources, while local resources are synchronized using SRP. When the P-EDF algorithm is used to schedule tasks [3], the DS algorithm computes the low and high speed levels based on the EDF sufficient schedulability condition. Therefore, energy consumption is decreased while the timing constraints of tasks are guaranteed. In particular, when tasks arrive, the DS algorithm allocates the low speed level for executing them, and the moment a task is blocked, the processor speed shifts to the high level. With the DS algorithm, a high-speed interval begins when the blocking starts and terminates at the blocking task's deadline. The capabilities of the proposed approach were appraised using a simulation platform named the multi-core real-time scheduling simulator (MCRTsim) [12].
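The dual-speed idea described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the normalized speed values, the `min_feasible_speed` selection rule (lowest discrete speed satisfying the EDF sufficient condition U/s ≤ 1), and the function names are all assumptions for exposition.

```python
# Illustrative sketch of the Dual-Speed (DS) strategy: run at a low discrete
# speed chosen from the platform's speed set, and switch to a high speed
# while a task is blocked. Speed values are made-up normalized numbers.

SPEEDS = [0.25, 0.5, 0.75, 1.0]  # assumed normalized discrete speeds s_1 < ... < s_h

def min_feasible_speed(utilization, speeds=SPEEDS):
    """Lowest discrete speed s such that utilization/s <= 1 (EDF sufficient condition)."""
    for s in speeds:
        if utilization <= s:
            return s
    return speeds[-1]

def current_speed(low, high, blocked):
    """DS rule: low speed normally, high speed while blocking is in effect."""
    return high if blocked else low

low = min_feasible_speed(0.6)   # e.g., U = 0.6 selects s = 0.75
high = SPEEDS[-1]               # maximum speed used during blocking intervals
```

A high-speed interval would then last from the start of blocking until the blocking task's deadline, as described in the text.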
The key contributions of this study are: (1) A BABP heuristic algorithm is proposed to effectively exploit the available parallelism, balance the workload in multi-core systems, and assign tasks that can run in parallel to different cores as much as possible. For example, as shown in Figure 1, the tasks τ1, τ5, and τ7 can be dispatched to one core and the others to another core. (2) The suggested algorithm is implemented on a simulation platform called MCRTsim. (3) The suggested algorithm is assessed against blocking-agnostic bin-packing partitioning algorithms and the SBP algorithm as references. Within the framework of this study, a blocking-agnostic algorithm is a bin-packing algorithm that does not include blocking parameters to improve the efficiency of partitioning, although its schedulability check comprises blocking times. In particular, this research presumes that tasks are periodic, preemptive (only in non-critical sections), and dependent because of synchronous access to shared resources. Using the BABP algorithm as the partitioning strategy and P-EDF as the scheduling algorithm, the simulation results indicate that the BABP algorithm achieves more energy savings than other partitioning techniques.
The remainder of this paper is organized as follows. Section 2 sums up the previous research on real-time systems scheduling and synchronization on uniprocessor and multi-core platforms. Section 3 depicts the system model and problem formulation. Section 4 discusses the proposed BABP algorithm and its implementation, with the schedulability analysis. Section 5 reports on the simulation assessment and outcome analysis. The conclusion is reported in Section 6.

Related Work
In recent years, many studies have focused on energy-aware scheduling of real-time embedded systems. In uniprocessor environments, there are several research papers in the domain of energy-aware scheduling of independent real-time tasks, and an extensive survey can be found in [1]. Very little research has discussed the problem of dependent real-time tasks within the context of task synchronization [13,14]. The DVFS mechanism, which slows the processing speed, is a widely used energy-saving technique due to the convexity of the power consumption function [15][16][17].
The interest in multiprocessor techniques has increased as a result of the growth in multi-core architectures. The article [18] tackled the problem of energy-aware static partitioning of periodic real-time tasks on asymmetric multiprocessor (multi-core) embedded systems. It formulated the problem according to the platform-supported DVFS model and outlined optimal methods of reference partitioning for each case of the DVFS model.

The authors of [19], from the perspective of allocating workloads to cores, suggested a method for energy administration of applications in a multi-core partitioned architecture. They introduced the Energy Efficient Allocator (EEA) algorithm, an allocation method for assigning partitions to cores founded on bin-packing algorithms that considers the various frequencies at which a core can work. They also presented a variety of solutions to the problem of energy minimization; every solution provides an appropriate allocation of workload to cores with various levels of energy and system utilization. The EEA algorithm picks the type of allocator (First Fit Decreasing Utilization (FFDU), WFDU, or BFDU) and the criteria (decreasing utilization (DU), increasing utilization (IU), or random (R)) under which partitions are chosen to minimize their frequency.
For hard real-time systems, the authors of [20] presented a study of energy-aware multi-core scheduling algorithms. They summed up several algorithms from the literature and grouped them by homogeneous and heterogeneous multi-core processors, depending on partitioned, semi-partitioned, and global scheduling strategies. An Inter-task Affinity-aware Task Allocation (IATA) algorithm was proposed in [21] to nullify overheads in the WCET due to cache evictions. IATA groups the tasks considering their constraints, dependencies, and preferences (shared resources, inter-core communication, and cache evictions) and assigns these groups to multiple cores to decrease the additive overheads in WCET.
A static mixed task scheduling (SMTS) algorithm was proposed in [22] to solve the problem of scheduling mixed task sets that comprise n hard real-time periodic tasks with shared resources and soft aperiodic tasks. The authors take into account two opposing objectives: decreasing the energy consumption and reducing the aperiodic task response time. The SMTS algorithm schedules aperiodic tasks at the maximum processor speed and periodic tasks at the best speed. They also introduced a dynamic mixed task scheduling (DMTS) algorithm capable of reclaiming dynamic slack time produced by periodic tasks and the constant bandwidth server to minimize energy consumption. Their results show that the DMTS technique outperforms the SMTS algorithm and the baseline algorithm, with DMTS decreasing energy consumption by an average of 7.18% and response time by 53.66% compared with the other algorithms.
The authors of [23] studied maximizing gains for volunteer computing platforms (VCPs). VCPs can be considered asymmetric multiprocessing systems (AMSs). The authors needed to pick tasks from users and assign the tasks to appropriate workers to solve the maximum benefit problem. They proposed a list-based task assignment (LTA) strategy and showed that the LTA strategy could complete tasks with deadline restrictions as soon as possible. Then, based on the LTA technique, they proposed a maximum benefit scheduling (MBS) algorithm, a new task assignment algorithm aimed at optimizing VCP gains.
The authors of [24] implemented a comparison of 11 heuristics for mapping independent tasks on heterogeneous distributed computing systems. It was shown that the relatively simple Min-min heuristic achieves minimum energy in comparison with the other strategies for the cases studied. The article [25] showed that the proposed Resource-Oriented Partitioned (ROP) scheduling with a distributed resource sharing strategy achieves a significant speed-up factor guarantee. The authors of [26] aimed to reduce energy consumption under real-time and reliability constraints. They suggested an Integer Non-Linear Programming (INLP) formulation that performs task mapping by jointly addressing task allocation, assignment of task frequency, and duplication of tasks. The original INLP problem was safely converted to an analogous Mixed Integer Linear Programming (MILP) problem to provide an optimal solution. Assigning a group of real-time tasks to a multi-core platform is a bin-packing problem, which is known to be NP-hard in the strong sense; therefore, finding the optimal solution in polynomial time is not practical in the general case. Given the unfavorable nature of the problem, numerous heuristics and their performance analyses have been the subject of various research papers, such as the First-Fit, Best-Fit, Next-Fit, and Worst-Fit methods [27,28]. A comparison of the behavior of these four well-known heuristics was made for homogeneous multi-core systems and periodic independent tasks [29].
Indeed, when the Earliest-Deadline-First scheduling technique is used, the problem closely resembles bin-packing [30,31], and the results and heuristics obtained in this vastly studied field provide insights into partitioning-based scheduling. The algorithm suggested in [32] uses the Worst-Fit strategy to partition a collection of frame-based tasks (with the same period and deadline) and then scales the speed in accordance with the task characteristics at a given instant. Although the method provides a reasonable approximation factor with respect to optimal scheduling, some unrealistic assumptions were made by the author, such as a continuous and infinite frequency range (s ∈ [0, ∞]) and negligible idle-state consumption. The problem of assigning a series of periodic real-time tasks in multi-core systems characterized by a single voltage island (where all processors share the same voltage and frequency) was considered in [33]. The authors first examined the approximation upper bound of the classical Worst-Fit heuristic and then introduced their technique, which overcomes many state-of-the-art limitations.
Resource control policies for single-processor systems are well understood. The Priority Ceiling Protocol (PCP) [34], in particular, is one of the most attractive protocols for the synchronization of resource accesses; it avoids both deadlock and transitive blocking. The Stack Resource Policy (SRP) [35,36] was defined as a refinement of PCP for EDF systems that strictly bounds priority inversion and permits simple schedulability tests. Each task under SRP is assigned a preemption level that reflects the relative deadlines of the tasks: the shorter the deadline, the higher the preemption level. The authors of [37,38] subsequently developed multiprocessor and distributed versions of PCP, targeted at distributed shared memory systems. There have been several versions of the Multiprocessor Priority Ceiling Protocol (MPCP) that extend PCP to multiprocessor systems and reduce remote blocking. The authors of [39] extended this research to dynamic PCP. A dynamic-priority multiprocessor version of the Priority Ceiling Protocol based on EDF scheduling (MDPCP) was introduced in [40]. The authors of [11,41] extended SRP to the Multiprocessor Stack Resource Policy (MSRP), the first spin-lock protocol for multiprocessor real-time systems.
Partitioning-based real-time scheduling of multiprocessors has feasibility as its primary aim. The problem occurs in two different forms: decrease the number of processors necessary to guarantee the feasibility of the task set, or, instead, find sufficient schedulability (usually, utilization) bounds given a fixed multiprocessor platform. In this research, the researchers also take the energy factor into consideration. Because generic bin-packing heuristics do not regard the blocking time caused by resource requests, they may not be efficient for task sets that share resources. To account for this extra blocking, the two well-known multi-core synchronization protocols, MPCP and MSRP, were presented. A partitioning heuristic adapted to the MPCP, a semaphore-based multiprocessor real-time locking protocol, was introduced in [6]. The MSRP, a spin-lock protocol [11], was proposed in which tasks busy-wait for shared resources once blocked.
The Similarity-Based Partitioning (SBP) algorithm [42] is another partitioning heuristic for MSRP using the same methodology; it uses modern cost heuristics to more precisely identify group splits with low energy consumption. It assigns the tasks that can access the same collection of shared resources to the same core to avoid a number of blocking events.

Multi-Core DVFS Processor and Energy Model
Most modern processors support variable voltage and frequency levels; such a processor can perform dynamic voltage scaling (DVS), and its speed is proportional to the supply voltage. In the literature, DVS processors are classified as ideal and non-ideal. An ideal DVS processor can run at any speed, ranging from the lowest to the highest possible speed, whereas a non-ideal DVS processor supports only discrete speeds.
In practice, most DVS processors are non-ideal, whereas ideal DVS processors serve theoretical research purposes only. This study regards a multi-core platform P consisting of a set of z cores, i.e., P = {core_1, core_2, . . . , core_z}, which supports h discrete speeds S = {s_1, s_2, . . . , s_h}, where s_1 < s_2 < . . . < s_h. The researchers presume that the platform P supports per-core DVFS capabilities, where cores may run at different speeds at run time.
The processor power model [43] used in this study has been widely used in the literature [13,44]. The researchers suppose a DVFS-enabled multi-core processor capable of operating at a variety of discrete voltage levels. Commonly, the power consumption of a complementary metal oxide semiconductor (CMOS) system is divided into dynamic and static power consumption [14]. The dynamic power dominates the total energy consumed by the processor core, and dynamic power dissipation is the most costly part. Therefore, this study aims only to reduce dynamic power consumption, and static power consumption is neglected [26].
The static power consumption is foremost caused by leakage currents (I_leak), and the static (leakage) power (P_leak) is given by:

P_leak = I_leak × V_dd

The dynamic power consumption can be expressed as a convex function of the processor speed. The dynamic power consumption of CMOS circuits [45] depends on the processor operating voltage and frequency at speed s and can be presented as:

P_dyn = C_eff × V_dd^2 × f

where C_eff is the effective switching capacitance, V_dd is the supply voltage, and f is the clock frequency of the processor (speed), which is given by:

f = k × (V_dd − V_th)^2 / V_dd

where k is a constant, V_dd is the supply voltage, and V_th is the threshold voltage. To express the power consumption of a specified core i of processor P, the researchers use a function PC_i(s) of the selected speed s. If a task keeps a processor throughout the execution interval [t_1, t_2], then the energy consumed by the processor throughout this period is given by:

E = ∫ from t_1 to t_2 of PC_i(s_i(t)) dt

where s_i(t) is the speed of the processor at time t.
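The power and energy relations above can be evaluated numerically as in the following sketch. The numeric values used in the comments are illustrative assumptions; the formulas follow the CMOS model described in the text, and the energy integral is approximated by a Riemann sum over a piecewise-constant speed trace.

```python
# Numerical sketch of the CMOS power/energy model described above.
# All parameter values are illustrative, not measured PXA270 figures.

def dynamic_power(c_eff, v_dd, f):
    """P_dyn = C_eff * V_dd^2 * f."""
    return c_eff * v_dd ** 2 * f

def clock_frequency(k, v_dd, v_th):
    """f = k * (V_dd - V_th)^2 / V_dd."""
    return k * (v_dd - v_th) ** 2 / v_dd

def energy(power_of_speed, speed_trace, dt):
    """E = integral of PC_i(s_i(t)) dt over [t_1, t_2], approximated as a
    Riemann sum: the trace lists the speed in each dt-long slot."""
    return sum(power_of_speed(s) * dt for s in speed_trace)
```

For example, with a cubic power function PC_i(s) = s^3, halving the speed over a fixed interval reduces power by a factor of eight, which is the convexity that DVFS exploits.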

Task and Resource Models
This study focuses on real-time systems consisting of a periodic task set with n tasks. Each task t_i is characterized by:

•	The arrival time (A_i): the time when the task is first issued.
•	The worst-case execution time (C_i): the maximum time the task requires to execute.
•	The period (T_i): the fixed time duration between jobs.
•	The relative deadline (D_i): the maximum acceptable delay for task processing.

This research regards well-formed tasks that meet the requirement 0 ≤ C_i ≤ D_i ≤ T_i. Each task t_i is a prototype of its instances, and one instance is released every period T_i. Let t_i,j represent the jth instance of task t_i. This research is concerned with scheduling and synchronizing dependent real-time tasks. The researchers presume these tasks are periodic, dependent (because of their access to shared resources), and preemptible (only in non-critical sections). Furthermore, they presume that a set of m shared resources (software objects, e.g., data structures, files, data objects, or shared variables) RS = {rs_1, rs_2, . . . , rs_m} may be accessed in a mutually exclusive manner (simultaneous access is not allowed).
Researchers presume that a semaphore provides access control of shared resources to ensure mutual exclusion among competing tasks. A task's request for shared resource access may happen at any moment during its execution; a portion of code accessing a shared resource is classified as a critical section under mutual exclusion restrictions. The list that describes the critical sections of a task t_i is Z_i = <z_i,1, z_i,2, . . . , z_i,n>, where z_i,j is the jth critical section of t_i. This study presumes that shared resource requests are not nested; locks are freed in the opposite order in which they were acquired. A task t_i may request a shared resource rs ∈ RS several times during its execution, but just one job at a time will access a shared resource, i.e., a binary semaphore. Real-time locking protocols help to ensure mutual exclusion. For instance, if a task t_i asks for a shared resource rs already locked by another task, it must wait until rs is available. Besides, each shared resource rs has a ceiling priority Ω, indicating the highest possible priority that it can have. Researchers define the utilization u_i of task t_i as:

u_i = C_i / T_i

The system utilization is U_tot = ∑ from i = 1 to n of u_i, and the periodic task set is scheduled by the P-EDF policy. Under P-EDF, priorities are assigned dynamically and are inversely proportional to the absolute deadlines of the active tasks, and higher-priority tasks are executed first.
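The task model and the P-EDF priority rule above can be sketched minimally as follows. The `Task` class and field names are illustrative assumptions; the utilization formula and the earliest-absolute-deadline-first selection follow the text.

```python
# Minimal sketch of the task model: u_i = C_i / T_i, and P-EDF's dynamic
# priorities (the ready job with the earliest absolute deadline runs first).
from dataclasses import dataclass

@dataclass
class Task:
    wcet: float      # C_i, worst-case execution time
    period: float    # T_i
    deadline: float  # D_i, relative deadline

    @property
    def utilization(self):
        return self.wcet / self.period

def total_utilization(tasks):
    """U_tot = sum of u_i over the task set."""
    return sum(t.utilization for t in tasks)

def edf_pick(ready_jobs):
    """ready_jobs: list of (task_id, absolute_deadline) pairs on one core."""
    return min(ready_jobs, key=lambda job: job[1])
```

On each core of the partitioned system, `edf_pick` would be applied only to the jobs of the tasks allocated to that core.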

Problem Description
Consider a workload set TS of n dependent periodic real-time tasks (dependent because of simultaneous access to shared resources) and a set RS of m shared resources. The problem is how to optimally schedule TS and synchronize its access to RS on a multi-core processor P that supports the DVFS technique and allows h discrete speeds. The research aims to find the optimal task-to-processor assignment (task partitioning) that minimizes the total energy consumption of the real-time system. In this case, the tasks allocated to each processor can be feasibly scheduled, and the overall energy consumption of P is minimized (among all feasible task allocations). The problem of optimizing dynamic energy consumption using DVFS on a multi-core platform is an optimization problem, that is, to find a feasible schedule with minimal energy consumption [15,17]. Notice that a schedule is considered feasible if all scheduled task instances finish by their deadlines at the latest [33].

Task Scheduling and Synchronization in a Multi-Core Platform
In particular, this study uses P-EDF [3] as the scheduling algorithm and the multiprocessor stack resource policy (MSRP) [11] as the synchronization protocol. Under the P-EDF scheduling algorithm, a priority-driven scheduling algorithm, tasks are first partitioned offline among cores and are then scheduled on the allocated cores. Under MSRP, the resources are divided into two groups: local and global. Local resources are accessed only by tasks that execute on the same processor; global resources are those that can be accessed by tasks running on different processors. There are two types of blocking: local blocking, which occurs when a task running on one core is blocked by another task running on the same core, and remote blocking, which occurs when a task is blocked by a task running on another core. Unlike SRP, global resources have different ceilings, one for each processor. Moreover, every processor has its own system ceiling. On a processor P, tasks can only use global resources at the processor ceiling priority, that is, the highest preemption level of all the tasks on processor P. Global resources are shared across processors in a First-In-First-Out (FIFO) manner. To acquire a global resource, a task must be running at the processor ceiling, which makes it non-preemptive. Whenever a task tries to access a shared resource that is already locked by another task, the task performs a busy wait (called a spin-lock), and the task resumes when the shared resource is unlocked by the previously locking task.
This study uses MSRP to ensure mutual exclusion among the competing tasks from multiple cores and to maintain the data consistency of shared resources. Under MSRP, every task has a fixed value, the preemption level λ_i of task t_i, used to estimate the possible blocking in the presence of dynamic priority scheduling. Tasks with a shorter deadline have a higher preemption level, so the preemption levels represent the relative deadlines of the tasks. Resources are given a ceiling value at run time according to the maximum preemption level of the tasks accessing the resource. Whenever a task is released, it can only preempt the currently executing task if its absolute deadline is earlier and its preemption level is greater than the highest ceiling of the currently locked resources. The effect of this protocol is nearly identical to PCP: tasks experience only one blocking, deadlocks are avoided, and a simple formula can be used to compute the blocking time. MSRP lets tasks use the local critical resources under the SRP policy. As a result, SRP saves redundant context switches by blocking earlier [11].
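The SRP preemption rule just described can be expressed compactly. This is a sketch under assumed data shapes (deadlines as numbers, locked-resource ceilings as a list); it is not the authors' code.

```python
# Sketch of the SRP preemption test used locally under MSRP: a newly released
# job may preempt the running job only if (1) its absolute deadline is earlier
# and (2) its preemption level exceeds the highest ceiling among currently
# locked resources (the "system ceiling").

def can_preempt(new_deadline, new_level, running_deadline, locked_ceilings):
    system_ceiling = max(locked_ceilings, default=0)
    return new_deadline < running_deadline and new_level > system_ceiling
```

Requiring the level to strictly exceed the system ceiling is what guarantees that a job never starts and then blocks midway on a local resource, which is the source of SRP's single-blocking property.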

Schedulability Analysis of the MSRP
For a multi-core platform, the researchers propose a partitioning algorithm for assigning tasks to processors; the tasks are then scheduled by EDF and synchronized by MSRP. When tasks are scheduled on a uniprocessor [37], a group of n real-time tasks, ordered by nondecreasing relative deadline, is schedulable by EDF and SRP if:

∀ i, 1 ≤ i ≤ n: ∑ from k = 1 to i of (C_k / T_k) + B_i / T_i ≤ 1

where B_i is the worst-case blocking time of t_i. Tasks can access resources in a mutually exclusive manner, and therefore, the overheads due to blocking time must be considered when checking the schedulability of the tasks assigned to a core. Under MSRP, if a task t_i requests a global resource rs, it becomes non-preemptive. If the resource rs is free, it locks the resource, but if rs is already locked by another task t_j running on a different processor, t_i performs a busy wait (spinning state). The worst-case global blocking time B_i^glob can be calculated by considering the busy wait time as follows:

B_i^glob = max { z(t_j, rs) + spin(P_c, rs) : t_j not on P_c ∧ rs is global }

where spin(P_c, rs) is the upper bound of the busy wait time that any task on processor P_c can incur to access a global resource rs, which can be expressed as follows:

spin(P_c, rs) = ∑ over all P_l ≠ P_c of max over all t_j on P_l of z(t_j, rs)

where z(t_j, rs) refers to the length of any critical section of task t_j requesting access to the resource rs. B_i^local is the worst-case blocking time of task t_i when accessing a local resource. Using the synchronization protocol MSRP, B_i^local can be calculated as follows:

B_i^local = max { z(t_j, rs) : λ_j < λ_i ∧ ceil(rs) ≥ λ_i ∧ rs is local to P_c }

where λ_i is the preemption level of task t_i, and ceil(rs) is the ceiling of local resource rs, i.e., the highest preemption level of all the tasks that may access rs on core P_c.
The worst-case blocking time B_i of a task t_i executing on processor P_c is then:

B_i = max(B_i^local, B_i^glob)

Based on the schedulability analysis for multiprocessor environments [11], a set of n real-time tasks on processor P_k, ordered by decreasing preemption level, is schedulable under EDF and MSRP if:

∀ i, 1 ≤ i ≤ n: ∑ from k = 1 to i of (C_k / T_k) + B_i / T_i ≤ 1

where each C_k accounts for the spin time spent busy-waiting on global resources. The proposed algorithm aims to reduce the overall blocking overhead in the system, which may otherwise compromise the schedulability of a task set.
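The blocking-aware schedulability test described in this section can be sketched as follows, assuming the utilization-plus-blocking form of the EDF condition (tasks ordered by nondecreasing relative deadline, with spin time already folded into the C values). This is a simplified illustration, not the full analysis of [11].

```python
# Sketch of the EDF+MSRP sufficient schedulability test on one core:
# for every prefix of the deadline-ordered task list, the cumulative
# utilization plus the blocking term B_i / T_i must not exceed 1.

def edf_msrp_schedulable(tasks):
    """tasks: list of (C, T, B) triples sorted by nondecreasing relative
    deadline; C is assumed to already include worst-case spin time."""
    demand = 0.0
    for c, t, b in tasks:
        demand += c / t
        if demand + b / t > 1.0:
            return False
    return True
```

Note that only the blocking term of the task closing each prefix is charged, which matches the single-blocking property that SRP/MSRP provides.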

Proposed Approach for Task Partitioning
This section introduces the proposed BABP algorithm for task allocation on a homogeneous multi-core platform. Figure 1 presents the general idea of the proposed algorithm. The BABP algorithm can be used under partitioned RMS (Rate-Monotonic Scheduling) and partitioned EDF scheduling schemes along with MSRP. The algorithm uses the uni-core synchronization protocol SRP when dependent tasks are assigned to the same core of the processor. The BABP algorithm aims to partition the periodic real-time tasks among the cores to (1) decrease the overall remote blocking times of tasks due to shared resources, (2) balance the load of the multiple cores, and (3) reduce inter-core communication. This usually improves the schedulability of a task set. Considering the blocking factors of tasks under MSRP, longer blocking times are caused by tasks with more and longer global critical sections [11]. Algorithm 1 lists the specifics of the proposed BABP algorithm.

Blocking-Aware-Based Partitioning (BABP) Algorithm
In the initial setup, BABP calculates each task's utilization U_i = C_i / T_i (line 5) and then calculates the num_of_z_i,q parameter, which is the number of critical sections in which t_i requests resource rs_q, and the longest_z_i,q parameter, which signifies the longest critical section of t_i demanding rs_q (lines 7-14). The BABP algorithm uses the previously computed parameters (from the initial step) as inputs to compute a weight t_i_w for each task by using (11). The proposed partitioning technique aims to minimize the blocking times, so higher weights should be given to the tasks that can lead to longer blocking times. Therefore, the weight of task t_i depends on its utilization plus, for each resource it will access, the number of its critical sections multiplied by the duration of its longest critical section:

t_i_w = u_i + ∑ from q = 1 to m of (num_of_z_i,q × longest_z_i,q)

The BABP algorithm generates a resource usage table (lines 16-21), sorting the tasks in non-increasing order based on the preemption level λ_i, to determine the maximum blocking time for a task t_i [37,38]: the local blocking time calculated by (8) or the global blocking time calculated by (6), depending on the partitioning strategy. Then, it orders the tasks according to their weights in non-increasing order. Depending on the partitioning strategy, the calculated task weight t_i_w signifies the significance of the task.
The BABP algorithm picks the task pairs, starting with the first task (the one with the maximum weight). It then calculates the proposed cost function V_{ij,q} by (12), which combines the number of critical sections and the duration of the longest critical sections for each task pair (t_i and t_j) that shares the same resource rs_q. The algorithm then arranges the tasks in non-increasing order of their cost function V, grouped by shared resource (lines 23-32). The output of BABP is the list of tasks, allTs, sorted by cost function value. The corresponding steps of Algorithm 1 are:

5.  Calculate U_i = C_i / P_i;            // task t_i utilization
6.  assigned(t_i) = false;
7.  foreach rs_q in t_i's resource set
8.      foreach z in t_i's critical-section set
9.          calculate num_of_z_{i,q};     // number of times rs_q is used by t_i
10.         calculate longest_z_{i,q};    // longest critical section of t_i on rs_q
11.         total = num_of_z_{i,q} × longest_z_{i,q};   // critical-section count times longest duration for rs_q
12.     end for
13.     sum += total;                     // summation over all resources shared by task t_i
14. end for
    ...                                   // lines 16-21: generate the resource usage table
24. for j = 1 to n
25.     Pick t_i and t_j from the top of the list ordered by t_i_w;
        ...
28.     if (t_i's resource set ∩ t_j's resource set ≠ ∅) then   // check t_i and t_j have shared resources
29.         Calculate the cost function V_{ij,q} of tasks t_i and t_j for each resource rs_q;   // by (12)
30.     end if
31. end for
32. end for
33. Sort tasks in decreasing order of V_{ij,q}, grouped by shared resource_id;
return: the list of tasks, allTs, sorted by cost function value
End
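The pairing step can be sketched in Python as follows. The concrete form of the pair cost below (summing count × longest critical-section duration over both tasks of the pair on the shared resource) is an assumption standing in for Eq. (12):

```python
from itertools import combinations

# task -> {resource id -> list of critical-section lengths}; illustrative data
crit = {
    "t1": {"rs1": [0.5, 0.2]},
    "t2": {"rs1": [0.3], "rs2": [0.4]},
    "t3": {"rs2": [0.6, 0.1]},
}

def cost_v(ti: str, tj: str, q: str) -> float:
    """Stand-in for Eq. (12): cost of the pair (t_i, t_j) on shared resource q."""
    total = 0.0
    for t in (ti, tj):
        lengths = crit[t].get(q, [])
        if lengths:
            total += len(lengths) * max(lengths)   # num_of_z * longest_z
    return total

# Enumerate every pair that shares at least one resource, then sort the
# pairs grouped by resource id with V in non-increasing order.
pairs = []
for ti, tj in combinations(crit, 2):
    for q in set(crit[ti]) & set(crit[tj]):
        pairs.append((q, ti, tj, cost_v(ti, tj, q)))
pairs.sort(key=lambda p: (p[0], -p[3]))
```

With this data, (t1, t2) share rs1 with cost 2 × 0.5 + 0.3 = 1.3, and (t2, t3) share rs2 with cost 0.4 + 2 × 0.6 = 1.6; (t1, t3) share nothing and produce no pair.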

Task Allocation Algorithm
In this step, the Task Allocation algorithm picks the tasks, starting with the first task (the one with the highest cost function value V), checks that the task's assigned(t_i) value is false (line 5), and then tests the schedulability condition defined by (13). It allocates the tasks that directly or indirectly share resources to the same core based on the value of the cost function. For example, if tasks t_i and t_j share resource R_q and tasks t_j and t_k share the same resource R_q, all three tasks will be allocated to the same processor if the schedulability test is satisfied. If not, the algorithm first allocates the task pair with the maximum cost function value V for this shared resource to the same core.
If task t_i satisfies the schedulability condition for core g and the task assignment is completed, the Task Allocation algorithm sets assigned(t_i) to true for task t_i and updates the utilization of core g, accounting for all tasks previously allocated to that core (lines 8-11), where B_i is the worst-case blocking time of t_i.
After the Task Allocation algorithm finishes, the task set TS is partitioned into {T_1, T_2, . . . , T_h}, and each partition is assigned to its core g for g = 1 to h. Each core uses EDF to schedule its allocated tasks. Algorithm 2 lists the specifics of the proposed Task Allocation algorithm, and Figure 3 shows its flowchart.
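A minimal sketch of the allocation loop is given below. The schedulability test used here, Σ_k U_k + B_i/P_i ≤ 1 per core, is a common blocking-aware EDF condition assumed as a stand-in for condition (13); the MSRP blocking-time computation itself is not reproduced:

```python
def edf_schedulable(core_tasks, cand, blocking_time):
    """Assumed stand-in for condition (13): total utilization of the core,
    including the candidate, plus the candidate's relative blocking <= 1."""
    util = sum(t["wcet"] / t["period"] for t in core_tasks)
    util += cand["wcet"] / cand["period"]
    return util + blocking_time / cand["period"] <= 1.0

def allocate(ordered_tasks, num_cores, blocking):
    """Greedy BABP-style allocation: walk tasks in cost-function order and
    place each on the first core where the schedulability test passes."""
    cores = [[] for _ in range(num_cores)]
    unassigned = []
    for t in ordered_tasks:                  # already sorted by cost function V
        for g in range(num_cores):
            if edf_schedulable(cores[g], t, blocking.get(t["name"], 0.0)):
                cores[g].append(t)           # assigned(t_i) := true
                break
        else:
            unassigned.append(t)             # no core can host t_i
    return cores, unassigned

tasks = [
    {"name": "t1", "wcet": 4, "period": 10},
    {"name": "t2", "wcet": 30, "period": 50},
    {"name": "t3", "wcet": 20, "period": 100},
]
cores, left = allocate(tasks, num_cores=2, blocking={"t2": 5.0})
```

In this example t2 fails the test on core 1 (utilization 1.0 plus blocking share 0.1 exceeds 1) and is pushed to core 2, while t3 still fits on core 1.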

Algorithm 2: Task Allocation
Input:
1. the list of tasks, allTs, sorted by cost function value;
2. a multi-core processor P = {core_1, core_2, . . . , core_h}
Begin:
3. while (allTs.size > 0) do
4.     Pick the task t_i from the top of the ordered list based on V_{ij,q};
5.     if assigned(t_i) = false then
6.         for g = 1 to h    // h cores; test the schedulability condition by (13)
7.             if task t_i satisfies the schedulability condition on core g then
8.                 allocate t_i to core g;
9.                 assigned(t_i) = true;
                   ...        // lines 10-11: update core g utilization
               end if
14.        end for
15.    ...
End

Experimental Evaluation and Analysis
The experimental assessment of the BABP algorithm was performed on the multi-core real-time scheduling simulator MCRTsim [12]. The experiments used a realistic environment based on Marvell's XScale-technology PXA270 processor [43] and a non-ideal DVS platform. The PXA270 processor provides six voltage/frequency levels, indicated in Table 1. Through the MCRTsim simulator, the researchers set up a dual-core processor configured with its available processor speeds. The primary performance measure of interest in the experiments was the energy consumption of tasks, named Energy_Consum. Assuming that s(t) is the speed of the processor at time t, the energy consumption can be determined by Energy_Consum = ∫_0^simTime PC(s(t)) dt, where simTime is the duration of the simulation. Once the tasks are permanently assigned to the processors, a speed assignment scheme is chosen to reduce the energy consumption while preserving feasibility. The Dual-Speed (DS) algorithm, a Two-Speed Strategy (TSS)-based technique, initially executes tasks at a low speed level and switches to a high speed as soon as tasks are blocked. When the P-EDF algorithm is used to schedule tasks, DS adjusts the low and high speed levels based on the EDF sufficient schedulability condition [3].
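For a piecewise-constant speed schedule, the integral Energy_Consum = ∫_0^simTime PC(s(t)) dt reduces to a sum of power × duration terms, which is how a simulator can evaluate it. The sketch below assumes a cubic dynamic-power model PC(s) = s³ on normalized speeds, a common convention that merely stands in for the PXA270's measured voltage/frequency table used by MCRTsim:

```python
def power(speed: float) -> float:
    """Assumed power model PC(s) = s**3 (normalized speed in (0, 1]);
    MCRTsim uses the PXA270's discrete voltage/frequency levels instead."""
    return speed ** 3

def energy(segments) -> float:
    """segments: list of (duration, speed) pairs covering [0, simTime].
    Energy_Consum = sum over segments of PC(speed) * duration."""
    return sum(power(s) * dt for dt, s in segments)

# Dual-Speed idea: run at a low speed, jump to the high speed while blocked.
full_speed = energy([(100.0, 1.0)])               # always at full speed
dual_speed = energy([(70.0, 0.5), (30.0, 1.0)])   # low speed, high when blocked
```

Here the dual-speed schedule consumes 0.125 × 70 + 30 = 38.75 energy units versus 100 at full speed, illustrating why running at the low speed whenever possible saves energy.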

Simulation Settings
Different workloads, i.e., randomly generated task sets, were used. Task period values were uniformly generated to obtain short tasks (10~50 ms), medium tasks (50~100 ms), and long tasks (100~500 ms). The worst-case computation times of tasks in the three groups were (1~10), (1~20), and (1~100) ms, respectively. The task period and the worst-case computation amount were picked randomly from the respective ranges for each workload. Every task set was composed of 5-20 tasks. Within this research, the number of shared resources was set to between 4 and 6 to ensure sufficient competition between tasks. The number of resources that a task accesses was picked at random from 1 to 4. The duration and position of the critical sections within every task were chosen randomly. Recall that the utilization bound for EDF is ∑_{i=1}^{n} C_i/P_i ≤ 1, where n is the number of tasks. Table 2 introduces the parameters for a set of 10 tasks with a short period.
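The workload generation described above can be sketched as follows; the exact distributions and the way critical sections are embedded are assumptions based only on the stated parameter ranges:

```python
import random

def gen_task_set(n_tasks, period_range, wcet_range, n_resources=4, seed=None):
    """Generate one random workload per the stated ranges: each task draws a
    period and WCET uniformly and accesses 1-4 randomly chosen resources."""
    rng = random.Random(seed)
    tasks = []
    for i in range(n_tasks):
        period = rng.uniform(*period_range)
        wcet = min(rng.uniform(*wcet_range), period)   # keep U_i <= 1
        n_used = rng.randint(1, min(4, n_resources))
        resources = rng.sample(range(n_resources), n_used)
        tasks.append({"id": i, "period": period, "wcet": wcet,
                      "resources": resources})
    return tasks

# Short-period workload: periods in 10~50 ms, WCETs in 1~10 ms, 10 tasks.
short_set = gen_task_set(10, (10, 50), (1, 10), n_resources=4, seed=42)
total_util = sum(t["wcet"] / t["period"] for t in short_set)
```

A generated set is then kept or rescaled so that the total utilization fits the target value of the experiment (0.4 to 1.0 per core in the evaluation below).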

Experimental Results and Discussion
When the randomly generated task sets were partitioned by BABP, SBP [42], and the classic bin-packing heuristics BFD and WFD [30,31], they were scheduled and synchronized by P-EDF and MSRP in MCRTsim. These generated feasible dynamic-priority task sets were assessed under various values of the utilization factor. The BABP algorithm was evaluated under the partitioned EDF scheduling scheme along with the MSRP multi-core shared-resource synchronization protocol. Regarding energy consumption, the results of BABP are compared with those of the heuristic algorithms SBP, BFD, and WFD. Figure 4 shows a graphical representation of the simulation results, in which the scheduling of the tasks' executions and the resource usage on the different cores can be observed. Table 3 reports the simulation results (energy consumption) for scheduling a set of 10 tasks with a short period. The results show that the blocking time caused by global shared resources is reduced significantly by allocating tasks with BABP compared to the other task allocation techniques mentioned. When a bin-packing algorithm allocates a task, it usually places the task in the bin that fits it best and does not consider the unallocated objects that will follow the current one. The BABP approach gathers all tasks with a higher value of the cost function V_{ij,q} and allocates them to the same core, which reduces the remote blocking time of tasks due to shared resources, minimizes inter-core communication, and exploits the parallelism of multi-core architectures efficiently. These time savings reduce the total processor busy time, which can then be used to schedule additional tasks. To consider the effect of varying the number of tasks per task set, a simulation was performed by doubling, tripling, and quadrupling the number of tasks.
Consider Figure 5, which plots the overall energy consumption of the system (vertical axis) for the task sets the algorithms could schedule successfully versus the number of tasks in the task set: 5, 10, 15, and 20 (horizontal axis). Figure 5 indicates that the energy consumption varies with the number of tasks and the task period. Part (a) shows task sets with a short period (10~50 ms), part (b) task sets with a medium period (50~100 ms), and part (c) task sets with a long period (100~500 ms). The results show that increasing the number of tasks increases the energy consumption in all situations, and a task set with fewer tasks will outperform a task set with more tasks. As a result of the competition among tasks for resources, the number of blockings and their duration increase as the number of tasks grows. Figure 5 illustrates that the BABP algorithm performs significantly better than all the other compared partitioning algorithms. The BABP algorithm minimizes the amount of remote blocking by partitioning tasks based on their resource usage likeness (direct or indirect sharing) and the longest blocking time; it allocates the tasks that directly or indirectly share resources onto the same processor based on the importance of the cost function. Hence, BABP succeeds in minimizing the length of remote blocking, whereas the SBP algorithm performs better than the blocking-agnostic algorithms only in some situations. The results also indicate that in some cases an increase in the task period contributes to more energy savings: task sets with a long period, even under a heavy workload, profit from the proposed BABP algorithm to improve system performance and save more energy.
To consider the effect of variation in the total utilization of tasks, numerous task sets were randomly generated with total utilization ranging from 0.4 to 1.0. Figure 6 plots the overall energy consumption of the system (vertical axis) for the task sets the algorithms could schedule successfully versus the total utilization of the task set (horizontal axis), comparing the results of the BABP, SBP, BFD, and WFD algorithms. These outcomes indicate that the energy consumption grows with the total utilization of the tasks in some situations, and BABP outperforms all other techniques in all situations.
(Figure panels: (a) task sets with a short period; (b) task sets with a medium period.)


Conclusions
Obtaining better performance while meeting the hard deadlines of real-time tasks is a very critical problem. This research examines the problem of curtailing the energy consumption of a set of dependent periodic real-time tasks that share resources. The proposed BABP algorithm assigns a task set to the processors of a single-chip multiprocessor (multi-core) with shared memory. The proposed technique's goal is to minimize task blocking times by allocating tasks that directly or indirectly share resources to appropriate processors, beginning with the tasks with the maximum estimated blocking time. Generally, this increases the schedulability of a task group and may result in fewer processors being needed compared to blocking-agnostic bin-packing strategies. BABP also supports parallelism between tasks that do not share resources, so it can exploit the parallelism of multi-core architectures efficiently: tasks of one application can run on different processing cores in parallel.
Because so many systems use dynamic-priority scheduling protocols in practice, the researchers implemented the proposed algorithm under MSRP, a standard synchronization protocol for multiprocessors (multi-cores) that operates under the dynamic-priority scheduling algorithm P-EDF. The proposed technique performs notably better than the conventional task allocation algorithms BFD, WFD, and SBP. The results reflect that the proposed algorithm reduces blocking on globally shared resources and inter-core communication: the proposed partitioning algorithm greatly reduces blocking times, improves overall system performance, and reduces energy consumption. The capabilities of the proposed approach were evaluated using the MCRTsim simulator. In future work, the researchers will concentrate on run-time task partitioning strategies for global and semi-partitioned schemes with various evolving synchronization protocols.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author and will be made publicly available at a later time.

Conflicts of Interest:
The authors declare no conflict of interest.