Article

A Tetris-Based Task Allocation Strategy for Real-Time Operating Systems

1 National Key Laboratory of Wireless Communications, University of Electronic Science and Technology of China, Chengdu 611731, China
2 School of Networks & Communication Engineering, Chengdu Technological University, Chengdu 611730, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(1), 98; https://doi.org/10.3390/electronics14010098
Submission received: 4 December 2024 / Revised: 24 December 2024 / Accepted: 27 December 2024 / Published: 29 December 2024
(This article belongs to the Section Computer Science & Engineering)

Abstract

Real-time constrained multiprocessor systems have been widely applied across various domains. In this paper, we focus on scheduling algorithms for directed acyclic graph (DAG) tasks under partitioned scheduling on multiprocessor systems. Effective real-time task scheduling algorithms significantly enhance the performance and stability of multiprocessor systems. Traditional real-time task scheduling algorithms commonly rely on a single heuristic parameter as the reference for task allocation, which typically results in suboptimal performance. Inspired by the Tetris algorithm, we propose a novel heuristic scheduling algorithm, named the Tetris game scoring scheduling algorithm (TGSSA), which integrates multiple heuristic parameters. The process of real-time DAG task scheduling on a multiprocessor system is modeled as a Tetris game. Through simulated worst-case response time (WCRT) analysis and average response times observed on RT-Linux, a widely used real-time operating system, our algorithm demonstrates superior performance, effectively improving the efficiency and stability of real-time operating systems.

1. Introduction

Real-time operating systems (RTOSs) are increasingly utilized in embedded systems across various domains, including edge computing, industrial automation, vehicle control systems, aerospace, medical devices, and the Internet of Things (IoT) [1,2,3,4,5]. These rapidly evolving technologies demand significant computational power, leading to the widespread deployment of RTOSs on multiprocessor platforms [6,7,8]. As technology continues to advance, the requirements for real-time performance and system reliability are becoming more stringent [9]. Task scheduling, a core function of RTOSs, has emerged as a critical area of research. Efficient task scheduling impacts system performance, response time, and the safety and stability of applications [10]. However, compared to single-processor systems, implementing real-time task scheduling on multiprocessor platforms presents greater complexity.
The primary goal of task scheduling in real-time systems is to ensure that critical tasks meet their deadlines, satisfying real-time constraints [6]. Depending on the scheduling strategy, scheduling methods can be categorized into static and dynamic approaches. Static scheduling determines the execution order of tasks during the design phase of the system, making it suitable for environments where task characteristics are well defined and remain stable [10,11]. Conversely, dynamic scheduling assigns tasks based on runtime conditions, offering greater flexibility and adaptability [4]. This approach is particularly effective in handling varying task loads and real-time priority adjustments.
Real-time tasks are characterized by stringent timing requirements. In hard real-time systems, missing a task deadline can have severe repercussions, such as jeopardizing aviation safety in flight control systems [6]. In soft real-time systems, while task deadline violations may not result in system failure, they can significantly degrade system performance [5]. Therefore, the primary objectives of task scheduling are to ensure adherence to timing constraints, optimize resource utilization, and maintain system stability and efficiency.
With the rise of multicore processors and cloud computing in recent years, task scheduling has encountered both new challenges and opportunities [3]. While multicore architectures enhance resource utilization, they also introduce issues such as task synchronization and competition for resources. Simultaneously, the growing adoption of edge computing and IoT has increased the need for scalable and efficient scheduling algorithms [12]. To address these challenges, researchers have developed innovative scheduling strategies, including priority-based scheduling, hybrid scheduling, and distributed scheduling. Moreover, there is a growing emphasis on the security and predictability of real-time task scheduling. As real-time systems are increasingly deployed in critical applications, ensuring their robustness and reliability under extreme conditions has become a key design consideration. Thus, modern research not only focuses on optimizing task completion rates and system responsiveness but also on enhancing system reliability and stability [12].
Traditional real-time scheduling theories have primarily focused on single-processor systems. However, with the advent of multiprocessor and multicore systems, the limitations of single-processor scheduling models have become apparent. Task scheduling in multicore processors is far more complex, requiring consideration of factors such as task allocation across processors, load balancing, task migration, and inter-task communication and synchronization. To address these challenges, researchers have proposed global, partitioned, and hybrid scheduling strategies that adapt traditional algorithms for multiprocessor environments. These approaches aim to achieve efficient resource utilization and flexible task distribution, meeting the evolving demands of modern real-time systems.
In real-time multiprocessor systems, most scheduling algorithms rely on a single heuristic parameter to allocate tasks, such as completion time, processor utilization, or schedulability [12]. Departing from previous research on real-time scheduling algorithms for multiprocessors, we draw inspiration from the Tetris algorithm and propose a new scheduling algorithm (TGSSA) with multiple heuristic parameters that scores the states of the scheduling process. The main contributions can be summarized as follows:
  • We innovatively abstract the task scheduling process as a Tetris game and propose a reliable task scheduling algorithm that comprehensively considers multiple heuristic parameters, by extending El Tetris, a classic Tetris-playing algorithm, and combining it with task scheduling. Most existing algorithms consider only a single heuristic parameter.
  • We not only conduct WCRT analysis on the results of the scheduling algorithm, but also run actual real-time task tests on Ubuntu 22.04 with the real-time patch on an AMD Ryzen 7 4800U.
The remainder of this paper is organized as follows. Section 2 introduces related work. Section 3 describes the system model and WCRT analysis method used in this paper. Section 4 and Section 5 introduce the new task allocation strategy. Section 6 reports experimental evaluations. Section 7 concludes this paper and discusses possible future work.

2. Related Work

In real-time task scheduling problems, tasks are often periodic, meaning that each task executes at regular intervals determined by its period. A crucial aspect of this model is determining the priority relationships among tasks to ensure proper scheduling.
Rate monotonic scheduling (RMS) is a widely used fixed-priority scheduling algorithm for periodic real-time tasks, especially in single-processor systems [13]. The key feature of RMS is that task priority is directly related to its period (or frequency): tasks with shorter periods are assigned higher priorities. RMS is a static priority scheduling method, operating under the assumption that the execution time of each task is known and all tasks are periodic. During the design phase, priorities are assigned to tasks and remain fixed throughout system operation. In practice, RMS operates efficiently by allowing the scheduler to interrupt the currently running task if a higher-priority task becomes ready. If the newly ready task's priority is lower than or equal to that of the running task, the current task continues execution.
In addition to RMS, other scheduling algorithms such as earliest deadline first (EDF) and deadline monotonic scheduling (DMS) are also prevalent [7,13]. EDF is a dynamic-priority algorithm that, in theory, achieves up to 100% processor utilization on a single processor, outperforming RMS's 69.3% utilization limit. However, RMS offers simpler implementation and greater predictability, making it a preferred choice in embedded systems with steady resource requirements, while EDF is better suited to environments with significant load fluctuations. DMS, similar to RMS, assigns task priorities based on deadlines instead of periods: RMS is best suited for systems where task periods equal deadlines, while DMS handles cases where these differ. As one of the most critical fixed-priority scheduling algorithms, RMS is simple, efficient, and optimal among fixed-priority policies, making it a trusted choice in many industrial and embedded applications. TGSSA is used for task partitioning across multiple processors, while RMS, DMS, and EDF handle priority assignment on a single processor: TGSSA increases the number of subtasks that can run at the same time, whereas the single-processor scheduling algorithms determine preemption on each processor. They do not conflict with each other and work together.
There are multiple ways to model task allocation, such as directed acyclic graph (DAG) [11], synchronous parallel task model [14], sporadic-based models [15], gang scheduling model [16], and fork-join task model [17]. For multiprocessor systems, task allocation strategies often employ heuristic algorithms to address DAG task allocation challenges. Casini et al. introduced a heuristic algorithm called Schedulability Testing Priority Assignment (STPA), which uses breadth-first search to allocate subtasks in a DAG across processors [18]. The STPA performs schedulability tests before assigning each subtask to ensure feasibility. If a subtask fails these tests across all processors, the algorithm deems the task set unschedulable. However, this method incurs high computational complexity and often struggles with low schedulability rates. Özkaya et al. proposed a clustering-based approach (BLM), but its effectiveness was limited in general scenarios [19]. Aromolo et al. categorized DAG tasks into heavy and light types, assigning heavy-task subtasks across all processors while restricting light-task subtasks to a single processor. While straightforward, this method can lead to underutilization of resources [12].
To improve efficiency, some researchers have adopted the principles of list scheduling algorithms, assigning priorities to subtasks within DAG tasks to facilitate allocation. He et al. introduced a list-based priority scheme for DAG subtasks, aiming to resolve parallelization conflicts through priority-based scheduling [12]. However, simple heuristic strategies in such methods often fail to deliver optimal results. Inspired by the multi-heuristic parameter optimization used in the El-Tetris algorithm, we propose a novel real-time scheduling algorithm with multiple heuristic parameters [20]. This approach aims to enhance task scheduling efficiency and improve overall system performance.

3. System Model

3.1. Task Execution Model

In task scheduling models, individual tasks are typically abstracted as DAGs [21,22]. In the system under study, a set of N DAG tasks, denoted as Γ = {τ_1, τ_2, …, τ_N}, is scheduled on a real-time homogeneous computing platform. This platform consists of m homogeneous processors, denoted as p_1, p_2, …, p_m. Each DAG task can be represented as τ_i = (V_i, E_i, T_i, D_i, π_i), where V_i is the set of subtasks (nodes), E_i the edges, T_i the arrival period, D_i the relative deadline, and π_i the fixed priority.
The set of subtasks of DAG τ_i is represented as V_i = {V_{i,1}, V_{i,2}, …, V_{i,|V_i|}}, where V_{i,j} denotes the j-th subtask of τ_i. Each subtask is defined by its worst-case execution time (WCET) C_{i,j} and the processor P_{i,j} to which it is assigned. Specifically, V_{i,j} = ⟨C_{i,j}, P_{i,j}⟩, where C_{i,j} represents the WCET of the subtask and P_{i,j} the processor on which the subtask is assigned to run.
An edge e(V_{i,a}, V_{i,b}) represents a dependency from subtask V_{i,a} to subtask V_{i,b}: V_{i,b} can only begin execution after V_{i,a} finishes. In this case, V_{i,a} is a predecessor of V_{i,b}, and V_{i,b} is a successor of V_{i,a}. We use P(V_{i,j}) and S(V_{i,j}) to denote the sets of predecessors and successors of subtask V_{i,j}, respectively [11]. For the example above, V_{i,a} ∈ P(V_{i,b}) and V_{i,b} ∈ S(V_{i,a}).
If a subtask has no predecessors, i.e., P(V_{i,j}) = ∅, it is called an entry subtask. If τ_i has multiple entry subtasks, a virtual entry subtask with a WCET of 0 is added to the DAG, along with edges from the virtual entry subtask to each of the entry subtasks. Similarly, if a subtask has no successors, i.e., S(V_{i,j}) = ∅, it is called an exit subtask. If τ_i has multiple exit subtasks, a virtual exit subtask with a WCET of 0 is added to the DAG, along with edges from the exit subtasks to the virtual exit subtask [11].
In this paper, we simplify by assuming that D_i = T_i. To determine the period T_i, we introduce the utilization U_i of DAG τ_i and the total WCET $C_{sum}^{i} = \sum_{j=1}^{|V_i|} C_{i,j}$, so that T_i = C_{sum}^i / U_i. Further, the utilization of the entire set Γ is $U = \sum_{i=1}^{N} U_i$.
The priority π_i of each task τ_i is fixed and determined by RMS in this paper. The detailed procedure for assigning priorities to the tasks in Γ is presented in Section 2. Once assigned, priorities remain fixed during execution. Additionally, due to the nature of RTOSs, a higher-priority subtask preempts a lower-priority subtask whenever it is ready to execute. All subtasks of the same task τ_i share the same priority π_i, so no preemption occurs between subtasks of the same DAG task.
Figure 1 shows a DAG task τ_i with C_{sum}^i = 95 and T_i = D_i = 100. V_{i,1} is an entry subtask; V_{i,3}, V_{i,7}, and V_{i,8} are exit subtasks. Taking V_{i,2} as an example, ⟨15, 2⟩ means that the subtask has a WCET of 15 and P_{i,2} = 2. We now provide five definitions that will be used in the following algorithms.
First, a path λ_{i,k} is a sequence {V_entry → … → V_{i,s-1} → V_{i,s} → … → V_exit} of subtasks in DAG task τ_i, where any two adjacent subtasks in the sequence are connected by a directed edge in E_i, i.e., e(V_{i,s-1}, V_{i,s}) ∈ E_i. At least one path exists in a DAG task, and |λ_i| denotes the number of paths in τ_i.
Second, we define the processor set pr(B) as the set of processors on which the subtasks in set B execute. For example, pr(τ_i) denotes the set of processors to which the subtasks of τ_i are assigned.
Third, if the following two conditions are satisfied, subtask V_{i,s} is an indirect father subtask of V_{i,t}, and subtask V_{i,t} is an indirect child subtask of V_{i,s}:
  • e(V_{i,s}, V_{i,t}) is not included in the directed edge set E_i of τ_i.
  • At least one directed path in τ_i connects V_{i,s} and V_{i,t}.
Fourth, len(λ_{i,k}) denotes the total WCET of the k-th path, i.e., the sum of the WCETs of the subtasks on λ_{i,k}.
Last, if V_{i,j} and V_{i,k} are not connected by any directed path, V_{i,j} and V_{i,k} are said to be in a nondirect topological relationship, referred to as an NDT relationship.
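To make this model concrete, the following minimal C sketch shows one possible in-memory representation of a DAG task; the struct layout and field names (Subtask, Edge, DagTask) are illustrative assumptions, not code from the paper.

#include <stddef.h>

/* One subtask V_{i,j} = <C_{i,j}, P_{i,j}>: WCET and assigned processor. */
typedef struct {
    int wcet;        /* C_{i,j}: worst-case execution time */
    int processor;   /* P_{i,j}: index of the processor it runs on */
} Subtask;

/* One directed edge e(V_{i,a}, V_{i,b}): b may start only after a finishes. */
typedef struct {
    int from;        /* index a of the predecessor subtask */
    int to;          /* index b of the successor subtask   */
} Edge;

/* One DAG task tau_i = (V_i, E_i, T_i, D_i, pi_i). */
typedef struct {
    Subtask *subtasks;     /* V_i, num_subtasks = |V_i| entries */
    size_t   num_subtasks;
    Edge    *edges;        /* E_i */
    size_t   num_edges;
    long     period;       /* T_i; D_i = T_i is assumed in this paper */
    int      priority;     /* pi_i, fixed, assigned by RMS */
} DagTask;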

3.2. RTA Analysis Strategy

In this subsection, we introduce a traditional WCRT analysis strategy, the “response time algorithm” (RTA). For τ_i, the WCRT equals the maximum WCRT over all directed paths and can be calculated by Equation (1) [23]:
$R(\tau_i) = \max_{k \in [1, |\lambda_i|]} \{ R(\lambda_{i,k}) \}$.  (1)
As shown in Equation (1), R(λ_{i,k}) denotes the WCRT of the directed path λ_{i,k}. It can be calculated using Equation (2) [23]:
$R(\lambda_{i,k}) = len(\lambda_{i,k}) + I_{high}(\lambda_{i,k}) + I_i(\lambda_{i,k})$.  (2)
In Equation (2), R(λ_{i,k}) consists of three parts. The first part, len(λ_{i,k}), is the WCET of the path. The second part, I_{high}(λ_{i,k}), is the high-priority interference caused by the tasks in hp(τ_i), the set of tasks with priority higher than τ_i. The third part, I_i(λ_{i,k}), is the self-interference caused by subtasks of the same DAG task τ_i that are not on the path λ_{i,k}.
In the schedulability test, a DAG task τ_i is schedulable if its WCRT is less than or equal to its deadline D_i, and a set of DAG tasks Γ is schedulable if all of its DAG tasks are schedulable. In practical scenarios, the exact self-interference I_i(λ_{i,k}) and high-priority interference I_{high}(λ_{i,k}) cannot be obtained before execution. Instead, we can derive an upper bound R^{ub}(λ_{i,k}) on R(λ_{i,k}). The first part, len(λ_{i,k}), is the WCET of λ_{i,k} and can be calculated exactly. Therefore, we analyze the upper bound of the self-interference I_i(λ_{i,k}) and the upper bound of the high-priority interference I_{high}(λ_{i,k}) separately to obtain R^{ub}(λ_{i,k}).

3.2.1. High-Priority Interference

Among the subtasks of tasks with higher priority than τ_i, only those assigned to processors in pr(λ_{i,k}) can generate high-priority interference on λ_{i,k}. The total workload Q_j(λ_{i,k}) of τ_j on the processors in pr(λ_{i,k}) is given by Equation (3) [23]:
$Q_j(\lambda_{i,k}) = \sum_{p \in pr(\lambda_{i,k}) \cap pr(\tau_j)} \; \sum_{P_{j,t} = p} C_{j,t}$.  (3)
The upper bound of the high-priority interference I_j^{ub}(λ_{i,k}) generated by τ_j can be calculated by Equation (4) [23]:
$I_j^{ub}(\lambda_{i,k}) = \left\lceil \frac{R(\lambda_{i,k}) + J_j}{T_j} \right\rceil Q_j(\lambda_{i,k})$.  (4)
In Equation (4), J_j can be calculated by Equation (5) [23]:
$J_j = D_j - \min_{p \in pr(\lambda_{i,k}) \cap pr(\tau_j)} \sum_{P_{j,t} = p} C_{j,t}$.  (5)

3.2.2. Self-Interference

All subtasks in the same DAG task have the same priority and cannot preempt each other. For V_{i,a} not on λ_{i,k} and V_{i,b} on λ_{i,k}, subtask V_{i,a} is a self-interference subtask of V_{i,b} if the following two conditions are satisfied:
  • V_{i,a} and V_{i,b} are executed on the same processor.
  • V_{i,a} and V_{i,b} are in an NDT relationship.
self(V_{i,t}) denotes the set of subtasks that cause self-interference on V_{i,t}. Therefore, the subtasks that cause self-interference on λ_{i,k} are given by Equation (6) [23]:
$self(\lambda_{i,k}) = \bigcup_{V_{i,t} \in \lambda_{i,k}} self(V_{i,t})$.  (6)
Therefore, the upper bound of the self-interference can be calculated by Equation (7) [23]:
$I_i^{ub}(\lambda_{i,k}) = \sum_{V_{i,t} \in self(\lambda_{i,k})} C_{i,t}$.  (7)
Based on Equations (2), (4), and (7), the WCRT upper bound of λ_{i,k} can be calculated by Equation (8) [23]:
$R^{ub}(\lambda_{i,k}) = len(\lambda_{i,k}) + \sum_{V_{i,t} \in self(\lambda_{i,k})} C_{i,t} + \sum_{\tau_j \in hp(\tau_i)} \left\lceil \frac{R^{ub}(\lambda_{i,k}) + J_j}{T_j} \right\rceil Q_j(\lambda_{i,k})$.  (8)
In summary, we calculated the WCET of the path λ_{i,k}, its self-interference upper bound, and its high-priority interference upper bound, and combined the three to obtain a formula for the WCRT upper bound of λ_{i,k}. Since R^{ub}(λ_{i,k}) appears on both sides of Equation (8), the WCRT bound of the path is obtained by iterative (fixed-point) solution. Finally, we compute the WCRT bound of every path and take the maximum to obtain the WCRT of the DAG task τ_i.
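Because R^{ub}(λ_{i,k}) appears on both sides of Equation (8), it is solved by the usual response-time iteration. Below is a minimal C sketch of that fixed-point loop, assuming the per-path constants (len, the self-interference bound, and each higher-priority task's Q_j, J_j, T_j) have been precomputed; the function name rta_path is our own.

/* Iteratively solve Equation (8) for one path lambda_{i,k}.
 * len_path : len(lambda_{i,k})
 * self_ub  : sum of C_{i,t} over self(lambda_{i,k})         (Equation (7))
 * Q, J, T  : per higher-priority task tau_j: Q_j, J_j, T_j  (Equations (3)-(5))
 * n_hp     : number of higher-priority tasks
 * deadline : D_i; the iteration aborts once the bound exceeds it.
 * Returns the fixed point R_ub, or -1 if the path cannot meet the deadline. */
long rta_path(long len_path, long self_ub,
              const long *Q, const long *J, const long *T,
              int n_hp, long deadline)
{
    long r = len_path + self_ub;          /* start: no high-priority interference */
    for (;;) {
        long next = len_path + self_ub;
        for (int j = 0; j < n_hp; j++)
            /* ceil((r + J_j) / T_j) * Q_j in integer arithmetic */
            next += ((r + J[j] + T[j] - 1) / T[j]) * Q[j];
        if (next == r)
            return r;                     /* fixed point reached: R_ub found */
        if (next > deadline)
            return -1;                    /* bound exceeds D_i: unschedulable */
        r = next;
    }
}

The WCRT bound of τ_i is then the maximum of rta_path over all of its paths, per Equation (1).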

4. Tetris-Based Allocation Model

Task partitioning and scheduling typically aim to minimize the WCRT of tasks to optimize system performance. When the number of processors is fixed at m, minimizing the WCRT can be approximated by maximizing the parallelism among subtasks, subject to system constraints. In other words, at any point in time, the more subtasks that execute concurrently, the better. If m subtasks are executing simultaneously at a given moment, the parallelism at that moment has reached its upper bound: even if additional subtasks were available for parallel execution, there would be no processors left to accommodate them. Therefore, the ideal scenario is to achieve this maximum level of parallelism as often as possible, ensuring that, at each moment, as many as m processors are actively executing tasks.
We construct a coordinate system with processor indexes on the horizontal axis and time t on the vertical axis. If a subtask V_{i,j} is assigned to processor P_{i,j} and executes from time t_1 to time t_2, the grid cells in column P_{i,j} between t_1 and t_2 are marked as filled. In this representation, the objective of keeping as many processors as possible busy at any given moment translates into filling as many cells as possible in each row (time tick) across the columns (processors).
Furthermore, each subtask can be considered as a block with a width of 1 and a height of C i , j . The objective is to place each block into the appropriate column (processor) and row (start time) in such a way that as many rows as possible are filled.
The above process closely resembles the well-known game Tetris. In exploring Tetris gameplay algorithms, Pierre Dellacherie, Islam El-Ashi [20], and others introduced several key concepts related to the Tetris board.
  • Landing height: The height where the piece is placed, which equals the height of the column plus half the height of the piece.
  • Rows eliminated: The number of rows eliminated after the last piece is placed.
  • Row transitions: The total number of row transitions. A row transition occurs when an empty cell is adjacent to a filled cell on the same row and vice versa.
  • Column transitions: The total number of column transitions. A column transition occurs when an empty cell is adjacent to a filled cell on the same column and vice versa.
  • Number of holes: A hole is an empty cell that has at least one filled cell above it in the same column.
  • Well sums: A well is a succession of empty cells such that their left cells and right cells are both filled.
Theorem 1.
In the context of aperiodic tasks, higher processor utilization, more parallel execution time, and a larger number of removed rows in Tetris are all equivalent.
Proof. 
The total WCET of a DAG task set is fixed, i.e., $C_{sum} = \sum_{i=1}^{N} C_{sum}^{i} = \sum_{i=1}^{N} \sum_{j=1}^{|V_i|} C_{i,j}$ is fixed. From a timescale perspective, supposing the task set starts at t = 0 and ends at t = WCRT, the parallel execution time at the k-th tick is $par_k = m - (\text{number of idle processors})$. In other words, $C_{sum} = \sum_{k=0}^{WCRT} par_k$. In this equation, C_{sum} is fixed, so larger par_k values mean a smaller WCRT. A removed row in Tetris corresponds to a tick with par_k = m, i.e., a tick with no idle processor. Therefore, higher processor utilization, more parallel execution time, and a larger number of removed rows in Tetris are all equivalent.  □
It is evident that preemption caused by higher-priority tasks in periodic tasks does not alter the conclusions presented in Theorem 1.
Theorem 2.
The concepts of landing height, rows eliminated, row transitions, column transitions, holes, and wells are present in task scheduling problems, serving functions analogous to those in the Tetris game.
Proof. 
Landing height: In the Tetris game, when other factors are equal, players tend to allow a piece to land at a lower position. Similarly, in task scheduling, each subtask has a start time. When other factors are constant, users prefer to start the subtask as early as possible.
Rows eliminated: A key objective in Tetris is to eliminate as many rows as possible. In a multiprocessor system, when each processor is executing a task, the system utilization is maximized at that moment.
Row transitions and column transitions: In Tetris, players aim to minimize row and column transitions after the pieces have landed. In task scheduling, higher row and column transitions typically indicate greater imbalance in processor utilization and lower overall processor efficiency.
Holes: In Tetris, holes generally represent a series of cells that are difficult to utilize because rows above them must be cleared before they can be used. Therefore, players aim to avoid creating holes. In task scheduling, an increased number of holes often indicates lower processor utilization.
Wells: In Tetris, a well refers to a depression in the board that can potentially lead to the creation of holes in subsequent game steps. Players tend to minimize the formation of wells. Similarly, in task scheduling, it is desirable to minimize wells to prevent inefficiency in processor usage.  □
The task scheduling process differs from the actual Tetris game in several key aspects. Firstly, in the Tetris game, there are pieces of various shapes, while in task scheduling, there is only one type of piece consisting of a single column and multiple rows. The number of rows in this piece corresponds to the WCET of the respective subtask. Secondly, in the Tetris game, a piece can descend until it reaches an adjacent filled cell. In task scheduling, in addition to this constraint, the piece cannot extend beyond the completion time of its predecessors.
Despite these differences, it is evident that they do not cause a qualitative change in the concepts outlined in Theorem 2. The impact of each concept may vary in specific applications, but the overall effect remains similar to that in the Tetris game. Consequently, the weight coefficients of these concepts need to be optimized for each particular scenario.

5. Scoring Method and Allocation Strategy

This section describes the execution steps of the algorithm and the details of each stage. Although the actual flow of the algorithm is shown in Figure 2, the presentation proceeds bottom-up, beginning with the subalgorithms (Algorithms 1–3) and ending with a summary of the overall workflow (Algorithm 4).
Algorithm 1 Board Evaluation
Input: board, numRowsRemoved, descendRow, pieceLen;
Output: score;
1: score_1 = (descendRow + pieceLen/2) × a_1
2: score_2 = numRowsRemoved × a_2
3: score_3 = rowTransitions(board) × a_3
4: score_4 = columnTransitions(board) × a_4
5: score_5 = numberOfHoles(board) × a_5
6: score_6 = wellSum(board) × a_6
7: score = score_1 + score_2 + score_3 + score_4 + score_5 + score_6
8: return score
Algorithm 2 Try Subtask Descending
Input: board, C_{i,j}, tStart_min, column, m;
Output: board, numRowsRemoved;
1: while tStart_min + C_{i,j} > (number of rows of board) do
2:     board = {board; {0, 0, …, 0}} // append an all-zero row to board
3: end while
4: for all k = 1, 2, …, C_{i,j} do
5:     board[tStart_min + k][column] = 1
6: end for
7: numRowsRemoved = (number of full rows in board)
8: remove full rows from board
9: return [board, numRowsRemoved]
Algorithm 3 Get Task Descend Result
Input: board, E_i, C_i, m;
Output: P_i, board;
1: tFinish = {0, 0, …, 0} // |V_i| elements in total
2: P_i = {0, 0, …, 0} // |V_i| elements in total
3: for all j s.t. C_{i,j} ∈ C_i do
4:     score_max = −∞
5:     board_new = board
6:     for all column = 1, 2, …, m do
7:         board_temp = board
8:         tStart_min = (the row of the last 1 in the given column of board)
9:         for all k s.t. e(V_{i,k}, V_{i,j}) ∈ E_i do
10:            tStart_min = max(tStart_min, tFinish_k)
11:        end for
12:        [board_temp, numRowsRemoved] = TrySubtaskDescending(board_temp, C_{i,j}, tStart_min, column, m)
13:        score = BoardEvaluation(board_temp, numRowsRemoved, tStart_min, C_{i,j})
14:        if score > score_max then
15:            score_max = score
16:            P_{i,j} = column
17:            board_new = board_temp
18:        end if
19:    end for
20:    board = board_new
21: end for
22: return [P_i, board]
Algorithm 4 TGSSA
Input: prioSeq, E = {E_1, E_2, …, E_N}, C = {C_1, C_2, …, C_N}, m;
Output: P;
1: P = {P_1, P_2, …, P_N}
2: board = {0, 0, …, 0} // initialized as a 1 × m zero matrix
3: for all i ∈ prioSeq do
4:     [P_i, board] = GetTaskDescendResult(board, E_i, C_i, m)
5: end for
6: return P

5.1. Tetris Board Evaluation

Algorithm 1 is designed to evaluate a Tetris board, with higher scores indicating better alignment with optimization objectives.
Algorithm 1 accepts four inputs. The input board corresponds to the Tetris board state after the latest piece placement and row removal. The input numRowsRemoved specifies the number of rows cleared by the last piece placement. The input descendRow specifies the row index where the most recent piece landed, i.e., the height of the lowest row occupied by the piece after it has settled. The input pieceLen indicates the height of the most recently placed piece. The output, score, is the evaluation score of the board, with higher values indicating better alignment with the desired criteria. Note that score is not necessarily positive.
Algorithm 1 evaluates a Tetris board based on six metrics. The score of each metric is weighted by a corresponding coefficient, and the weighted scores are summed to obtain the final evaluation score.
  • score_1 (line 1): score_1 represents the height of the center of gravity of the most recently placed piece, multiplied by the coefficient a_1. In the context of task scheduling, where pieces are C_{i,j} × 1 strips, the center of gravity corresponds to the row index of the piece's midpoint. Intuitively, we aim to place pieces as low as possible in Tetris. Therefore, a_1 is negative.
  • score_2 (line 2): score_2 denotes the number of rows cleared as a result of placing the most recent piece, weighted by the coefficient a_2. Naturally, clearing more rows is desirable. Thus, a_2 is positive.
  • score_3 (line 3): score_3 represents the number of row transitions in the remaining Tetris board after the recently placed piece has landed and all completed rows have been cleared, weighted by a_3. A row transition occurs when a filled cell is adjacent to an empty cell, or vice versa, within a single row. The total number of row transitions is computed by summing this value over all rows. It is important to note that the calculation of row transitions in the task scheduling context differs from that in the standard Tetris game. In Tetris, the board is bounded by “walls” on both sides, which are typically modeled as columns of filled cells in the original El-Tetris algorithm. In the task scheduling scenario, however, each column represents a distinct processor (or core), and all processors access the same shared memory. Therefore, the leftmost and rightmost columns are treated as adjacent, forming a “cylindrical” structure. Since fewer row transitions are preferable, a_3 is negative.
  • score_4 (line 4): score_4 measures the number of column transitions in the remaining Tetris board after the most recent piece placement and the clearing of completed rows, multiplied by a_4. A column transition occurs when a filled cell is adjacent to an empty cell, or vice versa, within a single column. Summing these transitions across all columns yields the total column transitions for the board. As with row transitions, fewer column transitions are desirable, making a_4 negative.
  • score_5 (line 5): score_5 represents the number of “holes” in the remaining Tetris board after the recent placement and removal of completed rows, weighted by a_5. A “hole” in a column is an empty cell that has at least one filled cell above it in the same column. The total number of holes is obtained by summing the holes in each column. Fewer holes are preferred, so a_5 is negative.
  • score_6 (line 6): score_6 is the sum of “wells” in the remaining Tetris board after the most recent piece placement and row clearing, weighted by a_6. A “well” is an empty cell that lies above a column's filled cells and is flanked by filled cells on both sides. The well sum is the number of such empty cells on the board. As with holes, fewer wells are desirable; therefore, a_6 is negative.
Among the six coefficients (a_1 to a_6), five are negative. As a result, the final score, the sum of score_1 to score_6, is likely to be negative. This does not pose an issue: in subsequent computations, the selection of the maximum score among candidate placements is unaffected.
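For concreteness, the following C sketch implements the six metrics and the weighted sum of Algorithm 1 on a simple array-based board. The Board layout, the fixed capacity, and the wraparound treatment of wells are our own assumptions; only the cylindrical handling of row transitions is prescribed by the text above.

#define MAX_ROWS 1024
#define MAX_COLS 16

/* Board layout assumed by these sketches: cells[r][c] = 1 means processor
 * c is busy during tick r; row 0 is the bottom of the board. */
typedef struct {
    int cells[MAX_ROWS][MAX_COLS];
    int rows;   /* current board height */
    int m;      /* number of columns = number of processors */
} Board;

/* Row transitions, with the cylindrical wraparound described above:
 * column m-1 is adjacent to column 0. */
int rowTransitions(const Board *b)
{
    int t = 0;
    for (int r = 0; r < b->rows; r++)
        for (int c = 0; c < b->m; c++)
            if (b->cells[r][c] != b->cells[r][(c + 1) % b->m])
                t++;
    return t;
}

/* Column transitions; the floor below row 0 counts as filled. */
int columnTransitions(const Board *b)
{
    int t = 0;
    for (int c = 0; c < b->m; c++) {
        int prev = 1;
        for (int r = 0; r < b->rows; r++) {
            if (b->cells[r][c] != prev) t++;
            prev = b->cells[r][c];
        }
    }
    return t;
}

/* Holes: empty cells with at least one filled cell above them. */
int numberOfHoles(const Board *b)
{
    int holes = 0;
    for (int c = 0; c < b->m; c++) {
        int covered = 0;
        for (int r = b->rows - 1; r >= 0; r--) {  /* scan top-down */
            if (b->cells[r][c]) covered = 1;
            else if (covered) holes++;
        }
    }
    return holes;
}

/* Wells: empty cells flanked left and right by filled cells; we apply the
 * same wraparound as for row transitions (an assumption on our part). */
int wellSum(const Board *b)
{
    int wells = 0;
    for (int r = 0; r < b->rows; r++)
        for (int c = 0; c < b->m; c++)
            if (!b->cells[r][c]
                && b->cells[r][(c + b->m - 1) % b->m]
                && b->cells[r][(c + 1) % b->m])
                wells++;
    return wells;
}

/* Algorithm 1: the weighted sum of the six metrics (a[0]..a[5] = a_1..a_6). */
double boardEvaluation(const Board *b, int numRowsRemoved,
                       int descendRow, int pieceLen, const double a[6])
{
    return (descendRow + pieceLen / 2.0) * a[0]   /* landing height */
         + numRowsRemoved                * a[1]
         + rowTransitions(b)             * a[2]
         + columnTransitions(b)          * a[3]
         + numberOfHoles(b)              * a[4]
         + wellSum(b)                    * a[5];
}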

5.2. Try Subtask Descending

Algorithm 2 treats a given subtask as a Tetris piece, places it in a specified column of the Tetris board, and then removes all completed rows.
Algorithm 2 accepts five inputs. The input board represents the Tetris board before placing the subtask V_{i,j}. C_{i,j} denotes the WCET of the subtask V_{i,j}. tStart_min indicates the earliest possible start time for the execution of V_{i,j}. column specifies the target column where V_{i,j} is to be placed, corresponding to the processor to which V_{i,j} is assigned. m is the total number of processors, equal to the number of columns of the Tetris board. The output numRowsRemoved is the number of rows cleared as a result of placing V_{i,j}, and the output board reflects the state of the Tetris board after placing V_{i,j} and removing all completed rows.
Placing the subtask V_{i,j} involves marking the C_{i,j} empty cells in the range from tStart_min + 1 to tStart_min + C_{i,j} as filled. To achieve this, the following steps are performed:
  • Board expansion (lines 1 to 3): If the number of rows of board is less than tStart_min + C_{i,j}, extend the board by adding rows until its number of rows equals tStart_min + C_{i,j}.
  • Filling cells (lines 4 to 6): In the extended board, mark the C_{i,j} empty cells in the specified column, spanning from row tStart_min + 1 to tStart_min + C_{i,j}, as filled.
  • Row count recording (line 7): The number of completed rows of board at this stage is recorded as numRowsRemoved.
  • Row removal (line 8): Completed rows are removed from board.
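A matching C sketch of Algorithm 2 follows, using the same hypothetical Board layout as the previous sketch (rows are 0-indexed here, whereas the pseudocode fills rows tStart_min + 1 to tStart_min + C_{i,j}); it assumes the placed piece fits within the fixed MAX_ROWS capacity.

#include <string.h>

/* Same Board type as in the Section 5.1 sketch. */
#define MAX_ROWS 1024
#define MAX_COLS 16
typedef struct { int cells[MAX_ROWS][MAX_COLS]; int rows; int m; } Board;

/* Algorithm 2: drop a wcet-cell piece into `column`, with its first cell
 * at row tStartMin, then count and remove full rows.
 * Assumes tStartMin + wcet <= MAX_ROWS. Returns numRowsRemoved. */
int trySubtaskDescending(Board *b, int wcet, int tStartMin, int column)
{
    /* Lines 1-3: grow the board until it can hold the piece. */
    while (b->rows < tStartMin + wcet) {
        memset(b->cells[b->rows], 0, sizeof b->cells[0]);
        b->rows++;
    }

    /* Lines 4-6: mark the piece's cells as filled. */
    for (int k = 0; k < wcet; k++)
        b->cells[tStartMin + k][column] = 1;

    /* Lines 7-8: count full rows, then compact them away bottom-up. */
    int removed = 0, dst = 0;
    for (int src = 0; src < b->rows; src++) {
        int full = 1;
        for (int c = 0; c < b->m; c++)
            if (!b->cells[src][c]) { full = 0; break; }
        if (full) { removed++; continue; }     /* drop this completed row */
        if (dst != src)
            memcpy(b->cells[dst], b->cells[src], sizeof b->cells[0]);
        dst++;
    }
    b->rows = dst;
    return removed;
}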

5.3. Get Task Descend Result

For a given input DAG task, Algorithm 3 treats each of its subtasks as a Tetris piece and attempts to place it in each column of the Tetris board. For each subtask, Algorithm 3 identifies the column that yields the highest Tetris board score after placement, records the corresponding column, i.e., the processor to which the subtask is assigned, and updates the Tetris board accordingly.
Algorithm 3 accepts four inputs. The input board represents the state of the Tetris board prior to the partitioning of the DAG task τ_i. E_i denotes the set of edges of τ_i. C_i holds the WCET of each subtask of τ_i. m specifies the total number of processors, equal to the number of columns of board. The output P_i records the processor assignment of each subtask of τ_i.
The completion times tFinish of the |V_i| subtasks and the processor assignments P_i of these |V_i| subtasks are initialized first. Subsequently, for each subtask V_{i,j} of task τ_i, the following steps are executed sequentially.
  • Initialization (lines 4 to 5): The maximum score score_max for the subtask is set to −∞, and the current state of board is stored in board_new.
  • Processor assignment (lines 6 to 13): Using Algorithm 2, TrySubtaskDescending, the algorithm tentatively assigns subtask V_{i,j} to each processor. For each attempt, a temporary board board_temp is generated, whose state is evaluated by Algorithm 1, BoardEvaluation, to obtain a score for the assignment.
  • Score update (lines 14 to 18): If the score of the current processor assignment exceeds the current maximum score score_max, score_max is updated to the new score, the corresponding processor assignment is recorded in P_{i,j}, and board_new is updated to the state of board_temp.
  • Board update (line 20): After V_{i,j} has been tried on all processors and the processor P_{i,j} with the highest score has been determined, board is updated to board_new, reflecting the state of the board after assigning V_{i,j} to its highest-scoring placement.
Through this procedure, each subtask V_{i,j} is assigned to the processor that maximizes the overall board evaluation score, yielding an optimized task allocation.
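The column-selection loop of Algorithm 3 (lines 6 to 19) can be sketched in C as follows, again over the hypothetical Board type, with the two earlier sketches declared as prototypes; for brevity, the latest finish time among the subtask's predecessors (lines 9 to 11) is assumed to be precomputed and passed in as predFinish.

#include <float.h>

/* Same Board type as in the earlier sketches. */
#define MAX_ROWS 1024
#define MAX_COLS 16
typedef struct { int cells[MAX_ROWS][MAX_COLS]; int rows; int m; } Board;

int    trySubtaskDescending(Board *b, int wcet, int tStartMin, int column);
double boardEvaluation(const Board *b, int numRowsRemoved,
                       int descendRow, int pieceLen, const double a[6]);

/* Line 8 of Algorithm 3: the first free row above the last filled cell. */
static int columnTop(const Board *b, int column)
{
    for (int r = b->rows - 1; r >= 0; r--)
        if (b->cells[r][column])
            return r + 1;
    return 0;
}

/* Lines 6-19 of Algorithm 3 for one subtask: try every column, score the
 * resulting board, and keep the best placement. Returns the chosen column
 * and leaves the winning board state in *boardNew. */
int bestColumnForSubtask(const Board *board, int wcet, int predFinish,
                         const double a[6], Board *boardNew)
{
    double scoreMax = -DBL_MAX;   /* score_max = -infinity (line 4) */
    int best = 0;
    for (int column = 0; column < board->m; column++) {
        Board temp = *board;                     /* board_temp = board */
        int tStartMin = columnTop(&temp, column);
        if (predFinish > tStartMin)              /* lines 9-11 */
            tStartMin = predFinish;
        int removed = trySubtaskDescending(&temp, wcet, tStartMin, column);
        double score = boardEvaluation(&temp, removed, tStartMin, wcet, a);
        if (score > scoreMax) {                  /* lines 14-18 */
            scoreMax = score;
            best = column;
            *boardNew = temp;
        }
    }
    return best;
}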

5.4. General Progress

After the layered construction of the aforementioned functions, the top-level algorithm is relatively straightforward, as shown in Algorithm 4.
The primary function of Algorithm 4 is to invoke Algorithm 3 on the N DAG tasks sequentially, in priority order.
Algorithm 4 accepts three inputs. prioSeq is the sequence of the N DAG tasks sorted in descending order of priority, where prioSeq_i denotes the index of the DAG task with the i-th highest priority. E is the set of edge sets of all DAG tasks in the group, and C is the WCET matrix of all DAG tasks. The output of Algorithm 4 is a single variable, P, which contains the processor assignments for each subtask of every DAG task. Specifically, P consists of N elements, where each element P_i is an array of |V_i| elements; the j-th element of P_i indicates the processor to which the j-th subtask of the i-th DAG task is assigned.
At the beginning of Algorithm 4, two variables, P and board, are initialized. P is the final output of the function, as defined earlier; it initially consists of N elements, each an array of |V_i| zeros. The specific initialization value is not critical; the purpose of this step is simply to preallocate memory for P. board represents the Tetris board used to simulate the Tetris game. The board has m columns (corresponding to the number of processors) and initially a single row, with every element set to zero, indicating that no blocks have yet been placed.
The algorithm iterates N times: in each iteration, a DAG task index i is extracted from prioSeq in order and passed to Algorithm 3, GetTaskDescendResult. Each invocation of Algorithm 3 serves two purposes: partitioning the |V_i| subtasks of τ_i, with the resulting assignments stored in P_i, and updating the current state of the Tetris board. After completing the N iterations, the matrix P is returned as the output.
In Algorithm 1, the computational complexity of calculating score_1 is O(1). Calculating each of score_2 to score_6 traverses the Tetris board at most once. The Tetris board has m columns and, in the worst case, |V_i| × max_j(C_{i,j}) rows, where max_j(C_{i,j}) is a constant. The computational complexity of Algorithm 1 is therefore O(m|V_i|). Lines 1 to 6 of Algorithm 2 traverse the Tetris board less than once, and lines 7 to 8 traverse it at most once, so the computational complexity of Algorithm 2 is also O(m|V_i|). The “for” loops at lines 3 and 6 of Algorithm 3 execute Algorithms 1 and 2 m|V_i| times, making the computational complexity of Algorithm 3 O(m²|V_i|²). Algorithm 4 executes Algorithm 3 N times; therefore, its computational complexity, i.e., the computational complexity of TGSSA, is O(Nm²|V_i|²).

5.5. Discussion

As mentioned above, the model requires different weight coefficients in different scenarios, but we have not yet found an efficient, low-complexity way to characterize the relationship between the coefficients and a given scenario. For now, these weight coefficients must therefore be computed offline before the model is used in a specific scenario. Future work could explore such a characterization and incorporate it into the current model.

6. Experiments

In this section, the performance of TGSSA is evaluated by comparing it with two other scheduling strategies: random allocation and the Equilibrium Remaining Utilization (ERU) algorithm, which is used in the latest state-of-the-art scheduling strategy [12].
The generation of the tested DAG sets and the actual platform settings are introduced before the performance evaluation. DAG set generation covers the chosen parameters and the generation process of the DAG sets. The performance evaluation itself is divided into two parts: a schedulability test through WCRT analysis, and the execution of the allocated tasks on an actual real-time platform.

6.1. Tested DAG Sets Generation

To make the experimental results more general, we strive to maintain consistency with previous studies regarding DAG parameters and DAG generation tools. We control the generation of DAG task sets using the following five DAG parameters [13]:
  • U: The utilization of the DAG task set Γ .
  • N: The number of DAG tasks in Γ .
  • |V_i|: The number of subtasks in τ_i.
  • p: The probability of creating edges.
  • m: The number of processors.
The generation of each DAG task τ_i involves two primary steps: topological structure generation and WCET assignment. In the topological structure generation step, the topological structure of the DAG task is created using the GenerateG(|V_i|, p) method described in [24]. In the WCET assignment step, a set of |V_i| random integers is drawn from the interval [1, 100] as the WCET values C_{i,j} of the subtasks. The overall WCET of the DAG task is $C_{sum}^{i} = \sum_{j=1}^{|V_i|} C_{i,j}$. This process is repeated N times to generate the topological structures and WCET values of N distinct DAG tasks.
To determine the utilization U_i of each task τ_i in a task set Γ with total utilization U, we employ the Randfixedsum algorithm. This algorithm generates a matrix whose elements sum to a fixed value, while allowing the user to specify the maximum and minimum of each element; using it, we ensure that the utilization U_i of any individual task does not exceed 1. Next, the period T_i of each DAG task is calculated as T_i = C_{sum}^i / U_i. Fixed priorities π_i are assigned to the DAG tasks in Γ according to their relative periods, following the RMS policy. Finally, the generated DAG task set Γ is fed to the respective algorithms to assign subtasks to processors, after which the task set is ready for WCRT analysis or testing on the real platform.
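As an illustration of the last two steps, the following C sketch derives the periods T_i = C_sum^i / U_i and assigns RMS priorities by sorting on period; the utilizations (e.g., produced by Randfixedsum) and total WCETs are assumed given, and all names are our own. The resulting order of task ids also serves as prioSeq for Algorithm 4.

#include <stdlib.h>

/* Per-task record: id, derived period, and assigned RMS priority. */
typedef struct { int id; double period; int priority; } TaskInfo;

static int byPeriod(const void *x, const void *y)
{
    const TaskInfo *a = x, *b = y;
    return (a->period > b->period) - (a->period < b->period);
}

/* Derive T_i = Csum_i / U_i and assign RMS priorities:
 * shorter period -> higher priority (0 = highest). */
void assignRmsPriorities(TaskInfo *tasks, int n,
                         const double *util, const double *csum)
{
    for (int i = 0; i < n; i++) {
        tasks[i].id = i;
        tasks[i].period = csum[i] / util[i];   /* T_i = Csum_i / U_i */
    }
    qsort(tasks, n, sizeof *tasks, byPeriod); /* ascending period */
    for (int i = 0; i < n; i++)
        tasks[i].priority = i;                 /* shortest period first */
}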
To demonstrate the consistency between simulation results and actual platform performance, both the WCRT analysis and actual platform tests presented in this section are conducted using a similar dataset [25,26]. The generated test dataset adopts the following default parameters: U = 2.0 , N = 20 , | V i | = 20 , p = 10 % , and m = 16 . Under these default settings, we vary one parameter at a time while keeping the others fixed to generate additional datasets.
  • U: 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2.
  • N: 20, 25, 30, 35, 40, 45, 50.
  • | V i | : 10, 14, 18, 22, 26, 30.
  • p: 5%, 10%, 15%, 20%, 25%, 30%.
  • m: 2, 4, 8, 16.
Hence, the total utilization of a task set Γ is varied from 1.0 to 2.2 in steps of 0.2, the number of DAG tasks in a task set from 20 to 50 in steps of 5, and the number of subtasks per DAG task from 10 to 30 in steps of 4. The number of processors is set to 2, 4, 8, and 16 [26]. The topological matrix of each DAG task is generated by a DAG generation tool with GenerateG(|V_i|, p), where p is varied from 5% to 30% in steps of 5% [24].
Each time, we vary one DAG parameter while fixing the remaining four, testing all five parameters in turn to verify the algorithm's performance in various scenarios. We separately generate one thousand DAG task sets for each varied parameter. We define the percentage of schedulable task sets under a given m as the acceptance ratio; for instance, an acceptance ratio of 13 percent under m = 16 means that 130 of the 1000 task sets are schedulable. For the average WCRT and the acceptance ratio, we only count data from schedulable task sets.
It is important to note that when using TGSSA for task partitioning, the coefficients of Algorithm 1 are adjusted to account for their varying sensitivity to different parameter changes. Specifically, when U, N, or m is the variable, the coefficients are set to a_1 = −4.5, a_2 = 3.4, a_3 = −3.2, a_4 = −9.3, a_5 = −9, a_6 = −5.5. When |V_i| or p is the variable, the coefficients are adjusted to a_1 = −4.5, a_2 = 3.4, a_3 = −4, a_4 = −8, a_5 = −11, a_6 = −6.

6.2. Actual Platform Settings

The actual testing platform is a laptop equipped with an AMD Ryzen 7 4800U processor (Advanced Micro Devices, California, United States) featuring 8 cores and 16 threads, along with 16 GB of DDR4-3200 memory. The system runs Ubuntu 22.04 with an RT-patched Linux kernel [27]; the specific patch version applied is patch-6.5.2-rt8.
It is worth mentioning that before compiling the patched kernel, the following configurations should be made via the make menuconfig command:
  • In General setup -> Preemption Model, select Fully Preemptible Kernel (RT).
  • Uncheck Device Drivers -> Staging drivers.
  • In General setup -> Timer subsystem, enable High Resolution Timer Support.
  • In processor type and features -> Timer frequency, set the frequency to 1000 Hz.
After saving and exiting the configuration interface, modify the generated .config file: in the line CONFIG_SYSTEM_TRUSTED_KEYS="...", remove the content inside the double quotes, leaving the quotes themselves intact (the line becomes CONFIG_SYSTEM_TRUSTED_KEYS="").
After compiling the RT-Patched kernel and setting it as the default kernel for the system, restart the computer. To verify the kernel version, use the uname -a command. The output should display Linux … 6.5.2-rt8 #2 SMP PREEMPT_RT … x86_64 GNU/Linux.
To run tasks and subtasks on the actual platform, we used the pthread.h library in C. Each subtask is created with pthread_create(), the precedence relationships between subtasks are enforced with pthread_cond_wait(), and the execution time of each subtask is controlled using sys/time.h. The threads generated in this way behave like real application threads.
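A minimal sketch of this harness is shown below, assuming at most 64 subtasks and omitting error handling; the names and the spin-wait workload are illustrative. Each thread blocks on a condition variable until all of its predecessors have marked themselves finished, then busy-runs for its execution budget.

#include <pthread.h>
#include <sys/time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  done_cv = PTHREAD_COND_INITIALIZER;
static int done[64];          /* done[j] = 1 once subtask j has finished */

typedef struct {
    int id;                   /* index of this subtask */
    int npred;                /* number of direct predecessors */
    const int *preds;         /* indices of the predecessor subtasks */
    long exec_us;             /* execution time budget in microseconds */
} SubtaskArg;

static long now_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000000L + tv.tv_usec;
}

static void *subtask(void *p)   /* entry passed to pthread_create() */
{
    SubtaskArg *a = p;

    /* Precedence: block until every predecessor has set done[]. */
    pthread_mutex_lock(&lock);
    for (int k = 0; k < a->npred; ) {
        if (done[a->preds[k]]) { k++; continue; }
        pthread_cond_wait(&done_cv, &lock);
        k = 0;                 /* re-check all predecessors after wakeup */
    }
    pthread_mutex_unlock(&lock);

    /* Simulated workload: spin for exec_us microseconds. */
    long start = now_us();
    while (now_us() - start < a->exec_us)
        ;

    /* Mark this subtask finished and wake any waiting successors. */
    pthread_mutex_lock(&lock);
    done[a->id] = 1;
    pthread_cond_broadcast(&done_cv);
    pthread_mutex_unlock(&lock);
    return NULL;
}

On the patched kernel, each thread would additionally be pinned to the processor chosen by the allocator and given its fixed priority, e.g., via pthread_setaffinity_np() and pthread_setschedparam() with SCHED_FIFO; those steps are omitted here.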

6.3. Experiment Results

As mentioned at the beginning of this section, the performance evaluation in this article consists of a schedulability test through WCRT analysis and executing allocated tasks on an actual real-time platform.
Due to the limitations of existing WCRT analysis algorithms, the schedulability rates derived from their analysis are typically much lower than those observed in real platform tests. In the real platform tests conducted in this article, under all selected test conditions, the schedulability rates for all algorithms were consistently above 90%, with no significant performance differences between the algorithms. As such, further details are omitted. The only exception occurs when the number of processors is set to 2, where the schedulability rate for the random allocation algorithm falls below 30%.
Figure 3 shows the average WCRT drawn from WCRT analysis. A lower average WCRT indicates better performance. In almost all cases, TGSSA achieves a lower average WCRT compared to both random allocation and ERU. Figure 4 shows the schedulability ratio drawn from WCRT analysis. It is clear that TGSSA achieves a higher schedulability ratio compared to both random allocation and ERU in almost all cases. Figure 5 shows the average WCRT on the actual real-time platform. The trends presented in Figure 3 and Figure 5 are generally consistent. Next, we will focus on analyzing the data in Figure 5.
Figure 5a illustrates the average WCRT of the DAG task set as the utilization varies. From the figure, it is evident that for the same utilization values, the average WCRT obtained using TGSSA is lower compared to both the random and ERU algorithms. Specifically, for utilization values U = 1.0 , 1.2 , 1.4 , 1.6 , 1.8 , 2.0 , 2.2 , the WCRT improves over random by 34.54%, 36.05%, 32.41%, 31.20%, 31.99%, 31.29%, and 25.24%, respectively, and improves over ERU by 11.48%, 15.33%, 5.76%, 5.54%, 6.32%, 3.78%, and 3.03%, respectively.
Figure 5b presents the average WCRT of the DAG task set as the number of tasks varies. From the figure, it is clear that for the same number of tasks, the average WCRT obtained using TGSSA is lower than that of both the random and ERU algorithms. Specifically, for N = 20 , 25 , 30 , 35 , 40 , 45 , 50 , the WCRT is improved over random by 29.31%, 40.04%, 33.60%, 37.07%, 41.43%, 41.50%, and 44.86%, respectively, and improved over ERU by 3.74%, 12.21%, 5.94%, 14.37%, 10.67%, 11.97%, and 16.68%, respectively.
Figure 5c shows the average WCRT of the DAG task set as the number of subtasks varies. From the figure, it can be observed that for the same number of subtasks, the average WCRT achieved by TGSSA is lower than that of the random algorithm, and in cases where | V i | is higher, it is also lower than that of ERU. Specifically, for | V i | = 10 , 14 , 18 , 22 , 26 , 30 , the WCRT improves over random by 25.59%, 26.70%, 32.18%, 29.61%, 24.77%, and 26.30%, respectively, and improves over ERU by −3.01%, −1.40%, 4.19%, 5.02%, 3.20%, and 5.58%, respectively.
It can be observed that when |V_i| is relatively low, the performance of TGSSA is suboptimal. This is because the scoring algorithm uses a fixed set of weights while the ratio of subtask count to processor count varies widely; fixed weights may not suit all scenarios. Taking the scenario with variable |V_i| as an example, when |V_i| is low, the impact of certain factors is weaker, suggesting that the weight coefficients should be adjusted to better accommodate such conditions.
Figure 5d illustrates the average WCRT of the DAG task set as the probability of creating edges varies. From the figure, it is evident that for the same probability of creating edges, TGSSA achieves a lower average WCRT than both the random and ERU algorithms in most cases. Specifically, for p = 5 , 10 , 15 , 20 , 25 , 30 , the WCRT improves over random by 42.27%, 28.43%, 18.42%, 14.75%, 16.95%, and 12.28%, respectively, and improves over ERU by 8.46%, 6.54%, 2.86%, −0.76%, 4.16%, and 3.26%, respectively.
Figure 5e presents the average WCRT of the DAG task set as the number of processors varies. Because the average WCRT varies greatly with the number of processors (the random algorithm yields 163,604.4, 38,565.0, 21,032.7, and 15,370.7), the plotting method was adjusted: unlike the previous figures, which used absolute values, this figure uses relative values, i.e., the ratio of each algorithm's WCRT to that of the random algorithm. From the figure, it is apparent that for the same number of processors, TGSSA achieves a lower average WCRT than random, and for higher values of m it also outperforms ERU. Specifically, for m = 2, 4, 8, 16, the WCRT improves over random by 33.82%, 35.41%, 31.06%, and 32.30%, respectively, and over ERU by −3.38%, 4.97%, 2.40%, and 7.25%, respectively. Overall, TGSSA achieves excellent scheduling performance.

7. Conclusions and Future Work

This article draws inspiration from the Tetris algorithm (El Tetris) and proposes a new scheduling algorithm with multiple heuristic parameters. We modeled the real-time DAG task scheduling process as a Tetris game and performed the scheduling on multiple processors. Compared with previous algorithms, TGSSA achieves better average WCRT analysis results, processor utilization, real-time system stability, and average WCRT observed on an actual RTOS.
In our future work, we plan to address the following issues. The first is how to optimize the coefficients of the multiple heuristic parameters through artificial intelligence methods to achieve better scheduling results [28]. The second is establishing an effective task clustering strategy to reduce algorithmic complexity, especially for large DAGs with intricate parallel dependencies. Finally, we aim to adapt our algorithm into an efficient online scheduling solution that combines compute-bound and I/O-bound parallel jobs in multiprocessor systems [29].

Author Contributions

Methodology, Y.C. and S.L.; validation, Y.C. and S.L.; investigation, X.L. and Z.H.; writing—original draft preparation, Y.C. and S.L.; writing—review and editing, X.L. and Z.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are unavailable due to privacy or ethical restrictions.

Acknowledgments

The authors would like to thank the Editors and Reviewers for their contributions to our manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RTOS	Real-time operating system
IoT	Internet of Things
WCRT	Worst-case response time
RMS	Rate monotonic scheduling
EDF	Earliest deadline first
DMS	Deadline monotonic scheduling
DAG	Directed acyclic graph
STPA	Schedulability testing priority assignment
WCET	Worst-case execution time
TGSSA	Tetris game scoring scheduling algorithm
RTA	Response time algorithm
ERU	Equilibrium remaining utilization

References

  1. Chishiro, H. RT-Seed: Real-Time Middleware for Semi-Fixed-Priority Scheduling. In Proceedings of the 2016 IEEE 19th International Symposium on Real-Time Distributed Computing (ISORC), York, UK, 17–20 May 2016; pp. 124–133.
  2. Ranvijay; Yadav, R.S.; Smriti, A. Efficient energy constrained scheduling approach for dynamic real time system. In Proceedings of the 2010 First International Conference on Parallel, Distributed and Grid Computing (PDGC 2010), Solan, India, 28–30 October 2010; pp. 284–289.
  3. Hu, B.; Cao, Z.; Zhou, M. Scheduling Real-Time Parallel Applications in Cloud to Minimize Energy Consumption. IEEE Trans. Cloud Comput. 2022, 10, 662–674.
  4. Hu, M.; Veeravalli, B. Dynamic Scheduling of Hybrid Real-Time Tasks on Clusters. IEEE Trans. Comput. 2014, 63, 2988–2997.
  5. Dong, W.; Chen, C.; Liu, X.; Zheng, K.G.; Chu, R.; Bu, J.J. FIT: A Flexible, Lightweight, and Real-Time Scheduling System for Wireless Sensor Platforms. IEEE Trans. Parallel Distrib. Syst. 2010, 21, 126–138.
  6. Zhu, Q.; Zeng, H.; Zheng, W.; Di Natale, M.; Sangiovanni-Vincentelli, A. Optimization of task allocation and priority assignment in hard real-time distributed systems. ACM Trans. Embed. Comput. Syst. (TECS) 2012, 11, 85–96.
  7. Biondi, A.; Buttazzo, G. Engine control: Task modeling and analysis. In Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 9–13 March 2015; pp. 525–530.
  8. Nong, G.; Hamdi, M. On the provision of quality-of-service guarantees for input queued switches. IEEE Commun. Mag. 2000, 38, 62–69.
  9. Biondi, A.; Di Natale, M.; Buttazzo, G. Response-Time Analysis of Engine Control Applications Under Fixed-Priority Scheduling. IEEE Trans. Comput. 2018, 67, 687–703.
  10. Biondi, A.; Melani, A.; Marinoni, M.; Di Natale, M.; Buttazzo, G. Exact Interference of Adaptive Variable-Rate Tasks under Fixed-Priority Scheduling. In Proceedings of the 2014 26th Euromicro Conference on Real-Time Systems, Madrid, Spain, 8–11 July 2014; pp. 165–174.
  11. Chen, Y.M.; Liu, S.L.; Chen, Y.J.; Ling, X. A scheduling algorithm for heterogeneous computing systems by edge cover queue. Knowl.-Based Syst. 2023, 265, 110369.
  12. Wu, Y.L.; Zhang, W.Z.; Guan, N.; Ma, Y.H. TDTA: Topology-Based Real-Time DAG Task Allocation on Identical Multiprocessor Platforms. IEEE Trans. Parallel Distrib. Syst. 2023, 34, 2895–2909.
  13. Wu, Y.L.; Zhang, W.Z.; Guan, N.; Tang, Y. Improving Interference Analysis for Real-Time DAG Tasks Under Partitioned Scheduling. IEEE Trans. Comput. 2022, 71, 1495–1506.
  14. Saifullah, A.; Agrawal, K.; Lu, C.Y.; Gill, C. Multicore Real-Time Scheduling for Generalized Parallel Task Models. In Proceedings of the 2011 IEEE 32nd Real-Time Systems Symposium, Vienna, Austria, 29 November–2 December 2011; pp. 217–226.
  15. Bertogna, M.; Baruah, S. Limited Preemption EDF Scheduling of Sporadic Task Systems. IEEE Trans. Ind. Inform. 2010, 6, 579–591.
  16. Kato, S.; Ishikawa, Y. Gang EDF Scheduling of Parallel Task Systems. In Proceedings of the 2009 30th IEEE Real-Time Systems Symposium, Washington, DC, USA, 1–4 December 2009; pp. 459–468.
  17. Lakshmanan, K.; Kato, S.; Rajkumar, R. Scheduling Parallel Real-Time Tasks on Multi-core Processors. In Proceedings of the 2010 31st IEEE Real-Time Systems Symposium, San Diego, CA, USA, 30 November–3 December 2010; pp. 259–268.
  18. Casini, D.; Biondi, A.; Nelissen, G.; Buttazzo, G. Partitioned Fixed-Priority Scheduling of Parallel Tasks Without Preemptions. In Proceedings of the 2018 IEEE Real-Time Systems Symposium (RTSS), Nashville, TN, USA, 11–14 December 2018; pp. 421–433.
  19. Özkaya, M.Y.; Benoit, A.; Uçar, B.; Herrmann, J.; Çatalyürek, Ü.V. A Scalable Clustering-Based Task Scheduler for Homogeneous Processors Using DAG Partitioning. In Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, 20–24 May 2019; pp. 155–165.
  20. El-Tetris: An Improvement on Pierre Dellacherie's Algorithm. Available online: https://imake.ninja/el-tetris-an-improvement-on-pierre-dellacheries-algorithm/ (accessed on 26 December 2024).
  21. Saifullah, A.; Ferry, D.; Li, J.; Agrawal, K.; Lu, C.Y.; Gill, C.D. Parallel Real-Time Scheduling of DAGs. IEEE Trans. Parallel Distrib. Syst. 2014, 25, 3242–3252.
  22. Bansal, S.; Zhao, Y.C.; Zeng, H.B.; Yang, K.H. Optimal Implementation of Simulink Models on Multicore Architectures with Partitioned Fixed Priority Scheduling. In Proceedings of the 2018 IEEE Real-Time Systems Symposium (RTSS), Nashville, TN, USA, 11–14 December 2018; pp. 242–253.
  23. Fonseca, J.; Nelissen, G.; Nélis, V.; Pinho, L.M. Response time analysis of sporadic DAG tasks under partitioned scheduling. In Proceedings of the 2016 11th IEEE Symposium on Industrial Embedded Systems (SIES), Krakow, Poland, 23–25 May 2016; pp. 1–10.
  24. Erdős, P.; Rényi, A. On Random Graphs I. Publ. Math. Debr. 1959, 6, 290–297.
  25. Davis, R.I.; Burns, A. Response Time Upper Bounds for Fixed Priority Real-Time Systems. In Proceedings of the 2008 Real-Time Systems Symposium, Barcelona, Spain, 30 November–3 December 2008; pp. 407–418.
  26. Guan, N.; Stigge, M.; Yi, W.; Yu, G. New Response Time Bounds for Fixed Priority Multiprocessor Scheduling. In Proceedings of the 2009 30th IEEE Real-Time Systems Symposium, Washington, DC, USA, 1–4 December 2009; pp. 387–397.
  27. RT-Linux System. Available online: https://en.wikipedia.org/wiki/RTLinux (accessed on 26 December 2024).
  28. Anderson, G.G. Application of Standard Optimization Methods to Operating System Scheduler Tuning. In Operating System Scheduling Optimization; University of Johannesburg: Johannesburg, South Africa, 2013; pp. 56–69.
  29. Wiseman, Y.; Feitelson, D.G. Paired gang scheduling. IEEE Trans. Parallel Distrib. Syst. 2003, 14, 581–592.
Figure 1. An example of a DAG task $\tau_i$ with eight subtasks and $C_{\mathrm{sum}}^{i} = 95$, $T_i = D_i = 100$. A dummy exit subtask will be added to the graph as a child subtask of $V_{i,3}$, $V_{i,7}$, and $V_{i,8}$.
Figure 2. The relationship between algorithms.
Figure 3. Average WCRT from the WCRT analysis algorithm (RTA). (a) Average WCRT with changing utilization. (b) Average WCRT with changing number of tasks. (c) Average WCRT with changing number of subtasks. (d) Average WCRT with varied probability of creating edges. (e) Average WCRT with changing number of processors.
Figure 4. Schedulability ratio from the WCRT analysis algorithm (RTA). (a) Schedulable ratio with changing utilization. (b) Schedulable ratio with changing number of tasks. (c) Schedulable ratio with changing number of subtasks. (d) Schedulable ratio with varied probability of creating edges. (e) Schedulable ratio with changing number of processors.
Figure 5. Average WCRT on an actual real-time platform. (a) Average WCRT with changing utilization. (b) Average WCRT with changing number of tasks. (c) Average WCRT with changing number of subtasks. (d) Average WCRT with varied probability of creating edges. (e) Average WCRT with changing number of processors.