ATMP-CA: Optimising Mixed-Criticality Systems Considering Criticality Arithmetic

Many safety-critical systems use criticality arithmetic, an informal practice of implementing a higher-criticality function by combining several lower-criticality redundant components or tasks. This lowers the cost of development, but existing mixed-criticality schedulers may act incorrectly as they lack the knowledge that the lower-criticality tasks are operating together to implement a single higher-criticality function. In this paper, we propose a solution to this problem by presenting a mixed-criticality mid-term scheduler that considers where criticality arithmetic is used in the system. As this scheduler, which we term ATMP-CA, is a mid-term scheduler, it changes the configuration of the system when needed based on the recent history of deadline misses. We present the results from a series of experiments that show that ATMP-CA’s operation provides a smoother degradation of service compared with reference schedulers that do not consider the use of criticality arithmetic.


Introduction
Mixed-criticality systems are a special kind of safety-critical system in which not all provided services have the same criticality. For example, in an aeroplane, the correct operation of the engines is of higher criticality than the onboard intercom system. Since the seminal work by Vestal in 2007 [2], scheduling of mixed-criticality systems has become an active research field [3].

The development of services with higher criticality requires more effort than that of services with lower criticality [4]. Criticality arithmetic, also referred to as SIL arithmetic, is a way of reducing that effort [5]. Criticality arithmetic is the process of realising a single function of importance to safety by combining multiple redundant independent components, each of which implements this function. Should any one of these components fail, the others, being independent, will continue to provide this function. One consequence of this is that the correct and continued functioning of any single one of these components need not be assured to the same rigour as would be necessary if it alone were relied upon to provide this function.

Criticality arithmetic has a number of benefits, as identified in Section 2.2.1. These largely relate to the reduced development and assurance cost, which results from each individual component being of lesser importance to safety than it might otherwise be. However, there are also a number of drawbacks, as discussed in Section 2.2.2. Using criticality arithmetic can make it more difficult to adequately determine the impact of individual component failures.
So far, the use of criticality arithmetic is an informal and qualitative process with no formal, universally accepted definition. Individual standards prescribe different methods for, and constraints on, the use of criticality arithmetic within their respective domains, as described in Section 2. As such, we do not attempt in this paper to define a quantitative method for estimating the increase in dependability afforded by the use of criticality arithmetic. Rather, we examine the consequences of mixed-criticality task scheduling in an environment where criticality arithmetic has been used.

Specifically, in this paper we show an example of modifying a scheduler to take advantage of the knowledge that criticality arithmetic is used in the system. To this end, we took the ATMP scheduler from Iacovelli et al. [6] and modified its core allocation and ILP constraint generation, so that the resulting scheduler handles tasks that are replicated for criticality arithmetic better. The ATMP scheduler takes a utility function for each task as input, so that the overall system utility can be gracefully distributed among tasks in case of resource shortages, e.g., caused by faults [7,8].

The remainder of the paper is structured as follows: Section 2 describes criticality arithmetic in further detail, including its link to safety standards. Section 3 describes the system model used for the criticality-arithmetic-aware ATMP-CA, which is described in Section 4. An experimental evaluation is given in Section 5. Finally, Section 6 concludes this paper.

While there is no formal definition of criticality arithmetic, in this section we describe some practical aspects of its applicability. [...] Assurance Levels (DALs) in ARP 4754. In this section we provide a brief introduction to safety integrity, using the SIL terminology of IEC 61508 [4] as an exemplar. Further details can be found in [5].

IEC 61508 [4] defines the safety integrity of a component as the probability of that component satisfactorily performing its specified safety function. It defines four Safety Integrity Levels (SILs), with a higher safety integrity level being ascribed to those components that are more important to safety. In this way, the SIL of a component indicates the extent to which that component is important to the safety of the overall system. As an indication, Table 1 gives the association between the target failure rate of a component (the probability of failure on demand (PFD) or, for continuous operation, the probability of failure per hour (PFH)) and the consequent SIL of that component, as described in [4].

Table 1.
SIL   PFD                  PFH (per hour)
4     10^−4 to 10^−5       10^−8 to 10^−9
3     10^−3 to 10^−4       10^−7 to 10^−8
2     10^−2 to 10^−3       10^−6 to 10^−7
1     10^−1 to 10^−2       10^−5 to 10^−6

Criticality arithmetic (or SIL arithmetic, as it is termed in [5]) refers to the practice of using multiple redundant independent implementations of a lower-integrity-level component providing a function F, in order to realise F at a higher integrity level than that of any of the individual components [11]. Criticality arithmetic therefore relies on functional redundancy, i.e., the duplication of certain critical system components that all provide a defined function. This means that if any one of these components fails, the remaining components will still be able to provide that function.
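The following Python sketch illustrates how Table 1 is read. It maps a continuous-mode failure rate (PFH) to its SIL band. It assumes the bands are half-open, meaning the lower bound is inclusive and the upper bound exclusive; this is our reading of the IEC 61508 ranges. The function name `sil_for_pfh` is hypothetical. Values outside the tabulated ranges are reported as `None` rather than guessed.

```python
from typing import Optional

# PFH bands from Table 1 (failures per hour, continuous mode).
# Assumption: each band is half-open, i.e. lower <= pfh < upper.
PFH_BANDS = [
    (4, 1e-9, 1e-8),
    (3, 1e-8, 1e-7),
    (2, 1e-7, 1e-6),
    (1, 1e-6, 1e-5),
]


def sil_for_pfh(pfh: float) -> Optional[int]:
    """Return the SIL whose PFH band contains `pfh`, or None if outside Table 1."""
    for sil, lower, upper in PFH_BANDS:
        if lower <= pfh < upper:
            return sil
    return None


if __name__ == "__main__":
    print(sil_for_pfh(5e-9))  # 4
    print(sil_for_pfh(3e-7))  # 2
    print(sil_for_pfh(1e-4))  # None
```

The lookup deliberately says nothing about how several lower-SIL components combine. As noted in the introduction, there is no universally accepted quantitative rule for that.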

Different domains use this concept with their own specific integrity terminology. In IEC 61508 [4], SIL arithmetic is used when discussing ways in which hardware [...] note that an effective system of redundancy management [12] is required in order to [...] sufficient independence between these relevant components may still be a non-trivial task [14]. Criticality arithmetic also accommodates the commercial pressures of developing and procuring systems. In some industries, logistical factors mean that components have to be procured before their integrity levels can be assured. Should these components later be shown to have achieved a lower integrity level than expected, criticality arithmetic may be used to close the gap. Similarly, criticality arithmetic can in some cases permit the use of legacy components (i.e., where the development effort has already been completed) at lower integrity levels [15].

Criticality arithmetic also permits the use of less-complex components, where these have achieved a lower SIL than that needed by the overall system. Components of [...]

In the following, we describe our system model and assumptions. We assume a mixed-criticality system consisting of multiple services that can have different levels of criticality. A service can be implemented by one task or by multiple tasks using criticality arithmetic. The system provides a number of services: [...] T is the set of tasks τ ∈ T that implement the service s. If only one task implements the service (|T| = 1), then no criticality arithmetic is used, and the task has the same criticality as the service. If multiple tasks implement the service (|T| > 1), then criticality arithmetic is used: each task τ ∈ T has a criticality lower than the service's criticality, but their redundant execution [...]

Each task τ of a task set T is defined as follows: [...] s is the service that is implemented by task τ. d is the relative deadline of task τ. We assume implicit deadlines, i.e., d = p_prim.

(Note that this assumption is only chosen for the concrete scheduling test in our [...] utility U, which is calculated as U = u · l.
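The system model can be sketched as a small Python data model. The class and field names are our own hypothetical choices. Rather than storing the service s in each task, this sketch groups tasks under their service. The numeric values HI = 2.0 and LO = 1.0 are the ones used in the experiments of Section 5. `validate` encodes the two rules stated above:

- A single task carries its service's criticality.
- Each replicated task has a lower criticality than its service.

```python
from dataclasses import dataclass, field
from typing import List

HI, LO = 2.0, 1.0  # numeric criticality values, as used in the experiments


@dataclass
class Task:
    name: str
    l: float       # criticality of the task
    c: float       # worst-case execution time
    p_prim: float  # primary period (full utility up to this period)
    p_tol: float   # largest tolerated period
    u_tol: float   # relative utility at p_tol

    @property
    def d(self) -> float:
        """Implicit deadline: d = p_prim."""
        return self.p_prim


@dataclass
class Service:
    name: str
    l: float  # criticality of the service
    tasks: List[Task] = field(default_factory=list)

    @property
    def uses_criticality_arithmetic(self) -> bool:
        return len(self.tasks) > 1

    def validate(self) -> None:
        """Check the criticality rules of the system model; raise ValueError otherwise."""
        if not self.tasks:
            raise ValueError(f"service {self.name} has no tasks")
        if self.uses_criticality_arithmetic:
            # Each replicated task must be of lower criticality than the service.
            for t in self.tasks:
                if t.l >= self.l:
                    raise ValueError(f"replica {t.name} is not below {self.name}'s criticality")
        elif self.tasks[0].l != self.l:
            # A single task must have the same criticality as its service.
            raise ValueError(f"task {self.tasks[0].name} must match {self.name}'s criticality")


# Two LO replicas implementing a HI service (criticality arithmetic).
s3 = Service("S3", HI, [Task("T3a", LO, 2.0, 10.0, 20.0, 0.5),
                        Task("T3b", LO, 2.0, 10.0, 20.0, 0.5)])
s3.validate()  # passes
```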

The aim of the method described in this paper is to find, for each task τ_i, a period p_i such that the overall system utility is maximised.

The individual instances of a task at runtime are called jobs. A job j is described by [...]

The fundamental concept of our scheduler is the tolerance-based real-time computing model (TRTCM) [1,8]. Instead of using a single performance limit, such as a deadline or a throughput limit, TRTCM adds a tolerance range, which, in case of resource shortage, allows a guided search for the best overall system utility.

In this paper we focus on optimising the throughput of a system based on TRTCM.

For any period p ≤ p_prim, the relative utility is 1.0, i.e., the maximum. For any period higher than p_prim, the relative utility of a service degrades. This degradation is approximated by the utility function of a service, which defines another period p_tol up to which the service is still considered acceptable, but with a lower utility u_tol. For any period p_prim ≤ p ≤ p_tol, the relative utility is expressed as a linear function, as shown in Figure 1.
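The following Python sketch writes out the piecewise utility function described above. The intercept q is computed with the y-intercept formula that is listed among the ILP constants in Section 4. The slope name m and the function names are our own.

Beyond p_tol the period lies outside the tolerance range. The sketch raises an error there; this is an assumption, since the text only defines the utility up to p_tol. The weighted utility U = u · l from the system model is included for completeness.

```python
def relative_utility(p: float, p_prim: float, p_tol: float, u_tol: float) -> float:
    """Relative utility of running a task with period p (piecewise linear)."""
    if p <= p_prim:
        return 1.0  # full utility at or below the primary period
    if p > p_tol:
        raise ValueError("period outside the tolerance range")
    m = (u_tol - 1.0) / (p_tol - p_prim)             # slope of the linear segment
    q = (p_tol - u_tol * p_prim) / (p_tol - p_prim)  # y-intercept, passes (p_prim, 1) and (p_tol, u_tol)
    return m * p + q


def weighted_utility(u: float, l: float) -> float:
    """Utility U = u * l of a task with relative utility u and criticality l."""
    return u * l


u = relative_utility(15.0, p_prim=10.0, p_tol=20.0, u_tol=0.4)  # approx. 0.7
print(u, weighted_utility(u, 2.0))                              # approx. 0.7 and 1.4
```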

The Adaptive Tolerance-based Mixed-criticality Protocol (ATMP) [6] is an application of the TRTCM model [1,8] that maximises the system utility on each core by adjusting the periods of tasks within their tolerance range. The basic implementation of ATMP categorises system tasks according to their adaptation capability. In other words, the ability of a task to relax its interarrival rate and its usefulness to the overall system decide whether it will be allocated in case of a computing resource shortage. In such a case, ATMP sorts tasks by decreasing criticality. Then, a criticality-utility-aware allocation of system tasks to the available cores is performed. On each core, if the tasks partitioned to that core are schedulable, they are processed by the underlying [...]

The task allocation to cores in ATMP-CA differs from the original one in ATMP in that it prevents more than one of the replicated tasks of a service with criticality arithmetic from being allocated to the same core. The reason for this is simply to ensure fault tolerance for the replicated tasks, such that a single core failure can disrupt at most one of the replicated tasks. In addition, the new core allocation drops task replicas if there are more task replicas for one service than there are cores available. By dropping these replicas, we prevent them from blocking computing resources on a core for no [...]

When Algorithm 1 terminates, each task in the task set has either been allocated to a core or been dropped. The purpose of this allocation is only to assign the tasks to cores. Later, within each core, as part of the utility optimisation (which is the same as in ATMP [6]), some tasks might be removed from a core again in order to pass the schedulability test.
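Algorithm 1 itself is not reproduced here. The following Python sketch only illustrates the two properties described above:

- Replicas of one service are never placed on the same core.
- Replicas beyond the number of available cores are dropped.

Several details are our own hypothetical assumptions, not taken from Algorithm 1: the least-loaded placement rule, the `util` field, and keeping the first replicas in input order. Schedulability and utility optimisation are left to the subsequent per-core step.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Task:
    name: str
    service: str
    l: float     # criticality
    util: float  # c / p_prim, used here only to balance load across cores


def allocate(tasks: List[Task], n_cores: int) -> Tuple[Dict[str, int], List[str]]:
    """Hypothetical sketch: assign tasks to cores, keeping replicas of a service apart."""
    by_service: Dict[str, List[Task]] = defaultdict(list)
    for t in tasks:
        by_service[t.service].append(t)

    # Drop surplus replicas: a service can use at most one core per replica.
    kept: List[Task] = []
    dropped: List[str] = []
    for replicas in by_service.values():
        kept.extend(replicas[:n_cores])
        dropped.extend(t.name for t in replicas[n_cores:])

    # Place higher-criticality tasks first (stable sort keeps input order on ties).
    kept.sort(key=lambda t: t.l, reverse=True)

    load = [0.0] * n_cores
    services_on = [set() for _ in range(n_cores)]
    assignment: Dict[str, int] = {}
    for t in kept:
        # Anti-affinity: skip cores that already host a task of the same service.
        # Never empty, because at most n_cores replicas per service were kept.
        eligible = [k for k in range(n_cores) if t.service not in services_on[k]]
        core = min(eligible, key=lambda k: load[k])  # least-loaded eligible core
        assignment[t.name] = core
        load[core] += t.util
        services_on[core].add(t.service)
    return assignment, dropped


tasks = [Task("T1", "S1", 2.0, 0.3),
         Task("T3a", "S3", 1.0, 0.2),
         Task("T3b", "S3", 1.0, 0.2),
         Task("T3c", "S3", 1.0, 0.2)]
assignment, dropped = allocate(tasks, n_cores=2)
print(assignment, dropped)  # {'T1': 0, 'T3a': 1, 'T3b': 0} ['T3c']
```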

In this section we describe the ILP formulation used to find the optimal task periods. We describe the constants and variables of that ILP problem, the goal function to optimise the system utility, and the different constraints that have to be considered.

Optimisation parameters (constants): In ATMP the units of scheduling are tasks. As described in Section 3, each task τ of a task set T consists of the following components: [...]
y-intercept:
$$q_i = \frac{p_{tol,i} - u_{tol,i} \cdot p_{prim,i}}{p_{tol,i} - p_{prim,i}}$$

Optimisation variables: We use the following optimisation variables to find the optimised task configurations:
p_i: the chosen period of task τ_i,
u_i: the relative utility of task τ_i,

Objective function: The ILP objective function maximises the system utility by maximising the utility variable u_i of each task τ_i multiplied by its criticality weight WT_i:
$$\max \sum_{\tau_i \in T} WT_i \cdot u_i$$
The criticality weight WT_i is explained below, together with the optimisation constraints.
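To make the objective concrete, the following Python sketch evaluates it for the tasks on a single core. It is deliberately not an ILP. It exhaustively enumerates a small grid of candidate periods within each tolerance range [p_prim, p_tol]. It discards configurations whose workload exceeds the core capacity Cap(cr), and keeps the one with the highest sum of WT_i · u_i.

Several modelling choices are our own assumptions: workload is taken to be the utilisation, the sum of c_i / p_i; WT_i is simply the task's criticality, as in ATMP, and the ATMP-CA weighting is not modelled. The grid size and the `TaskParams` structure are also hypothetical.

```python
from itertools import product
from typing import List, NamedTuple, Optional, Tuple


class TaskParams(NamedTuple):
    c: float       # worst-case execution time
    p_prim: float  # primary period
    p_tol: float   # largest tolerated period
    u_tol: float   # relative utility at p_tol
    wt: float      # criticality weight WT_i (in ATMP: the task's criticality)


def relative_utility(p: float, t: TaskParams) -> float:
    """Piecewise linear utility: 1.0 up to p_prim, falling linearly to u_tol at p_tol."""
    if p <= t.p_prim:
        return 1.0
    return 1.0 + (t.u_tol - 1.0) * (p - t.p_prim) / (t.p_tol - t.p_prim)


def best_periods(tasks: List[TaskParams], cap: float, steps: int = 5
                 ) -> Tuple[Optional[Tuple[float, ...]], Optional[float]]:
    """Exhaustively search a grid of periods for one core (a stand-in for the ILP)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    # Tolerance constraint: candidate periods lie in [p_prim, p_tol].
    grids = [[t.p_prim + (t.p_tol - t.p_prim) * k / (steps - 1) for k in range(steps)]
             for t in tasks]
    best, best_value = None, None
    for periods in product(*grids):
        workload = sum(t.c / p for t, p in zip(tasks, periods))
        if workload > cap:  # resource constraint: workload <= Cap(cr)
            continue
        value = sum(t.wt * relative_utility(p, t) for t, p in zip(tasks, periods))
        if best_value is None or value > best_value:  # objective: max sum of WT_i * u_i
            best, best_value = periods, value
    return best, best_value


hi = TaskParams(c=2.0, p_prim=4.0, p_tol=8.0, u_tol=0.5, wt=2.0)
lo = TaskParams(c=2.0, p_prim=4.0, p_tol=8.0, u_tol=0.5, wt=1.0)
print(best_periods([hi, lo], cap=0.78))  # ((4.0, 8.0), 2.5)
```

In this example the search keeps the HI task at its primary period and degrades the LO task to its tolerance limit. If no configuration fits, the sketch returns (None, None), whereas ATMP would instead remove tasks from the core.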

Optimisation constraints: We express the piecewise affine approximations of the utility functions as the following constraints: [...]

Version April 27, 2021 submitted to Electronics 9 of 13

The resource constraints limit the workload at each of the available cores cr_i ∈ Cores. The maximum workload a core cr can take is its computing capacity Cap(cr): [...] The tolerance constraints determine the maximal acceptable value of the period p_i [...] In ATMP, the weight WT_i is always set to the criticality τ_i.l of a task τ_i. In contrast, [...]

In the following we describe the setup and results of our experiments.

We have implemented an ATMP-CA scheduling simulator as described in Section 4. We configured the simulator to simulate a multi-core system with 10 cores, and we simulate fault scenarios by making 10, 4, or 2 of the 10 cores available.

In this way, we can simulate the resulting overall system utility for different cases of resource shortage. Besides the ATMP-CA protocol, the simulator also implements the ATMP and SAMP protocols for reference, as described in [6]. In essence, ATMP is similar to ATMP-CA in the sense that it also performs a utility optimisation, but its core [...] ATMP and is just included for further reference. We then also implemented a protocol SAMP-CA, which is essentially the simple SAMP protocol but uses the new Algorithm 1, which also exploits knowledge about criticality arithmetic, for core allocation. As such, SAMP-CA might perform better than SAMP for systems with criticality arithmetic, but it is not expected to compete with the utility optimisation performed by ATMP-CA.

We have generated a task set with random parameters for the worst-case execution time c and the utility function uf. The implicit deadlines d are chosen to be equal to the primary period p_prim. The criticality of a task or service is either HI or LO, which corresponds to a numeric value of 2.0 or 1.0, respectively. We have constrained the task generation such that it includes two normal HI services (S1, S2), two HI services that use criticality arithmetic (S3, S4), and four LO services (S5, S6, S7, S8). The complete structure of this task set is shown in Table 2. As shown in the table, the tasks T1 and T2, which implement the HI services S1 and S2, have the same criticality as the services themselves. However, the HI services S3 and S4, which use criticality arithmetic, are implemented by the redundant task pairs T3a, T3b and T4a, T4b, respectively, all of which have criticality LO. [...] all. The effect of the criticality-arithmetic-aware generation of the ILP objective function described in Section 4.2 can be seen here for service S3.
With ATMP-CA, the first task T3a has been set to full utility, resulting in a degraded utility for T3b, which frees resources for other tasks. Since both tasks implement the same service S3, there is no need to allocate both of them at maximum utility when the system experiences an overload, as seen with the classical ATMP. A similar effect can be seen with service S4, where ATMP-CA allows a degraded utility for task T4a and gives full utility to T4b, while with ATMP both tasks of S4 are degraded.
Figure 2.c shows the case with 2 of the 10 cores available. Here, SAMP retained the tasks of the HI-criticality services S1 and S2 and just one LO-criticality task, T8, but dropped all the tasks of the other HI-criticality services, namely S3 and S4. [...] ones. In addition, ATMP-CA has allocated the S3 replica task D with maximum utility as a result of the degradation of task C. Both S4 replicas have been degraded, which shows that the modified optimisation process could not find a solution that allocates task F at maximum utility. [...] of resource limitations. Overall, ATMP-CA provides an even smoother degradation than ATMP for services using criticality arithmetic.

A limitation of the current experiments is that we only considered systems with two criticality levels. Future work would be to extend the method to multiple levels of criticality.

In this paper we described the concept of criticality arithmetic, also known as SIL arithmetic, which is a technique to reduce the required development effort of a service by using task replication. The contribution of the paper is the development of ATMP-CA, a mid-term scheduler that takes into account information about criticality arithmetic to provide a graceful degradation of system utility in case of resource shortages, for