This section describes the performance evaluation of the real-time controllers designed in the previous section. We focus on three performance metrics that are critical for determining the reliability and stability of each OES in real-time distributed control applications. First, the periodicity and responsiveness of each task were evaluated to ascertain whether the system behaves deterministically and can perform the expected tasks while satisfying hard temporal deadlines. Next, various task synchronization mechanisms were defined and evaluated to provide a guideline on the overhead to expect when they are applied in user applications. Finally, experiments were conducted to measure the interrupt response time of each OES, which determines the behavior of the system when interacting with device drivers for digital input and output devices. The experimental procedures and conditions for each metric are discussed in detail in the following subsections.
3.1. Periodicity and Responsiveness
Schedulability of real-time tasks is highly dependent on the timing correctness of each task, i.e., whether all tasks can complete within their respective deadlines. The periodicity and responsiveness of the system are verified using a method called response-time analysis [35]. According to this method, the schedulability of a set of tasks can be analyzed from the worst-case response time of each task. The response time is defined as the duration from the release point of a task until it finishes its job. The execution behavior of a real-time task is illustrated in
Figure 2.
In the figure, the timing characteristics of task τi are defined as follows: each task has a priority and an activation period (P), which is usually equal to the relative deadline, and its execution time is defined as the computation time (C). The release jitter (J) is the delay at the beginning of the task's execution due to context switching. The busy period (W) of the task is the sum of the computation time (C), the blocking time (B), and the interference time (I). Blocking occurs when a low-priority task owns a resource needed by a high-priority task, whereas interference occurs when a lower-priority task is preempted by tasks with higher priorities [8].
The worst-case busy period of task τi is obtained from the recurrence

W_i^{x+1} = C_i + B_i + I_i^x,  with  I_i^x = Σ_{τj ∈ hp(i)} ⌈W_i^x / P_j⌉ · C_j.     (1)

In this equation, I_i denotes the total interference from hp(i), the set of tasks with higher priority than task τi: each higher-priority task τj preempts τi up to ⌈W_i/P_j⌉ times during the busy period, contributing its computation time C_j each time. If hp(i) is not empty, Equation (1) is iterated x times until W_i^{x+1} = W_i^x. The test should be stopped once the current iteration yields a value beyond the deadline; otherwise, the iteration would never terminate. This determines the busy period of the current task, τi, only, and must be repeated for the other tasks as needed. After the busy period is calculated, the overall response time is determined by the following equation [8]:

R_i = W_i + J_i.     (2)
Periodic tasks are schedulable if and only if all scheduled tasks can complete their given computation time within their respective period/deadline. To demonstrate a practical example of the response-time test, we performed a simplified experiment consisting of two real-time tasks. The goal of the experiment is to verify whether the real-time controllers designed in the previous section behave deterministically, in accordance with the response-time test. The experimental conditions are kept as simple as possible for easier understanding and simpler calculations. Two tasks were generated with given priorities, periods, and computation times. The first task, τ1, has a priority of 99, a computation time of 0.5 ms, and a period of 1 ms. The lower-priority task, τ2, was generated with a priority of 80, a computation time of 1.5 ms, and a period of 5 ms. Note that, according to the Xenomai documentation, the highest priority level is 99 and the lowest is 1. The tasks are scheduled to start at approximately the same time. To ensure that both tasks consume the configured computation time, we used the function rt_timer_spin(SPINTIME), which is available in the Xenomai user-space library. This function busy-waits (burns the CPU) for the specified SPINTIME, given as an argument in nanoseconds. The expected behavior of the tasks is shown in
Figure 3. In this figure, τ
1 and τ
2 are represented by the blue and red lines, respectively. The interference is represented by the box with blue diagonal lines.
It is important to note that τ2 is preempted by the higher-priority task; thus, there are points of interference during its busy period. For visibility purposes, the rt_timer_spin() function is encapsulated in a loop with a SPINTIME of 0.1 ms. The loop terminates when the accumulated SPINTIME equals the configured computation time of 1.5 ms, meaning that τ2 completes its job when fifteen blocks of 0.1 ms have been executed. It is also evident that both tasks run periodically, as shown by the values of the data cursors located at the release points. According to Equations (1) and (2), the expected response time of τ1 is 0.5 ms because it has the highest priority and no blocking or interference occurs during its busy period. On the other hand, τ2 completes its execution every 3 ms. The calculation of the response times is presented in Appendix A. The calculated response times were also verified with the schedulability analysis tool, the Modeling and Analysis Suite for Real-Time Applications (MAST) [36]. Herein, the default toolset was selected to calculate the worst-case behavior of the system and to determine whether it can always meet the hard real-time deadlines. We enabled slack calculation, i.e., the percentage by which the execution time can be increased while maintaining schedulability. The slack is vital information for determining how close the system is to becoming non-schedulable. The results of the MAST analysis are shown in Table 2. The worst-case response time of each task is equal to the result of the calculation using Equations (1) and (2), and both tasks are schedulable with 24.61% system slack.
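The busy-period recurrence and the response-time check above can be reproduced with a short script. This is an illustrative sketch (not the authors' code or the MAST tool), using the task parameters of the experiment and assuming zero release jitter and blocking:

```python
import math

def busy_period(C, B, hp, deadline=float("inf")):
    """Iterate W^{x+1} = C + B + sum(ceil(W^x / P_j) * C_j) until convergence.

    hp is a list of (C_j, P_j) pairs for the higher-priority tasks, hp(i).
    Returns None if the iteration exceeds the deadline (task unschedulable).
    """
    W = C + B
    while True:
        W_next = C + B + sum(math.ceil(W / Pj) * Cj for (Cj, Pj) in hp)
        if W_next > deadline:
            return None          # stop: value is beyond the deadline
        if W_next == W:
            return W             # converged: W^{x+1} == W^x
        W = W_next

# Task set of the experiment (times in ms):
# tau1: C = 0.5, P = 1 (highest priority); tau2: C = 1.5, P = 5
W1 = busy_period(C=0.5, B=0.0, hp=[], deadline=1.0)
W2 = busy_period(C=1.5, B=0.0, hp=[(0.5, 1.0)], deadline=5.0)

# Response time R = W + J (Equation (2)); release jitter J is taken as zero
R1, R2 = W1, W2
print(R1, R2)  # 0.5 and 3.0 ms, matching the expected response times
```

For τ2 the iteration runs 1.5 → 2.5 → 3.0 → 3.0 ms, converging well inside the 5 ms deadline, consistent with the 24.61% system slack reported by MAST.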
The experiments were performed on each real-time embedded controller for 10 minutes to verify whether the measured behavior conforms to the values calculated above. During the experiments, the OESes were kept isolated to avoid any unwanted interruptions that could affect the performance of the entire system. For this reason, all measured values are stored in a buffer for offline processing and analysis.
The results of the timing analysis are shown in
Table 3 as the statistical average (avg), maximum (max), minimum (min), and standard deviation (σ) of each timing metric. Analyzing these results, we can see that the measured average response times of all OESes are approximately equal to the expected response times of 0.5 and 3 ms for τ1 and τ2, respectively. Moreover, both tasks meet their respective deadlines, as reflected in the average period measurements of both tasks. The σ values show that the Raspberry Pi 3 has the best performance, with the lowest deviation from the statistical average. Although not analyzed in this paper, this difference can be attributed to the improved architecture of Xenomai 3. As the objective of this paper is to prove the viability of OESes in real-time applications, the difference in performance with respect to the Xenomai version is neglected as long as the OESes fulfill the requirements of a hard real-time control system. These results indicate that the designed OES-based real-time controllers are feasible for hard real-time applications.
3.2. Task Synchronization Mechanisms
Aside from the periodicity and responsiveness of real-time tasks, the correctness of the data being processed is another concern in ensuring the deterministic behavior of a real-time control system. Notably, in a DCS, various devices are required to exchange data in a multitasking environment. Tasks are expected to execute in parallel and often need to access the same resources. However, synchronization and concurrency issues between them can cause either data overflow (when the publisher runs faster than the reader) or data loss (when the publisher is slower). RTOSes offer inter-task communication (ITC) mechanisms to prevent such anomalies. ITCs fall into two main types: shared-memory protection and message-passing mechanisms. With shared memory, different tasks can publish or read data stored in a region of memory. Mechanisms such as semaphores and mutexes prevent simultaneous access to that region, so that only one task at a time can access the shared data, avoiding the issues mentioned above. In the case of message passing, one task acts as the sender, responsible for transmitting specific data to the reader. The reader continuously waits for the message from the sender and does not execute until it has received the entire message. In this paper, ITC mechanisms are evaluated to serve as a guideline and to make developers aware of the overhead incurred when applying these mechanisms in user-space applications. This is particularly helpful for real-time applications in an embedded environment, where an optimal user-code size is required to save memory and to predict the total task execution time efficiently.
3.2.1. Semaphore and Mutex
Semaphores are very useful for synchronizing multiple tasks that communicate through shared data structures. As all tasks in the same process exist in the same address space, sharing data structures between tasks is vulnerable to data corruption. A semaphore gives exclusive access to the shared resources to the task that possesses it; other tasks requesting the semaphore are suspended until the current owner releases it. In Xenomai, semaphores are counting semaphores, which allow up to N tasks to access the shared resources simultaneously. On the other hand, mutexes (MUTual Exclusion) are binary semaphores that can only take two states: unlocked or locked. In the locked state, the task in possession can access the shared resources and the other tasks must wait, whereas in the unlocked state the critical section is free for another task to acquire. Another feature of the Xenomai mutex is that it enforces a priority inheritance protocol to solve priority inversion, a scheduling problem in which a lower-priority task holding a shared resource indirectly blocks a higher-priority task. To measure the overhead produced by these mechanisms, the experimental conditions of the previous section were reformulated to include semaphores and mutexes, as shown in the pseudo code in
Figure 4.
Basically, the response time with either the semaphore or the mutex (R_Sem|Mtx) is equal to the response time without any ITC mechanism plus the overhead. Therefore, the time duration for acquiring and releasing the semaphore or mutex is calculated using the following equation:

T_Sem|Mtx = R_Sem|Mtx − R_0,     (3)

where R_0 denotes the response time measured in the previous section and T_Sem|Mtx denotes the overhead of the semaphore or the mutex.
Considering the same conditions, the experiments were conducted on each OES. Although the Xenomai semaphore is a counting semaphore, we configured it to behave like a mutex with only two states, to make a fairer comparison of the two mechanisms. Moreover, we acquire the results from the higher-priority task, whose implementation is straightforward, in order to exclude external factors that could contribute to the measurement; the semaphore/mutex operations in the low-priority task are inside a loop, which can introduce unwanted computational delays. The results are summarized in Table 4, showing the statistical averages of the response times and of the time duration of each ITC mechanism. Herein, we can clearly see that the mutex has a larger overhead than the semaphore, which is consistent across all the embedded platforms. We assume that this is due to the mutex having more features, such as interrupt blocking and the priority inheritance scheme.
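The overhead-measurement idea can be sketched in plain Python as a portable analogue. The snippet below uses threading.Lock and a binary threading.Semaphore in place of the Xenomai user-space services, so the absolute numbers are not comparable to Table 4; it only illustrates the acquire/release measurement loop:

```python
import threading
import time

def measure_overhead(acquire, release, iterations=100_000):
    """Average time for one uncontended acquire/release pair, in seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        acquire()
        release()
    return (time.perf_counter() - start) / iterations

sem = threading.Semaphore(1)   # counting semaphore configured as binary
mtx = threading.Lock()

t_sem = measure_overhead(sem.acquire, sem.release)
t_mtx = measure_overhead(mtx.acquire, mtx.release)
```

As in Equation (3), the overhead could equally be obtained as the difference between the response times measured with and without the synchronization object; the direct loop above simply averages out measurement noise.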
3.2.2. Message Queue
The message queue is very useful for sending data between real-time tasks. A message is sent from an interrupt service routine or a task to another task. Centralizing a specific function, such as error handling, is a common application of message queues. If a task is waiting for a message and the queue is empty, the task is suspended until a message is posted to the queue. This means that the waiting task does not consume any CPU time while waiting for a message, so other tasks can run continuously. The goal of the experiments is to measure the total time it takes for the receiver task to be activated, as shown in the pseudo code in
Figure 5.
The total time from the sender task posting a message to the queue until the receiver task receives the message and is activated, denoted as T_Msgq, is calculated using the following equation:

T_Msgq = (t_act − t_post) − T_ctx,     (4)

where t_post is the instant the sender posts the message, t_act is the instant the receiver is activated, and T_ctx denotes the context switching time, i.e., the time it takes for the CPU to save the context (state) of the current task and to restore and execute the context of the next scheduled task. In this paper, we assume that the context switching time is so small that it is negligible. For a practical implementation of message queues, we consider the pseudo code in Figure 5. The high-priority task runs periodically with a period of 1 ms. Note that the receiver task depends on the periodicity of the sender task; that is, the period of the receiver should be equal to the period of the task posting the message. The sender task posts a dummy message to the queue, and the receiver task waits for the message before doing its execution. The results are summarized in
Table 5 with the statistical average (avg), maximum (max), minimum (min), and standard deviation (σ) values of
TMsgq and the periodicity of the two tasks.
As expected, the periodicity (P) of the receiver task (τ
2) highly depends on that of the sender task (τ
1). This is evident in all the measured data from each OES. The trend in periodicity for both tasks is also consistent with the results of Section 3.1, with the Raspberry Pi 3 showing the best performance and the BeagleBone Black the worst. Moreover, most of the OESes produced similar statistical averages of T_Msgq, while the BeagleBone Black has the worst average, at 18.748 μs. The same trend is visible in the standard deviation, where the BeagleBone Black produced a very high value of 3.805 μs compared with the other OESes, which show deviations of less than 1 μs. We attribute this to the single-core architecture of the BeagleBone Black, whereas all the other OESes feature multiple CPU cores.
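The T_Msgq measurement pattern, a sender stamping each message and a blocked receiver computing the delta on activation, can be sketched as follows. This is an illustrative analogue with Python's queue.Queue standing in for the Xenomai message queue; note that the measured value here also includes the context switching time, which the analysis above neglects:

```python
import queue
import threading
import time

q = queue.Queue()
latencies = []

def receiver(n):
    # Blocks on q.get() until a message is posted, as in the experiment
    for _ in range(n):
        t_post = q.get()
        latencies.append(time.perf_counter() - t_post)

N = 50
rx = threading.Thread(target=receiver, args=(N,))
rx.start()

for _ in range(N):
    q.put(time.perf_counter())   # the "dummy" message carries its post time
    time.sleep(0.001)            # sender period of roughly 1 ms

rx.join()
t_msgq_avg = sum(latencies) / len(latencies)
```

The receiver's period follows the sender's, mirroring the dependence between τ2 and τ1 observed in Table 5.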
3.3. Interrupt Response Time
As DCSes are composed of different devices that interact with the environment, it is very important to measure the interrupt response time of the main controller. The interrupt response time is defined as the elapsed time between an interrupt signal and the corresponding interrupt service routine. In a Xenomai environment, device drivers must be created in order to interact with the connected devices. However, most device drivers are only available for standard Linux. Although it is possible to use these device drivers inside Xenomai tasks, it is not recommended because an event called mode switching could occur. Mode switching causes Xenomai tasks to be scheduled by the standard Linux scheduler, thus losing their real-time capabilities. To this end, Xenomai offers the Real-Time Driver Model (RTDM) for developing device drivers that do not suffer from mode switching. Using RTDM, we can expect the interrupt response time to be lower than that of standard Linux because of the priority-based scheduling of Xenomai.
A comparative experiment measuring the interrupt response time was conducted by creating an RTDM device driver and a standard Linux device driver, each handling two general-purpose input and output (GPIO) ports. To gather accurate results, we used a function generator to produce square-wave signals connected to the input port of the OES. An oscilloscope was used for data acquisition and to determine the skew between the reference signal and the device driver output. The first GPIO port is configured as the input, connected to the square-wave function generator; the other port is configured as the digital output. The interrupt service routine is kept as simple as possible: it acquires the value of the input port and sends it directly to the output port. The input port is probed by the oscilloscope and becomes the reference signal; another probe is placed on the output port. The time difference (skew) between the ports is the interrupt response time. The same procedure was implemented using the RTDM and standard Linux device drivers. The experiments were conducted for 10 minutes, and the statistical measurements were acquired from the oscilloscope. The actual results for all the OESes are shown in Figure 6. In the figure, the standard Linux device driver shows an interrupt response time up to four times longer than that of the RTDM driver. The average interrupt response time for RTDM ranges from 5.22 μs to 8.01 μs, with the Zybo-7020 showing the fastest response. The same experiments were repeated several times, producing results with the same trend. These promising results will serve as a good reference for developers who wish to integrate various devices into Xenomai-based real-time controllers. In particular, for devices that require fast response times, RTDM device drivers can minimize the interrupt response time while guaranteeing priority-based scheduling.
Let us again consider the two-task application described in Figure 4; this time, the output GPIO is toggled to visualize the jobs.
Figure 7 shows the actual behavior of the tasks observed using an oscilloscope for all the designed real-time controllers. The high priority task (τ
1) runs periodically with an average of 1 ms for all the embedded controllers. The lower priority task (τ
2), at the bottom part of the plot, also maintains periodicity, running with an average of 5 ms. This shows that the minimal interrupt response time produced by the RTDM driver does not affect the periodicity of the real-time tasks. As expected, the Raspberry Pi 3 shows the best performance, with standard deviations of 1.938 μs and 443.6 ns for τ1 and τ2, respectively.