Specific Electronic Platform to Test the Influence of Hypervisors on the Performance of Embedded Systems

Some complex digital circuits must host several operating systems on a single electronic platform, either to make real-time and non-real-time tasks compatible or to assign different priorities to running applications. For this purpose, hardware–software techniques known as virtualization must be integrated so that the operating systems run independently, as if isolated on different processors: the virtual machines. These are monitored and managed by a software tool named the hypervisor, which is in charge of allowing each operating system to take control of the hardware resources. Therefore, the hypervisor determines the effectiveness of the system when reacting to events. To measure, estimate or compare the performance of different virtualization configurations, a topic that has received insufficient attention so far, our research team has designed and implemented a specific testbench: an electronic system, based on a complex System on Chip with a processing system and programmable logic, that allows the hardware–software partition to be configured and reports figures of merit for evaluating the different options. In this way, the fabric of the Field Programmable Gate Array (FPGA) can be exploited for measurements and instrumentation. The platform has been validated with two hypervisors, Xen and Jailhouse, on a multiprocessor System-on-Chip, by executing real-time operating systems and application programs in different contexts.


Introduction
Many complex digital circuits must host different operating systems (OS) on a single electronic platform to synchronize real-time and deferrable tasks or to give different priorities to running applications. Applications around Industry 4.0 or the Industrial Internet of Things require incorporating more non-real-time functionality and common software architectures such as Real-Time Operating Systems (RTOS) and bare-metal periodic control loops. Hence, hardware-software codesign techniques, known as virtualization, must be exploited to run all the operating systems independently of each other, as if confined to different processors or virtual machines.
There are different levels of virtualization, or virtualization schemes: full virtualization, para-virtualization, and static partitioning or core virtualization [1]. Due to the strict constraints on resources and computing power in embedded environments, not all of them are suitable for industrial embedded systems. Considering the prioritization of real-time operations and the restricted computing resources of embedded systems, static partitioning is the most suitable scheme. In this case, because of the direct coupling between the physical resources and the virtualized environments, the determinism of real-time operation is less affected. Thus, partitioning is better suited to embedded resources than full virtualization [2].
These virtualization schemes require an underlying piece of software, named the Virtual Machine Monitor (VMM) or hypervisor, to handle the guest machines or guest OS. The hypervisor runs at a privileged execution level and, with the help of virtual systems, manages the sharing of the underlying physical resources. At the same time, however, the hypervisor introduces an additional layer between the outside world and the processing unit, i.e., a potentially longer latency. The common use cases for hypervisors in embedded systems are consolidation, support for legacy operating systems, and multiple security levels [3].
A type 1 hypervisor runs directly on the hardware. It hosts the guest OS and manages resources and memory allocation for the virtual machines. In a type 2 hypervisor, or guest OS virtualization, the hypervisor runs on top of a host OS, the lowest layer of software, which provides drivers and services to the hypervisor that hosts the virtualized guest OS. Comparing both types, type 1 introduces less overhead, providing better performance and a greater capacity to optimize the allocation of hardware resources to the various virtual machines. Unlike in type 2 hypervisors, access to the devices does not depend on the driver being virtualized.
Consequently, as an interface, the hypervisor may determine the effectiveness of the system when reacting to events, depending not only on how quickly it switches the working context but also on which hardware resources it occupies or releases. Nevertheless, measuring, estimating, or comparing the performance when configuring the virtualization in different ways is not trivial. Specific electronic testbenches are needed to host various processors, install distinct operating systems, configure the hardware-software partition, and show significant figures of merit.
To characterize and compare the interference of multiple virtualization options, the authors have designed a specific hardware testbench. The programmed application measures the additional latency introduced in servicing an interrupt during normal execution; this measurement is regarded as fairly general and of capital importance in industrial applications. In many such systems, for example, smart grid applications, the time to react to a stimulus is the key factor of the equipment [4]. In such a system, a baremetal application provides the greatest speed with the least flexibility. Nevertheless, dividing the work by means of a hypervisor may offer a compromise. It provides, at the same time:

• Flexibility of a higher-level operating system;
• Careful control of latencies of a baremetal application;
• Security by separation of hardware and software between both elements.
The rest of this article is structured as follows. Section 2 presents a brief survey of related work on other testbench setups. Section 3 specifies the electronic platform considered for testing different hypervisors. The proposed approach for an automated test procedure is discussed in Section 4, and the paper ends with some conclusions.

Hardware-Testbenches for Hypervisors
So far, the performance of hypervisors has mostly been tested on computer platforms, sometimes of the personal type [5–15] and sometimes on servers [16–26]; multi-purpose embedded systems have also been used in the same way [1,27,28]. We could have exploited these complex testbenches, with all the advantages of their high-level software interfaces, but sensors and electronic instrumentation in general would have had to be attached externally. Besides, reconfiguring the setup would have implied manual changes, and external connections and instruments would have introduced errors in the latency measurements. Reference [29] used an FPGA platform in a similar context, though not to test how the hypervisor slows down the processor system, as we do, but to analyze the interference between virtual machines that should work in isolation. Table 1 shows on which electronic cards the referenced teams performed the tests. Although the authors of [29] conducted some research on CPU performance on an FPGA card with a chip similar to ours, the Xilinx ZCU102, they were not interested in the effect of the hypervisor on latencies but in using Jailhouse as the tool to isolate different virtual machines so that they do not disturb each other's resources.
Bansal et al. [30] also refer to performance on a Xilinx ZCU102 but, in this case, the authors ultimately propose Jailhouse as the solution to the memory contention problems they found: isolating every subsystem using the hypervisor. Hence, to the best of our knowledge, the present work is the first to propose a hardware-configurable testbench for analyzing the latency introduced by a hypervisor while, at the same time, exploiting the reconfigurable part of the FPGA to create the instrumentation.

The Electronic Platform
The main contribution of this work is the design and implementation of a hardware platform for flexibly testing and automatically characterizing different hypervisors in various working configurations. Our objective is to offer a versatile electronic tool for evaluating how the use of hypervisors influences the system's response. These are the main criteria for selecting the hardware platform:
• Two of the most used hypervisors, Xen and Jailhouse, have been selected as the reference for the design requirements; each maintains a list of hardware systems on which it has been tested. The Xen list is longer because Xen is the older alternative. For the sake of generality, the selected hardware should appear in both lists. If it does not, adapting it should at least be considered feasible;
• To test the virtualized systems, tools are needed to generate the Linux cell, with its kernel, device tree and file system, and the baremetal cell. The selected solution must provide this software, and the availability of the source code is also valuable in case modifications are needed.
An Advanced RISC Machines (ARM)-based architecture has been chosen because it is the most used one for embedded systems, and it is present in the Processing System (PS) of Zynq and Zynq UltraScale+ families in Xilinx devices. It is also suitable for the task since ARMv8 architecture provides virtualization extensions in hardware.
The Xilinx Zynq UltraScale+ family has been chosen because it includes the ARMv8 architecture in the PS and a Programmable Logic (PL) part where application-specific circuits can be implemented to create a test setup. The Xilinx ZCU102 [31] board, which includes a Zynq UltraScale+ Multiprocessor System-on-Chip (MPSoC), appears in both the Xen and Jailhouse lists. It would have been a good option, but the research team had been using the UltraZed [32] board in its latest works, and it was concluded that, as it includes a chip of the same family, only slight modifications would be needed to execute both hypervisors; therefore, this electronic platform has been chosen. When this work was prepared, the UltraZed card was not in Jailhouse's list of supported and tested platforms and there was no cell file for it, so the files corresponding to the root cell and to the cell with the baremetal application were created. There is one configuration file for every cell (in our case, one for the root cell and one for the baremetal one). The files created for the UltraZed contain the configuration structure that Jailhouse needs to locate the different resources of the platform in the memory map. Enabling the use of the UltraZed board with Jailhouse is another valuable contribution of this work.
The four ARM Cortex-A53 (ARMv8) cores support the processing of both guests, the Linux system and the baremetal application. The PL part is used as general I/O and to measure the timing of the system. Figure 1 shows the block diagram as designed in Vivado. It is a straightforward system in which all the blocks are connected by the Advanced eXtensible Interface (AXI) bus and appear mapped in the microprocessor memory. The objective is to measure the latency that hypervisors introduce when handling interrupts in guest baremetal systems.
For this purpose, an instrumentation subsystem has been created, in which the following blocks and the connections between them have been arranged:
• axi_gpio_0 [33]: generates an interrupt whenever a change occurs in its input. The output is connected as an interrupt generated in PL and used as input in PS;
• axi_timer_1 [34]: it is configured as Pulse-Width Modulation (PWM), and its output is the input signal of axi_gpio_0;
• axi_gpio_3: its output signal is activated from the interrupt service routine;
• axi_timer_0: it captures two events: the interrupt event generated by axi_gpio_0 and the activation of axi_gpio_3, which occurs when the interrupt is attended. The difference between both times is therefore the time needed to attend the interrupt;
• axi_gpio_1: its output is connected to an LED in order to follow the execution visually.
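As an illustration of how the two values captured by axi_timer_0 translate into a latency, the following minimal sketch converts a pair of capture counts into nanoseconds. The 100 MHz timer clock, the 32-bit up-counting counter, and the function and constant names are assumptions made for this example; they are not taken from the design described above.

```python
# Hypothetical parameters: the article does not state the PL clock
# frequency or the counter width used by axi_timer_0.
TIMER_CLOCK_HZ = 100_000_000  # assumed 100 MHz timer clock
COUNTER_BITS = 32             # assumed 32-bit up-counting counter


def latency_ns(start_ticks, stop_ticks,
               clock_hz=TIMER_CLOCK_HZ, counter_bits=COUNTER_BITS):
    """Convert two capture values into an interrupt latency in ns.

    start_ticks: count captured when the interrupt is raised.
    stop_ticks:  count captured when the ISR activates its GPIO output.
    The modulo handles a single counter wrap-around between captures.
    """
    modulus = 1 << counter_bits
    elapsed_ticks = (stop_ticks - start_ticks) % modulus
    return elapsed_ticks * 1_000_000_000 / clock_hz


if __name__ == "__main__":
    # 100 ticks at the assumed 100 MHz clock correspond to 1000 ns.
    print(latency_ns(1000, 1100))
```

The wrap-around correction is only valid when the real latency is shorter than one full counter period, which holds comfortably for microsecond-scale latencies under the assumed clock and width.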
The interrupt events are also monitored in an external logic analyzer using a PMOD connector as the backup instrumentation.

Test Procedure
The main purpose of the hardware described in the previous section is to provide a flexible and configurable means to measure the latency in executing an interrupt service function accurately and transparently. So that different test cases can be explored, the following steps have been automated:
• Create a base application-level operating system, in our case Linux. For that purpose, we need to:
- Create a kernel with hypervisor support;
- Add the hypervisor executable and configuration;
• Create a base low-latency application. In our case, a baremetal application has been created for the interrupt service function. More generally, a real-time application could be used, either baremetal or RTOS;
• Create a periodic interrupt by means of the hardware timer;
• Capture the time to serve the interrupt by means of the hardware capture module;
• Repeat the operation a statistically significant number of times to obtain the data under different CPU loads. The hypervisor's operating system is stressed to test the impact of the hypervisor on the high-priority baremetal application.
Note that the test must be repeated so that outlying values can be identified and filtered out. At the same time, these outliers are of great importance since, in real-time systems, the worst-case scenario is the defining scenario. An overall structure can be seen in Figure 2.
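The treatment of repeated measurements described above can be sketched as a small summary routine that reports typical values while keeping the worst case visible. This is only an illustrative sketch: the choice of statistics, the nearest-rank percentile, and the function name are assumptions, not the analysis actually performed by the authors' scripts.

```python
import math
import statistics


def summarize(latencies_ns, percentile=99):
    """Summarize a list of measured latencies (in ns).

    The maximum is reported explicitly because, in real-time systems,
    the worst case matters even when it is a statistical outlier.
    """
    if not latencies_ns:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_ns)
    # Nearest-rank percentile: smallest sample such that at least
    # `percentile` percent of the samples are less than or equal to it.
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return {
        "count": len(ordered),
        "min": ordered[0],
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        f"p{percentile}": ordered[rank - 1],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Made-up sample values, used only to show the call.
    print(summarize([1000, 1100, 1050, 5000]))
```

Reporting both a high percentile and the maximum lets a reader separate the typical behavior from rare worst-case events without discarding either.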

General Flow
To test different hypervisors, a procedure has been devised (Figure 3). The hardware can generate continuous interrupts; what is left is to measure the time needed to service those interrupts under different scenarios. To do that, an Interrupt Service Routine (ISR) has been created as a baremetal application, so that it does not depend on any operating system. This ISR toggles the stop signal of the capture counter through a GPIO. A program inside the Linux cell/Dom reads the start and stop times and saves them into a file. To generate different scenarios, the stress-ng tool is used inside the Linux cell/Dom. The output values are transmitted to a computer and analyzed.
The test procedure has been automated to a great degree. A script generates the different stress levels (in our case, CPU stress, memory stress, and combined CPU and memory stress), reads the ISR latency, and generates a result file for every stress test. Each file contains several hundred instances of the interrupt (the number can be configured to obtain a statistically significant number of instances). This file is transmitted using sftp to a host PC, where the data are analyzed and graphs are generated using an Octave/Gnuplot script.

All these scripts are highly configurable, and stress-ng has many different options. Depending on the final application, different stress patterns can be generated. We have selected two common baseline scenarios, although customized ones could be created just as easily.
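To make the notion of configurable stress patterns concrete, the sketch below builds command lines for CPU, memory, and combined stress, assuming the tool referred to above is the stress-ng utility. The scenario names, worker count, memory size, and duration are hypothetical values chosen for this example; only the `--cpu`, `--vm`, `--vm-bytes`, and `--timeout` options are standard stress-ng options.

```python
import shlex


def stress_command(scenario, workers=4, vm_bytes="128M", timeout_s=60):
    """Build a stress-ng argument list for one stress scenario.

    scenario: "cpu", "memory" or "cpu+memory" (names chosen for this
    sketch). The default workers, vm_bytes and timeout_s are
    illustrative assumptions, not the values used in the article.
    """
    args = ["stress-ng"]
    if scenario in ("cpu", "cpu+memory"):
        args += ["--cpu", str(workers)]
    if scenario in ("memory", "cpu+memory"):
        args += ["--vm", str(workers), "--vm-bytes", vm_bytes]
    if len(args) == 1:
        raise ValueError(f"unknown scenario: {scenario!r}")
    args += ["--timeout", f"{timeout_s}s"]
    return args


if __name__ == "__main__":
    for name in ("cpu", "memory", "cpu+memory"):
        # shlex.join renders the list as a copy-pastable shell command.
        print(shlex.join(stress_command(name)))
```

Returning an argument list rather than a single string means the result can be handed directly to `subprocess.run` inside the Linux cell/Dom without shell quoting issues.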
This automation is also useful because it can be exploited as an in-system self-test. Instead of using artificial stress software, the actual final application can be run in the field to detect and measure possible problems.

Validation Test
In order to validate our setup, we have tested two popular hypervisors: Xen and Jailhouse. At the same time, as mentioned, we have stressed the operating system managing the hypervisor. The stress has been focused on CPU and memory usage.
For example, the platform automatically generated 700 rounds of the interrupt in three different scenarios and for two different hypervisors; a sample can be seen in Figures 4 and 5.
The first experiment (see Figure 4) was performed while stressing the CPU; the second (see Figure 5), while stressing the virtual memory. The tool also produces a summary of statistical data; for example, see Table 2. In the optimal case, a latency of approximately 1 µs for Jailhouse and 2 µs for Xen was found.

Conclusions
In this work, a flexible, configurable, and automated platform composed of an electronic system based on a complex FPGA has been proposed and validated for measuring and evaluating the performance of different hypervisors.
First, we investigated the impact of the hypervisor on performance. The integration of virtualization techniques allows real-time and non-real-time applications running on different operating systems to coexist inside a single digital circuit. In this virtualized context, the hypervisor is responsible for monitoring and managing the independent operating systems, deciding which one takes control of the hardware resources at each moment. Therefore, the effectiveness of the system in the presence of interrupts or events is directly connected to the hypervisor's behavior. We have noticed that the platforms selected in previous studies are, in most cases, computers, and that those studies did not take advantage of reconfigurable computing technologies. Besides, they do not analyze I/O contention, such as general-purpose I/O interrupts.
A hardware testbench composed of an SoC that hosts some processors, installs various operating systems, configures the hardware-software partition, and shows significant merit figures has been designed and implemented. A test procedure has been defined and automated to a great degree to measure the latency in the execution of an interrupt service function, and the setup has been validated using two type 1 hypervisors: Xen and Jailhouse.

Funding: This work has been supported by the Basque Government within the project HAZITEK ZE-2020/00022 as well as the Ministerio de Ciencia e Innovación of Spain through the Centro para el Desarrollo Tecnológico Industrial (CDTI) within the project IDI-20201264 and FEDER funds.

Conflicts of Interest:
The authors declare no conflict of interest.