seL4 Microkernel for virtualization use-cases: Potential directions towards a standard VMM

Virtualization plays an essential role in providing security to computational systems by isolating execution environments. Many software solutions, called hypervisors, have been proposed to provide virtualization capabilities. However, only a few were designed to be deployed at the edge of the network, in devices with fewer computational resources than servers in the Cloud. Among the few lightweight software solutions that can play the hypervisor role, seL4 stands out by providing a small Trusted Computing Base and formally verified components, enhancing its security. Although seL4 microkernel technology has existed for more than a decade, its userland and tools are still scarce and not very mature. Over the last few years, the main effort has been put into increasing the maturity of the kernel itself and not the tools and applications that can be hosted on top. Therefore, it currently lacks proper support for a full-featured userland Virtual Machine Monitor (VMM), and the existing one is quite fragmented. This article discusses potential directions towards a standard VMM by presenting our view of the design principles and feature set needed. This article does not intend to define a standard VMM; rather, we intend to instigate this discussion within the seL4 community.


Introduction
An in-depth study by Transparency Market Research has found that embedded systems in IoT have witnessed incredible progress. The study projects the embedded system market to advance at a CAGR (Compound Annual Growth Rate) of 7.7% from 2022 to 2031 [1]. Moreover, the growing trend of virtualization for embedded systems in the IT sector is a major force for the expansion of avenues in the embedded system market. Virtualization stands out as an approach to providing portability and security to computational systems. It has become popular as a solution for high-powered machines, such as servers in the Cloud Computing paradigm. In recent years, however, the use of virtualization at the Edge of the network, in embedded devices, has also been gaining popularity [2]. Virtualization makes it possible to have different and isolated virtual machines (i.e., execution environments) on the same platform, providing security by separation, in which one environment does not have access to the resources of a neighboring environment. The hypervisor is the piece of software that creates and runs the virtual machines in a system [3]. A variety of hypervisor flavors is available nowadays.
The Protected KVM [4] and the Dom0-less Xen [5] are two recent open-source initiatives that have been trying to improve the security of mainstream hypervisors like KVM (Kernel-based Virtual Machine) 1 and Xen 2, respectively. However, they sacrifice the performance or the feature set for which those hypervisors were originally designed. Microkernel-based designs like seL4 rapidly became revolutionary since they bring security from the ground up.
This article does not intend to define a standard VMM. It intends to present the potential key design principles and feature set support toward seL4 VMM standardization. The items shown in this paper can be the basis for an extended version with a more comprehensive list of required properties and features.
The contributions of this article can be summarized as (i) to present a discussion on the seL4 microkernel being used as a hypervisor, and its comparison with other traditional hypervisors, (ii) to present potential directions for seL4 VMM development and adoption by the community, and (iii) to discuss the next steps on seL4 virtualization apart from the VMM development, such as the use of an API to connect with different VMMs and the next steps of formal verification in this spectrum.
The rest of this article is organized as follows. Section 2 introduces seL4 and presents its characteristics. Section 3 presents the background definitions surrounding the virtualization and seL4 topics. Section 4 presents the Design Principles and Feature Support Tenets towards a standardized seL4 VMM. Section 5 presents discussion topics that should be considered towards a seL4 standard VMM. Finally, Section 6 concludes the article.

Why seL4?
seL4 is a member of the L4 family of microkernels that goes back to the mid-1990s [7] [18]. It uses capabilities, which allow fine-grained access control and strong isolation guarantees. The Trusted Computing Base, or TCB, with seL4 is small at 9-18k SLOC (source lines of code), depending on CPU architecture, and it was the first general-purpose OS kernel to be formally verified. seL4 also features very fast IPC (Inter-Process Communication) performance, something that is very important for microkernels. According to the seL4 FAQ [19], it is the fastest microkernel in a cross-address-space message-passing (IPC) operation.
Many hypervisors have aspects other than security as their main strengths. This impacts their architecture (e.g., monolithic) and design decisions. In this regard, seL4, with its fine-grained access model and strong isolation guarantees, outperforms the others. The formal verification further adds proof and credibility and makes it even more unique. Thus, seL4 has a solid security model and story, backed by formal verification.
seL4 is a general-purpose microkernel with proven real-time capabilities that provides system architectural flexibility. Security- and safety-critical components can run natively in the user space of the seL4 hypervisor. This also applies to components with real-time requirements. The strong spatial and temporal isolation guarantees that the system components, including untrusted VMs, are unable to interfere with each other.
seL4 is used and developed by a growing number of companies and hobbyists, with only a few hypervisors, such as KVM and Xen, outperforming seL4 in this regard. Most open-source hypervisors have a small engaged community, and/or their development depends solely on the interest of a single individual or company. Community is one of the most important aspects of a successful open-source operating system or hypervisor. Moreover, some hypervisors are developed by a single company. In this case, the development takes place in private repositories, and only selected features are published as snapshots to public repositories. The dominance of a single company makes these projects unattractive for other companies. This, for example, hinders the development of architecture support as well as hardware support in general. In the seL4 environment, the seL4 Foundation 9 [20] ensures neutrality, and all seL4 development takes place in public repositories.

Background
This section presents various concepts that are used throughout the article. It starts with the definition of, and comparison between, a microkernel and a monolithic kernel. It discusses the definition and disambiguation of the terms Virtual Machine and Virtual Machine Monitor under the umbrella of virtualization. It discusses different hypervisor approaches and their VMMs. Finally, it discusses the VMM definition in the seL4 environment and a variety of its components on the virtualization side, for instance CAmkES 10 (Component Architecture for Microkernel-based Embedded Systems) and Core Platform 11.

Microkernel & Monolithic kernel
The kernel is the indispensable and therefore most important part of an operating system [21]. An operating system consists of two main parts: (i) kernel space, and (ii) user space. The kernel space runs at a higher privilege level than the user space. Any code executing in privileged mode can bypass security, and is therefore inherently part of a system's trusted computing base (TCB) [22].
There are two different concepts of kernels: monolithic kernel and microkernel. The monolithic kernel runs every basic system service at the kernel space level of privilege. Examples of those basic system services are process and memory management, interrupt handling and I/O communication, and the file system [21] [23]. The kernel size, lack of extensibility, and poor maintainability are the three main disadvantages of placing all fundamental services in the kernel space. The microkernel was created with the idea of reducing the kernel to basic process communication and I/O control, and letting the other system services reside in user space in the form of normal processes [21]. This approach reduces the TCB of the kernel itself, thus reducing the attack surface [24].
Although monolithic kernels tend to be more generic and easier to use than microkernels, their large TCB increases the vulnerabilities in the kernel space. As an example, there are about 22.7M SLOC in the whole Linux kernel, of which 16.4M SLOC (71.9%) are device drivers [25]. The seL4 microkernel, on the other hand, has around 9-18k SLOC [19]. It is easier to ensure the correctness of a small kernel than of a big one. That way, stability issues are simpler to solve with that approach [21]. Moreover, it has been argued that the microkernel design, with its ability to reduce TCB size, contain faults, and encapsulate untrusted components, is, in terms of security, superior to monolithic systems [24] [26].
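To make the size gap concrete, the arithmetic can be redone from the rounded figures quoted above. The following Python sketch is only illustrative: the variable names are ours, and because the inputs are rounded, the driver share comes out close to, but not exactly at, the cited 71.9%.

```python
# Rounded figures quoted in the text (SLOC = source lines of code).
LINUX_TOTAL_SLOC = 22.7e6
LINUX_DRIVER_SLOC = 16.4e6
SEL4_SLOC_RANGE = (9e3, 18e3)  # lower and upper bound, depends on CPU architecture

# Share of the Linux kernel made up of device drivers.
driver_share = LINUX_DRIVER_SLOC / LINUX_TOTAL_SLOC

# How many times larger the Linux kernel is than seL4,
# against the largest and the smallest seL4 configuration.
ratio_vs_largest_sel4 = LINUX_TOTAL_SLOC / SEL4_SLOC_RANGE[1]
ratio_vs_smallest_sel4 = LINUX_TOTAL_SLOC / SEL4_SLOC_RANGE[0]

print(f"device drivers: {driver_share:.1%} of the Linux kernel")
print(
    f"Linux kernel is roughly {ratio_vs_largest_sel4:.0f}x to "
    f"{ratio_vs_smallest_sel4:.0f}x the size of seL4"
)
```

Under these rounded inputs, the Linux kernel is three orders of magnitude larger than seL4, which is the scale difference the correctness argument relies on.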

Virtualization
Virtualization is a technique that allows several operating systems to run side-by-side on given hardware [27] [28]. Virtualization brings different kinds of benefits to the environment in which it is deployed. One of the benefits is heterogeneity, since it makes it possible to deploy various operating systems and applications on the same hardware [29]. Moreover, it improves the system's security by achieving security by separation [30] [31]. This is achieved because each operating system has its own space, with no explicit connection to the others, keeping software instances isolated. Nevertheless, virtualization requires a software layer responsible for system management, known as a hypervisor.
The hypervisor is a software layer responsible for managing the hardware and explicitly making it available to the upper layers [32]. It has privileged access to the hardware resources and can allocate them to the operating systems as needed. Examples of hardware resources or devices are storage memory, network devices, and I/O devices. For security reasons, the hardware should not be shared directly by different operating systems. However, the hypervisor can provide virtual copies of the same hardware to other operating systems [33]. Many computer architectures have specific privilege levels to run the hypervisor, such as EL2 on ARM and HS-mode on RISC-V. Examples of hypervisors are Xen, KVM, ACRN 12, Bao 13, and seL4.
Hypervisors can be categorized into type-1 and type-2. Type-1 hypervisors run on bare metal (i.e., directly on the host machine's physical hardware), and type-2 hypervisors, also called hosted hypervisors, run on top of an operating system [34]. Type-1 hypervisors are considered more secure because they do not rely on a host operating system. KVM is an example of a type-2 hypervisor, as it runs on the Linux kernel, while seL4 is an example of a type-1 hypervisor.
Figure 1 presents a high-level overview of the components present in a virtualization environment based on the seL4 hypervisor: hardware resources or devices, hypervisor, Virtual Machine Monitor, and Virtual Machine. The seL4 microkernel, when used as a type-1 hypervisor, provides only basic functionality (i.e., memory management, task scheduling, basic IPC), pushing more complex functionalities, such as device drivers, to the upper layers. Type-2 hypervisors have a larger code base with more complex functionalities embedded into it [24]. The Virtual Machine Monitor (VMM) is a piece of software that interacts with the hypervisor in the virtualization environment. It has its own responsibilities apart from the hypervisor. The VMM is a user space program that provides emulation for virtual devices and control mechanisms to manage VM Guests (virtual machines) [35]. The VMM enables the virtualization layer to create, manage, and govern operating systems [36]. Because it runs in user space, the VMM runs at privilege level EL0 on ARM and U-mode on RISC-V. Examples of VMMs are Firecracker 14, crosvm 15, and QEMU 16. Depending on the hypervisor and on the characteristics of the deployed environment, it is possible to have one or multiple VMMs. A common approach is to have one VMM per operating system Virtual Machine.
Each operating system sits inside a Virtual Machine (VM). A VM behaves like an actual operating system from the point of view of the user, who can run applications on it and interact with it [37]. From the point of view of the hypervisor, a VM has access to a specific set of hardware resources managed by the hypervisor. It is the VMM that bridges the hardware resources of the hypervisor to the VM, making them available by managing the backend operations [38]. From the scalability perspective, it is possible to have multiple VMs in a virtualization environment, where each VM is isolated from the others by principle. The number of VMs depends on the amount of physical resources available for such an environment.
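The layering described in this section can be summarized as a small Python sketch for the ARM case. It is a minimal illustration, not part of any seL4 API: the names are ours, the hypervisor and VMM rows follow the text, and the guest rows assume the usual ARM convention of a guest kernel at EL1 with guest applications at EL0.

```python
from enum import IntEnum


class ArmExceptionLevel(IntEnum):
    """ARM exception levels used in this sketch; a higher value is more privileged."""

    EL0 = 0  # user space
    EL1 = 1  # operating system kernel
    EL2 = 2  # hypervisor


# Hypothetical placement of components for an seL4-based system on ARM.
# Guest rows are an assumption based on the usual ARM convention.
PLACEMENT = {
    "seL4 hypervisor": ArmExceptionLevel.EL2,
    "VMM": ArmExceptionLevel.EL0,
    "guest kernel": ArmExceptionLevel.EL1,
    "guest application": ArmExceptionLevel.EL0,
}


def is_more_privileged(a: str, b: str) -> bool:
    """Return True if component a runs at a higher exception level than b."""
    return PLACEMENT[a] > PLACEMENT[b]


def components_at(level: ArmExceptionLevel) -> list[str]:
    """List the components placed at the given exception level, sorted by name."""
    return sorted(name for name, el in PLACEMENT.items() if el == level)


if __name__ == "__main__":
    for name, el in PLACEMENT.items():
        print(f"{name:18s} -> {el.name}")
```

The point of the sketch is that the VMM, although it cooperates closely with the hypervisor, sits at the least privileged level, so a fault in the VMM does not run with hypervisor privileges.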

Related hypervisors and VMMs
Apart from seL4, there are other open source hypervisors available in the market. KVM and Xen are examples of traditional hypervisors that have been in the market for more than 15 years and were deployed in different solutions [39] [40]. While both hypervisors are widely used, feature rich, and well supported, their huge TCB gives them a large attack surface.
KVM is a type-2 hypervisor that added virtualization capabilities to Linux. KVM is integrated into the Linux kernel, thus benefiting from reusing many Linux functionalities such as memory management and CPU scheduling. The downside is the huge TCB that comes along with KVM. KVM was originally built for the x86 architecture and then ported to ARM [41]. The KVM on ARM implementation has been split into the so-called Highvisor and Lowvisor. The Highvisor lies in ARM's kernel space (EL1) and handles most of the hypervisor functionalities. The Lowvisor resides in hypervisor mode (EL2) and is responsible for enforcing isolation, handling hypervisor traps, and performing the world switches (context execution switches between VMs and host) [42].
Xen is defined as a type-1 hypervisor. The x86 version of Xen is a bare-metal hypervisor that supports both fully virtualized and para-virtualized guests. On ARM, the code for Xen is reduced to one type of guest, which uses para-virtualized drivers and the ARM virtualization extensions [42]. The Xen hypervisor resides in hypervisor mode. On top of it, everything is executed as a guest placed in different domains. The most privileged domain is called Dom0; it has access to hardware and runs Linux to manage other guests, named DomU 17. A DomU is the counterpart to Dom0; it is an unprivileged domain with (by default) no access to the hardware. DomUs use Dom0's para-virtualized services through Xen PV calls. Recently, the Dom0-less variant was introduced. With Dom0-less 18, Xen boots selected VMs in parallel on different physical CPU cores directly from the hypervisor at boot time. Xen Dom0-less is a natural fit for static partitioning, where a user splits the platform into multiple isolated domains and runs different operating systems on each domain.
Traditionally, KVM and Xen hypervisors were designed to be deployed at the Cloud Computing level to provide virtualization to high-density machines. However, recent solutions were developed to use those kinds of hypervisors also at the Edge level [43] [44], being able to have such virtualization solutions in devices with less processing power than the servers at the Cloud [45] [46] [41]. There are also hypervisors that were designed in a lightweight manner, with the intention to be applied in resource-constrained environments at the Edge level. Examples of lightweight hypervisors are Bao [47] and ACRN [48], among others.
Bao is a lightweight bare-metal hypervisor designed for mixed-criticality systems. It strongly focuses on isolation for fault containment and real-time behavior. Its implementation comprises a thin layer of privileged software leveraging ISA virtualization support to implement a static partitioning hypervisor architecture [47]. ACRN targets IoT and Edge systems, placing a lot of emphasis on performance, real-time capabilities, and functional safety. ACRN currently only supports x86 architectures, and as it is mainly backed by Intel, support for other architectures may not appear any time soon [48].
Gunyah 19 is a relatively new hypervisor by Qualcomm. It is a microkernel design with capability-based access controls. Being a new project, Gunyah has very limited hardware support and a practically non-existent community outside Qualcomm. KVMs 20 is an aarch64-specific hypervisor that builds upon the popular KVM, bringing a lot of flexibility, for example in the choice of VMMs. Thanks to its small size, it is possible to formally verify the hypervisor's EL2 functionality [49]. While there are a lot of benefits, it is limited to one CPU architecture, and maintaining the KVMs patch series across several versions of the Linux kernel may become an issue.
KVM relies on user space tools such as the Quick Emulator (QEMU) [50] to serve as the VMM and instantiate virtual machines. In the KVM paradigm, guests are seen by the host as normal POSIX processes, with QEMU residing in the host userspace and utilizing KVM to take advantage of the hardware virtualization extensions [42]. Other VMMs can be used on top of KVM, such as Firecracker, Cloud Hypervisor, and crosvm. Firecracker uses KVM to create and manage microVMs. Firecracker has a minimalist design. It excludes unnecessary devices and guest functionality to reduce the memory footprint and attack surface area of each microVM [51]. Cloud Hypervisor focuses exclusively on running modern cloud workloads on top of a limited set of hardware architectures and platforms. Cloud workloads are those that are usually run by customers inside a cloud provider. Cloud Hypervisor is implemented in Rust and is based on the rust-vmm 21 crates. The crosvm VMM is intended to run Linux guests, originally as a security boundary for running native applications on the Chrome OS platform. Compared to QEMU, crosvm does not emulate architectures or real hardware, instead concentrating on para-virtualized devices, such as those defined by the VirtIO [52] standard.

seL4 VMM
An Operating System (OS) microkernel is a minimal core of an OS, reducing the code executing at higher privilege to a minimum. seL4 is a microkernel and hypervisor capable of providing virtualization support [7]. It has a small trusted computing base (TCB), resulting in a smaller attack surface than traditional hypervisors such as KVM and Xen.
seL4 supports virtualization by providing two libraries: (i) libsel4vm, and (ii) libsel4vmmplatsupport [53]. The first is a guest hardware virtualization library for x86 (ia32) and ARM (ARMv7 with virtualization extensions, and ARMv8) architectures. The second is a library containing various VMM utilities and drivers that can be used to construct a guest VM on a supported platform. These libraries can be used to construct VMM servers by providing useful interfaces to create VM instances, manage guest physical address spaces, and provide virtual device support (e.g., VirtIO Net, VirtIO PCI, VirtIO Console). Several projects make use of the seL4 virtualization infrastructure to provide virtualization environments. Examples of such projects are CAmkES and Core Platform.
The CAmkES project is a framework for running virtualized Linux guests on seL4 for ARM and x86 platforms. The camkes-vm implements a virtual machine monitor (VMM) server, facilitating the initialization, booting, and run-time management of a guest OS [54]. The CAmkES project provides an easy way to run different virtualization examples with one or more VMs and different applications. It also provides a way to pass devices through to guests in such environments. One drawback of such a framework is that it is only possible to run static VMs, whose configuration must be defined at design time.
When using CAmkES, a system is modelled as a collection of interconnected software components, as CAmkES follows a component-based software engineering approach to software architecture. These software components are designed with explicit relationships between them and provide interfaces for explicit interaction [55]. The development framework provides: (i) a domain-specific language (DSL) to describe component interfaces, components, and whole component-based systems, (ii) a tool that processes these descriptions to combine programmer-provided component code with generated scaffolding and glue code to build a complete, bootable, system image, and (iii) full integration in the seL4 environment and build system.
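The component-based approach can be illustrated with a toy model in Python. This is a sketch of the idea only, under our own assumptions: it does not use the CAmkES DSL or any CAmkES tooling, and all class, function, and component names are hypothetical. Components declare which interfaces they provide and use, and a check rejects connections that do not match those declarations, which mirrors the role of explicit relationships in a component architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A software component with explicitly declared interfaces."""

    name: str
    provides: frozenset = frozenset()
    uses: frozenset = frozenset()


@dataclass(frozen=True)
class Connection:
    """An explicit link from a user component to a provider component."""

    user: str
    provider: str
    interface: str


def validate(components, connections):
    """Return a list of error strings; an empty list means the assembly is consistent."""
    by_name = {c.name: c for c in components}
    errors = []
    for conn in connections:
        user = by_name.get(conn.user)
        provider = by_name.get(conn.provider)
        if user is None or provider is None:
            errors.append(f"unknown component in connection {conn.user} -> {conn.provider}")
            continue
        if conn.interface not in user.uses:
            errors.append(f"{user.name} does not declare use of '{conn.interface}'")
        if conn.interface not in provider.provides:
            errors.append(f"{provider.name} does not provide '{conn.interface}'")
    return errors


# Hypothetical two-component system: a VMM that uses a serial service.
vmm = Component("vmm", uses=frozenset({"serial"}))
serial_server = Component("serial_server", provides=frozenset({"serial"}))

good = [Connection("vmm", "serial_server", "serial")]
bad = [Connection("vmm", "serial_server", "network")]

if __name__ == "__main__":
    print(validate([vmm, serial_server], good))
    print(validate([vmm, serial_server], bad))
```

In a framework of this kind, a description that passes such checks is what the generated scaffolding and glue code are produced from, so errors surface at build time rather than at run time.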
CAmkES proved to be too complex, static, and maintenance intensive. For this reason, many projects and companies have rolled their own user space. As the VMM is in the user space, these challenges and limitations carry over to virtualization as well. To remedy the situation, the seL4 community is introducing the seL4 Core Platform [56] [57], or seL4CP, and the seL4 Device Driver Framework 22, or sDDF. The two new components are attempts to fix the shortcomings of CAmkES. This also means that the VMM parts will change significantly.
The Core Platform provides the following abstractions: protection domain (PD), communication channel (CC), memory region (MR), and notification and protected procedure call (PPC). A VM is a special case of a PD with extra, virtualization-related attributes. The whole virtual machine appears to other PDs as just a single PD, i.e. its internal processes are not directly visible [57]. A PD runs an seL4CP program, which is an ELF (Executable and Linkable Format) file containing code and data, both of which are exposed as memory regions and mapped into the PD. The original version of the seL4CP was fully static, in that all code had to be fixed at system build time, and PDs could not be restarted. The addition of dynamic features is in progress [58]. The seL4 Device Driver Framework (sDDF) provides libraries, interfaces and protocols for writing/porting device drivers to run as performant user level programs on seL4. The sDDF also aims to be extended to a device virtualization framework (sDVF) for sharing devices between virtual machines and native components on seL4.
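The relationship between these abstractions can be sketched as plain data types. The Python model below is an assumption-laden illustration, not the seL4CP system description format or API: every class and field name is hypothetical. It only captures two statements from the text, namely that a VM is a protection domain with extra virtualization-related attributes, and that other PDs see the whole VM as a single PD.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MemoryRegion:
    name: str
    size: int  # bytes


@dataclass
class ProtectionDomain:
    """A PD runs one program image and has memory regions mapped into it."""

    name: str
    program_image: str  # path to an ELF file (hypothetical field)
    mapped_regions: list = field(default_factory=list)


@dataclass
class VirtualMachine(ProtectionDomain):
    """A VM modelled as a PD with extra, virtualization-related attributes."""

    vcpu_count: int = 1
    guest_ram: int = 0  # bytes


@dataclass(frozen=True)
class Channel:
    """A communication channel between exactly two protection domains."""

    end_a: ProtectionDomain
    end_b: ProtectionDomain


# Hypothetical system: a native driver PD connected to a Linux guest VM.
shared = MemoryRegion("shared_buffer", 0x1000)
driver = ProtectionDomain("eth_driver", "eth_driver.elf", [shared])
guest = VirtualMachine(
    "linux_guest", "vmm.elf", [shared], vcpu_count=1, guest_ram=256 * 1024 * 1024
)
link = Channel(driver, guest)

if __name__ == "__main__":
    # The channel treats the VM like any other PD.
    print(isinstance(link.end_b, ProtectionDomain))
```

Modelling the VM as a subtype is one way to express that channels and memory regions need no special case for virtual machines.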
Even though the seL4 VMM exists and is available to use, it lacks essential features for virtualization support in complex scenarios. Moreover, its fragmentation across different closed-source deployments makes the mainline fall behind quickly. Thus, it is necessary to discuss the desired features for such a standard VMM.

Philosophy of a Standard VMM
It should be immediately obvious that even a community as small as the commercial users of seL4 will have difficulty agreeing to an all-encompassing standard. Thus, what is proposed is to establish a driving philosophy for the design of a baseline VMM rather than prescribe a specific system architecture. There is the need to discuss the possible missing features of the existing seL4 VMM [53] concerning a standard VMM, more so than a prescription for the right way to do it. Indeed, this will entail recommending high-level architecture patterns but cannot lock an adopter into specific implementations. Each adopting integrator will inevitably start from the new standard and refine the implementation for their use case. One size does not fit all, so customization will always occur. The effort here is to close the gap between the current VMM baseline and the point of necessary deviation. Refinement should only be necessary to cover specific requirements and edge cases highly unlikely to appear in multiple projects across the integrator community.
For this discussion, driving philosophical concepts can be roughly binned into Design Principles and Feature Support Tenets. The Design Principles and Feature Support Tenets were defined based on features present in already available VMMs (see Section 3) and the technical challenges they posed. The Design Principles were also defined based on the most common Quality Attributes for Embedded Systems [59] [60] [61] [62]. Moreover, the Feature Support Tenets are also based on the open challenges in the seL4 virtualization domain, according to open discussions in seL4 community channels 23,24. A deeper discussion about the Design Principles and Feature Support Tenets will be needed before they are implemented in the seL4 mainline. This list is intended as a starting point for discussing such topics.

Design Principles
Five major design principles are recommended as potential directions towards the standard VMM. They call for a VMM that is open, modular, portable, scalable, and secure.

Official and Open Source
The existing seL4 VMM [53] employs an open-source license, and any new implementations under the proposed standard should remain in accordance with this approach. This applies to all the code up to the point of necessary differentiation. Individual integrators should always retain the ability to keep closed-sourced their highly specialized or trade secret modifications. This strikes a balance between business needs such as maintaining a competitive edge and fully participating in a collaborative community around a common baseline. Open sourcing the standard VMM is essential for the seL4 community to engage collaboratively and improve the VMM by either contributing to the source code repository or using and learning from it.
It is recommended to place the standard VMM baseline under the purview of the seL4 Foundation to benefit from the structure and governance of that organization. The desire is that it will gain in stature as well, as the current VMM is a second-class citizen in the community. Alongside the source code, the Foundation should periodically publish reports about major updates and possible new directions as new technologies mature. In this way, it will help to maintain a long-term roadmap to incorporate new features such as ARMv9 Realms [63], for instance.

Modular: Maintainable and Upgradable
It is expected that the standardized VMM would be deployed in heterogeneous environments and scenarios under quite varied use cases. This will require flexibility in aspects such as system architecture, hardware requirements, performance, etc. It is essential to follow a modular design approach to guarantee the applicability of the VMM in any of those variants.
In implementing the VMM modularly, it is essential to keep its architecture readable, for instance by following the C4 (Context, Containers, Components, and Code) model. The C4 model is an "abstraction-first" approach to diagramming software architecture [64]. It decomposes the system so community members can pick and choose components for their project. The Context shows how the software system in scope fits into the environment around it. Containers inside a Context define the high-level technical building blocks. A Component is a zoom-in to an individual Container and shows its responsibilities and implementation details. Finally, Code is a specific description of how a Component is implemented. The modular approach makes it possible for integrators to define the Context, Containers, Components, and Code that must be pieced together for a VMM to support specific features, making their VMM highly customized to their end goal.

Portable: Hardware independence
Given the vast number of platforms supported by the seL4 kernel [65], the VMM should also be generic enough to support them. It may be a lofty goal, but the standard VMM should be designed and written with hardware abstraction layers such that it can compile for, at a minimum, the ARM, x86, and RISC-V Instruction Set Architectures. In this way, the standard VMM is not explicitly linked to a specific set of hardware characteristics. Of course, different ISAs may impose architectural differences. However, there is a need for a minimal and modular VMM that could be easily moved from a 4-core ARM SoC (big.LITTLE) to a 48-core AMD Threadripper x86 machine, as an example. The standard VMM could be seen as a baseline for different hardware implementations. Obviously, the baseline will not take advantage of all the platforms' hardware features. However, because it is easy to deploy on different platforms, it can be used for Proof-of-Concept implementations and learning purposes.
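One way to picture such a hardware abstraction layer is an interface that generic VMM code depends on, with one backend per ISA. The Python sketch below is a simplified assumption, not an existing seL4 interface: all names are hypothetical. The ARM and RISC-V mode names follow the Background section; the x86 entries (VMX root operation and ring 3, as on Intel VT-x) are our addition and are not taken from the text.

```python
from abc import ABC, abstractmethod


class ArchBackend(ABC):
    """Hypothetical hardware abstraction layer for a portable VMM."""

    isa = "unknown"

    @abstractmethod
    def hypervisor_mode(self) -> str:
        """Name of the privilege level the hypervisor runs at."""

    @abstractmethod
    def vmm_mode(self) -> str:
        """Name of the privilege level the user space VMM runs at."""


class ArmBackend(ArchBackend):
    isa = "arm"

    def hypervisor_mode(self) -> str:
        return "EL2"

    def vmm_mode(self) -> str:
        return "EL0"


class RiscvBackend(ArchBackend):
    isa = "riscv"

    def hypervisor_mode(self) -> str:
        return "HS-mode"

    def vmm_mode(self) -> str:
        return "U-mode"


class X86Backend(ArchBackend):
    isa = "x86"

    # Assumption (not from the text): Intel VT-x terminology.
    def hypervisor_mode(self) -> str:
        return "VMX root operation"

    def vmm_mode(self) -> str:
        return "ring 3"


BACKENDS = {cls.isa: cls for cls in (ArmBackend, RiscvBackend, X86Backend)}


def select_backend(isa: str) -> ArchBackend:
    """Pick the backend for an ISA; generic code never names a concrete class."""
    try:
        return BACKENDS[isa]()
    except KeyError:
        raise ValueError(f"unsupported ISA: {isa}") from None


def describe(backend: ArchBackend) -> str:
    """Generic VMM code written only against the abstract interface."""
    return (
        f"{backend.isa}: hypervisor at {backend.hypervisor_mode()}, "
        f"VMM at {backend.vmm_mode()}"
    )


if __name__ == "__main__":
    for isa in BACKENDS:
        print(describe(select_backend(isa)))
```

With this split, supporting a further platform means adding one backend class, while `describe` and any other generic code stay untouched.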
Additionally, it is essential to consider and accommodate the rather large differences architecture-wise, even with the same ISA implementation. For example, Codasip 25 and SiFive 26 implementations of RISC-V have non-ignorable differences, while ARM implementations from Qualcomm, Samsung, and NXP exhibit wildly different behavior [66]. Though SoC vendors may be compliant with the ISA specification, there usually is some collection of deviations or enhancements present, often implemented as a black-box binary. Areas of concern include control of the system's Memory Management Units, Generic Interrupt Controller, TPM/TEE, secure boot process, and access to the appropriate privilege level for the seL4 kernel (e.g., EL2 for Qualcomm-ARM).

Scalable: Application-agnostic
A standard VMM should be scalable in the sense that it needs to be able to support several applications running on top for different specific purposes. Different applications may have a distinct set of requirements such as performance, safety, security, or real-time. The VMM should be able to meet those requirements and provide a way for the applications to reach them. Moreover, the VMM should guarantee that the applications will run as expected, being able to initiate and finish the tasks successfully. A VMM scheduler should be responsible for balancing the loads and ensuring that no application (i.e., thread) is left unattended.
The scalability of the systems is also tied to their performance. In light of this, it is essential that the VMM supports from one to an arbitrary number of processing units or cores. The existing seL4 VMM does not support multiprocessing and consequently highly restricts the number of applications that can be run atop. Enabling multiprocessing would help achieve better performance, thus improving the scalability of the system performance as a whole. We discuss in detail the possibilities to enable multicore VMM further in this paper in the Multicore & Time Isolation section.

Secured by Design
A standard seL4 VMM implementation should support one VMM instance per VM. Even though this approach is followed by most integrators and supported by seL4, it is important to highlight its benefits. This approach improves both the scalability and the security of the solution. If a guest OS is compromised, it opens an attack vector toward the VMM. However, the risk is limited if there is a dedicated VMM per VM. The other VMs, their VMMs, and guest OSes are completely isolated by the stage 2 translation. This assumes a formally verified kernel and that the translation tables, or the memory areas the tables point to, are distinct for each VM. Though this approach is already common today, some integrators do not always implement it because of time-to-market pressure, code reuse, or other unusual circumstances. Support for this design should be standardized so that the enabling code can be considered boilerplate and easily consumed. Figure 2 shows a representation of a secured-by-design architecture, with one VMM per VM. Even though the VMM has more direct interaction with the hypervisor, it is placed in User Mode. The VMs are present in both User and Kernel modes, as they can have applications and drivers, respectively.
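The two conditions behind this design, a dedicated VMM per VM and distinct memory areas per VM, can be expressed as a configuration-level check. The Python sketch below is illustrative only and rests on our own assumptions: the names are hypothetical, each VM is reduced to a single contiguous physical memory range, and the real enforcement is done by the kernel and the stage 2 translation tables, not by a check like this.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class GuestMemory:
    """A contiguous physical memory range backing one VM (simplifying assumption)."""

    base: int
    size: int

    def overlaps(self, other: "GuestMemory") -> bool:
        return self.base < other.base + other.size and other.base < self.base + self.size


@dataclass(frozen=True)
class VmInstance:
    name: str
    vmm_id: str  # identifier of the VMM instance serving this VM
    memory: GuestMemory


def check_isolation(vms):
    """Return a list of violations of the one-VMM-per-VM design; empty if none."""
    errors = []
    vmm_ids = [vm.vmm_id for vm in vms]
    if len(set(vmm_ids)) != len(vmm_ids):
        errors.append("a VMM instance is shared between VMs")
    for a, b in combinations(vms, 2):
        if a.memory.overlaps(b.memory):
            errors.append(f"{a.name} and {b.name} have overlapping physical memory")
    return errors


# Hypothetical layouts: one isolated, one violating both conditions.
isolated = [
    VmInstance("vm0", "vmm0", GuestMemory(0x4000_0000, 0x1000_0000)),
    VmInstance("vm1", "vmm1", GuestMemory(0x5000_0000, 0x1000_0000)),
]
shared = [
    VmInstance("vm0", "vmm0", GuestMemory(0x4000_0000, 0x1000_0000)),
    VmInstance("vm1", "vmm0", GuestMemory(0x4800_0000, 0x1000_0000)),
]

if __name__ == "__main__":
    print(check_isolation(isolated))
    print(check_isolation(shared))
```

A standardized VMM could run a check of this kind at build or launch time, so that a configuration which silently shares a VMM or a memory area is rejected before any guest starts.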

Feature Set Support Tenets
Four major features are recommended as potential directions towards the standard VMM to support hardware mechanisms and provide security and performance benefits.

System Configuration
Currently, there are two main approaches to facilitate system configuration when running virtual environments on top of seL4. The first to be introduced was CAmkES, which stands for Component Architecture for microkernel-based Embedded Systems [67]. The second is the seL4 Core Platform (seL4CP) [56]. The Core Platform, which was introduced recently, is intended to become the standard for such virtual environments on top of seL4. Consequently, CAmkES is being deprecated.
CAmkES is a software development and runtime framework for quickly and reliably building microkernel-based multiserver (operating) systems [67]. Currently, using the CAmkES framework with the VMM results in a fully static configuration. The VMs must be defined and configured during the build, which also includes defining the RAM area. The framework is designed to achieve security guarantees by not allowing post-build modifications to the number of running VMs and their interconnections. This is a highly desirable property when the use case calls for it. However, it can be inflexible and even short-sighted when the nature of the user experience requires dynamic configuration, since it offers no dynamic start/stop/restart capability.
It is often necessary to have a more dynamic seL4-based environment for the purpose of allowing better usability, modularity, or even scalability. The Core Platform is an operating system (OS) personality for the seL4 microkernel. The Core Platform makes seL4-based systems easy to develop and deploy within the target areas. It can be used to bring up VMs on top of seL4. The Core Platform promises to deliver dynamic features to the seL4 environment [56]. However, it is still in progress, with virtualization features under ongoing development 27. The Trustworthy Systems - UNSW 28 group also intends to formally verify two core aspects of the seL4 Core Platform 29: (i) correctness of the implementation, i.e. its abstractions function as specified, and (ii) correctness of the system initialisation, i.e. the collection of underlying seL4 objects is fairly represented by the system specification.
A new VMM standard should enhance the existing static build approach with a build-time specification stating that dynamic configurations are also permitted. They could be limited by providing build-time parameters for acceptable configurations. To achieve a dynamic environment, it should be possible to use the seL4 mechanisms for transferring/revoking capabilities to the entities during runtime, providing a potential implementation mechanism for this feature. It may also be an option to build a core common component to serve as an "admin VM" for dynamic configurations, even subjecting it to some degree of formal methods verification. This is anticipated to be an area of much research and prototyping to achieve the desired balance of security and flexibility.
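The idea of bounding dynamic configuration by build-time parameters can be illustrated with a minimal sketch. The names used here (BuildTimePolicy, VmRequest, admit) and the chosen limits are hypothetical and only illustrate the principle; they are not part of CAmkES, the seL4 Core Platform, or any existing seL4 VMM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildTimePolicy:
    """Limits fixed at build time (hypothetical fields, for illustration)."""
    dynamic_allowed: bool
    max_vms: int
    max_ram_mib_per_vm: int


@dataclass(frozen=True)
class VmRequest:
    """A run-time request to start a VM (hypothetical fields)."""
    name: str
    ram_mib: int


def admit(policy: BuildTimePolicy, running: list, req: VmRequest) -> bool:
    """Return True only if starting `req` stays within the build-time limits."""
    if not policy.dynamic_allowed:
        return False  # the build declared a fully static system
    if len(running) >= policy.max_vms:
        return False  # no free VM slot was provisioned at build time
    if req.ram_mib > policy.max_ram_mib_per_vm:
        return False  # request exceeds the RAM bound fixed at build time
    # Reject duplicate names so that each VM keeps its own VMM instance.
    return all(vm.name != req.name for vm in running)
```

In a real system, a positive decision would be followed by transferring the corresponding seL4 capabilities to the new VMM, and stopping a VM would revoke them; the sketch only covers the admission check.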

Multicore & Time Isolation
One of the key aspects of virtualization is the need for efficiency, where multiprocessing configurations play an important role. Although multicore support is a complex engineering task, it should be supported in its simplest form to avoid contention and potential deadlocks. Different physical CPUs (pCPUs) can be enabled by the kernel (in a Symmetric Multiprocessing, SMP, configuration) and allocated to the different threads running in the system according to the requirements of the use case. Next, we present potential multicore configurations that a standard VMM should be able to support using a clear multiprocessing protocol:
• Direct Mapping Configuration: multiple single-core VMs running concurrently and physically distributed over dedicated CPUs. Figure 3 shows the representation of the Direct Mapping Configuration approach.
• Hybrid Multiprocessing Configuration: Figure 4 shows the representation of the Hybrid Multiprocessing Configuration approach.
The two depicted configurations are examples for future reference of a standard VMM, which is not strictly limited to them. Most, if not all, current and near-future use cases are covered by a model in which multicore VMMs are pinned to exclusive cores and unicore VMMs can be multiplexed on a core. Ideally, it would be up to the system designer to decide which configuration to use. The choice could be either static or dynamic, enabling a switch from one configuration to another at run time. The selected configuration will affect several threads in execution. In the seL4 context, threads can run Native apps, OSes, and/or VMMs. Native apps are typically used to run device drivers or support libraries. OSes use threads running over virtual abstractions, or VMs, while VMMs create and multiplex these abstractions to encapsulate OSes. They all require an abstraction representing pCPU time, but they differ in their supported execution level and in their scope over other system components.
For example, a VMM can access the VM internals but not the opposite. Other features that are likely required by a multicore design are:
• vCPU scheduling: the ability to schedule threads on each pCPU based on their priority, credit-based time slicing, or budgeting, depending on the algorithm selected. As an example, whether vCPU migration is supported (a vCPU switching from pCPU id:0 to id:1) could be a design configuration, with the additional possibility of tying a set of vCPUs to pCPUs. Another potential configuration is static partitioning, where all vCPUs are assigned to pCPUs at design time and are immutable at run time. In addition, having dynamic and static VM configurations in a hybrid mode could be something to support. A multiprocessing protocol with acquire/release ownership of vCPUs should be supported. The seL4 kernel has a priority-based round-robin scheduler that chooses the next thread to run on a specific processing core. The scheduler picks threads that are runnable, that is, resumed and not blocked on any IPC operation. It picks the highest-priority runnable thread (priorities range from 0 to 255). When multiple TCBs are runnable and have the same priority, they are scheduled in a first-in, first-out round-robin fashion. The seL4 kernel scheduler could be extended for the VMMs.
• pIRQ/vIRQ ownership: physical interrupts (pIRQs) shall be virtualized (vIRQs) and require a multiprocessing protocol with simple acquire/release ownership of interrupts per pCPU/vCPU target. Besides, support for hardware-assisted interrupt controllers with multicore support is required.
• Inter-vCPU/inter-pCPU communication: another key aspect of multiprocessing architectures is the ability to communicate between pCPUs. Of equal importance is communication between vCPUs, so that both inter-pCPU and inter-vCPU communication are needed. Communication is very important in multiprocessing protocols, but it should be designed in a way that is simple to verify and validate.
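The priority-based round-robin policy described above can be illustrated with a small simulation. This is a sketch of the policy only, not seL4 code: the class and method names are hypothetical, and it assumes that a numerically higher priority runs first, with 255 as the highest level.

```python
from collections import deque

NUM_PRIORITIES = 256  # priorities 0..255; assumption: higher value runs first


class RoundRobinScheduler:
    """Toy model of a per-core priority-based round-robin scheduler."""

    def __init__(self):
        # One FIFO queue of runnable threads per priority level.
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]

    def make_runnable(self, thread, priority):
        if not 0 <= priority < NUM_PRIORITIES:
            raise ValueError("priority out of range")
        self.queues[priority].append(thread)

    def block(self, thread, priority):
        # A thread blocked on IPC is no longer eligible to be picked.
        self.queues[priority].remove(thread)

    def pick_next(self):
        # Scan from the highest priority down to the lowest.
        for priority in range(NUM_PRIORITIES - 1, -1, -1):
            queue = self.queues[priority]
            if queue:
                thread = queue.popleft()
                queue.append(thread)  # rotate: equal priorities take turns
                return thread
        return None  # nothing runnable on this core
```

With two threads at priority 10 and one at priority 5, repeated calls to pick_next alternate between the two priority-10 threads, and the priority-5 thread runs only once both of them are blocked. A VMM-level vCPU scheduler could follow the same shape while adding budgets or pCPU affinity.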

Memory Isolation
As stated before, the secure by design principle is strongly based on memory isolation. Memory isolation is critical to enforce security properties such as VM confidentiality and integrity. Hardware-enforced and partial microkernel access-controlled memory translation and protection between VMs/VMMs and Native Apps are key security requirements for security-critical use cases. Support for a hardware-assisted virtualization MMU (Extended Page Tables or second-stage translation) should be an integral part of the standard VMM. Next are some features for future reference that can leverage such hardware for memory isolation: (i) configurable VM Virtual Address Space (VAS); (ii) device memory isolation; and (iii) cache isolation.
[...] and (iii) the virtualization layer could intercept all accesses to the device and decode only those that intend to configure its DMA engine, in order to do the corresponding translation if needed and to control access to specific physical memory regions. In order to meet these three requirements, a standard VMM requires support for either an IOMMU (with one- or two-stage translation regimes) or software mechanisms for mediation.
• Cache isolation through page-coloring: Micro-architectural hardware features like pipelines, branch predictors, and caches are typically available and essential for high-performance CPUs. These hardware enhancements are mostly seen as software-transparent, but they leave traces behind and open up backdoors that attackers can exploit to break memory isolation and consequently compromise the memory confidentiality of a given VM. One mitigation for this problem is to apply page coloring in software, which could be an optional feature supported by a standard VMM. Page coloring maps page frames to different VMs so that they do not collide on the same allocated cache lines. A cache line allocated by one VM then cannot evict a cache line previously allocated by another VM.
This technique, by partitioning the cache into different colors, can protect shared caches to some extent against timing cache-based side-channel attacks. However, it strongly depends on architectural/platform parameters such as cache size, number of ways, and the page size granularity used to configure the virtual address space. The L1 cache is typically small and private to the pCPU, while the L2 cache is typically bigger and seen as the last level of cache that is shared among several pCPUs. It would be possible to assign a color to a set of VMs based on their criticality level. For example, assume the hardware limits the system to encoding up to 4 colors: one color can be shared by a set of non-critical VMs, another can be given to a real-time VM for deterministic behavior, and the other two to a security- and performance-critical VM that requires increased cache utilization and, at the same time, isolation against side-channel attacks.
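The dependence of the number of colors on cache size, number of ways, and page size can be made concrete with a short sketch. It assumes a physically indexed, set-associative shared cache; the function names and the example cache geometry (256 KiB, 16 ways, 4 KiB pages) are illustrative assumptions chosen to reproduce the 4-color case, not parameters of a specific platform.

```python
def num_colors(cache_size: int, ways: int, page_size: int) -> int:
    """Number of page colors: the size of one cache way divided by the page size."""
    way_size = cache_size // ways
    return max(1, way_size // page_size)


def frame_color(paddr: int, page_size: int, colors: int) -> int:
    """Color of the physical frame containing `paddr`."""
    return (paddr // page_size) % colors


def frames_with_colors(allowed, base, n_frames, page_size, colors):
    """Frames in [base, base + n_frames * page_size) whose color is in `allowed`."""
    frames = (base + i * page_size for i in range(n_frames))
    return [f for f in frames if frame_color(f, page_size, colors) in allowed]


# Assumed geometry: 256 KiB shared cache, 16 ways, 4 KiB pages.
COLORS = num_colors(256 * 1024, 16, 4096)

# Assignment following the example in the text: one shared color for
# non-critical VMs, one for the real-time VM, two for the critical VM.
ASSIGNMENT = {"non_critical": {0}, "real_time": {1}, "critical": {2, 3}}
```

Under these assumptions, frames_with_colors(ASSIGNMENT["critical"], 0, 8, 4096, COLORS) selects the frames at 0x2000, 0x3000, 0x6000 and 0x7000, so the critical VM receives half of the cache sets and never shares a set with the other VMs.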

Hypervisor-agnostic I/O Virtualization and its derivations
Many security use-cases require virtualization environments with reduced privilege such that only specific VMs, called driver VMs, can directly access hardware resources while the others, called User VMs, run in a driverless mode since device drivers are seen today as a major source of bugs. A compromise caused by exploitation of a driver bug can be contained in its own VM. Typically, in such environments, any VM that will potentially run unknown code and/or untrusted applications may require isolation from key device drivers sequestered into their dedicated VMs. Inter-VM communication, including access to the devices, must be done by proxy over well-known and managed interfaces. This approach requires a combination of VM kernel modifications and VMM modules to be able to communicate and share basic hardware devices over virtual interfaces.
The OASIS collaboration community manages the set of VirtIO standards [52] that are implemented to various degrees by Linux and Android. Given the excellent support, it is recommended to adopt VirtIO implementations for multiple interfaces in the standard VMM. Support for standardized VirtIO server implementations in the VMM would be a meaningful complement to guest OS clients. For instance, the VirtIO-Net server in the VMM could store a table of MAC addresses, creating a virtual switch. In the case of the VirtIO-Block server, the VMM could terminate VirtIO-Block requests so that address mappings are not known by the user-facing guest OS, then start up another request to the VM containing the device driver to perform the actual write. For instance, in complex architectures with more than one guest OS accessible from the user perspective, VMM VirtIO servers could also handle multiplexing access to various devices between VMs, creating a "multi-persona" capability.
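The virtual switch mentioned for the VirtIO-Net server can be sketched as a table that maps MAC addresses to VM ports. The learning and flooding behavior shown here is an assumption borrowed from ordinary Ethernet switches; the text only states that the server stores a table of MAC addresses. The class and its methods are hypothetical and do not correspond to an existing VirtIO or seL4 API.

```python
class VirtualSwitch:
    """Toy MAC-learning switch between the VirtIO-Net ports of several VMs."""

    def __init__(self, ports):
        self.ports = set(ports)  # one port per connected VM
        self.mac_table = {}      # MAC address -> port it was last seen on

    def forward(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame should be delivered to."""
        # Learn (or refresh) where the sender lives.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the sender's.
            return sorted(self.ports - {in_port})
        # Known destination: deliver to one port, never back to the sender.
        return [] if out_port == in_port else [out_port]
```

A first frame from vm0 to an unknown address is flooded to the other VMs; once the destination has sent a frame itself, later traffic is delivered to its port only.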
Among the possibilities of implementing VirtIO interfaces, the following items present examples of how it can be used and integrated with a standard VMM:
• VirtIO can be used for interfacing VMs with host device drivers. It can support VirtIO driver backends and frontends on top of seL4. VirtIO interfaces can be connected to open-source technologies such as QEMU, crosvm, and Firecracker, among others. In this scenario, the open-source technologies will execute in the user space of a VM different from the one using the device itself. This approach helps in achieving reusability, portability, and scalability. Figure 5 shows the representation of such an approach considering a VirtIO Net scenario in which a Guest VM consumes the services provided by a back-end Host VM.
• VirtIO interfaces can be connected to formally verified native device drivers. The use of such device drivers increases the security of the whole system. Moreover, the verified device drivers can be multiplexed, switching device access between multiple clients. The multiplexer is transparent to native clients, as it uses the same protocol that the (native) clients use to access an exclusively owned device. Figure 6 shows the representation of device virtualization through a multiplexer. In this example, each device has a single driver, encapsulated either in a native component or in a virtual machine, and is multiplexed securely between clients 30.
VirtIO also includes standards for Touch, Audio, GPU, and a generic VirtIO-Socket interface which can be used to pass data of any form. Standardized implementations for these are not mature or widely available outside of the automotive use case. OpenSynergy actively worked with Google and Qualcomm to include these interfaces in Android Auto [68]. It may be possible for the seL4 community to expand those implementations to other areas through customer-funded projects.

VMM API
Apart from the previously mentioned topics, a seL4 standard VMM could also be a programmable API rather than something configured with a static Domain Specific Language (DSL) during compilation (e.g., CAmkES). The API makes it possible to wrap the functionality in any compile-time DSLs or custom native services, and it enables run-time dynamism. The API could have a compile-time configuration for enabling/disabling dynamic features. It should be built in layers, so that one can use the low-level APIs with all the seL4-specific complexity involved, while the high-level API keeps the seL4-specific details to a minimum.
Figure 6. VirtIO interfaces considering a formally verified Device Driver
An API would make it possible for some elements of the VMM not to be wrapped in a runtime context, as they are now, because such wrapping already makes an assumption about the architecture. That assumption might not be what most integrators (i.e., companies) are after. Let us take KVM as an example. If KVM provided more than basic constructs and included a runtime context (essentially a VMM), we would not be able to have different VMMs (QEMU, crosvm, cloud-hypervisor). This does not mean that there is no API in the seL4 environment already. However, it is fragmented and not as uniform as one might expect.
The integrators could have the option to use the seL4 VMM (i.e., with characteristics similar to the ones presented in this article) and also the VMM API to have a more diverse virtualization environment. There is a certain minimal subset that a VMM must handle, such as the hardware virtualization of the Generic Interrupt Controller (GIC) and fault handling. However, it should also be possible to define where VirtIO-console is handled, or that a VirtIO-blk device must be handled by QEMU in some VM. If someone has a native VirtIO backend for some of those examples, it should be possible to use it.
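One way to picture such an API is a small builder in which the integrator routes each device or event to a backend, while the API insists that the minimal mandatory subset is handled. Everything in this sketch is hypothetical, including the names VmmBuilder, route and build, the event labels, and the three backends; it is not an existing seL4 interface.

```python
from typing import Callable, Dict

Backend = Callable[[str], str]

# Assumption: the minimal subset every VMM must handle itself.
MANDATORY = ("gic", "fault")


class VmmBuilder:
    """Hypothetical high-level API: choose where each event is handled."""

    def __init__(self) -> None:
        self._routes: Dict[str, Backend] = {}

    def route(self, event: str, backend: Backend) -> "VmmBuilder":
        self._routes[event] = backend
        return self  # allow chained calls

    def build(self) -> Dict[str, Backend]:
        missing = [e for e in MANDATORY if e not in self._routes]
        if missing:
            raise ValueError(f"missing mandatory handlers: {missing}")
        return dict(self._routes)


def in_vmm(event: str) -> str:
    return f"{event}: handled inside the VMM"


def qemu_in_driver_vm(event: str) -> str:
    return f"{event}: forwarded to QEMU running in a driver VM"


def native_backend(event: str) -> str:
    return f"{event}: served by a native VirtIO backend"


vmm = (
    VmmBuilder()
    .route("gic", in_vmm)
    .route("fault", in_vmm)
    .route("virtio-blk", qemu_in_driver_vm)
    .route("virtio-console", native_backend)
    .build()
)
```

The same routing table could equally be generated from a compile-time DSL, which is the sense in which a layered API can sit underneath both static and dynamic configurations.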
With the seL4 VMM API, it is possible to follow the one VMM per VM "rule", as it is a safer approach from a trust point of view. We could have different flavors of VMMs, such as QEMU, crosvm, and cloud-hypervisor, as each one of them has its strengths and weaknesses [69] [70].

Formal Methods
No discussion of an seL4-adjacent system is complete without consideration of the impact of formal methods. Since this discussion is driven by the need for a VMM which can handle complex, real-world use cases, an integrator would likely be using a hardware platform for which seL4 does not yet support formal methods, such as aarch64 or a multicore configuration. In this case, the effect of formal verification is a moot point. However, in the future, or for a simpler configuration, we can still assess the impact.
Currently, one VMM is assigned to each VM, and thus it is in the VM's Trusted Computing Base. If we consider the scenario in which it is possible to use a VMM API to run VMMs of different flavors, formal verification would rely just on the minimal part responsible for executing those VMMs and not on the VMMs themselves. The VMM is considered part of a guest for the purposes of formal methods, so maintaining the proofs would be challenging. However, there may be a specific case to be made for the standard VMM to be shared across all VMs in a particular system. In that instance, the VMM could be subject to formal methods verification. However, it would be a complex and costly undertaking and goes against the "One VMM Per VM" principle detailed previously in this document.
Parts of the standard VMM could be subject to verification; one example could be the device drivers. The Device Virtualisation on seL4 project 31 has the long-term goal of formally verifying device drivers. This is enabled by the strong isolation that seL4 provides for user-mode drivers, which allows drivers to be verified in isolation. The seL4 Core Platform has a work-in-progress project 32 to formally verify two of its core aspects: (i) correctness of the implementation (i.e. its abstractions function as specified), and (ii) correctness of the system initialisation (i.e. the collection of underlying seL4 objects is fairly represented by the system specification).

Conclusion
Based on the current seL4 VMM implementation, we conclude that the existing VMM baseline is not ideal and lacks support for many useful design features already present in legacy VMMs of other hypervisors. There are implementations currently in development and supported by the seL4 community, such as the seL4 Core Platform and sDDF. These approaches may help close the gaps in the current user-space seL4 VMM. However, the current gap can be remedied by helping to build a new VMM standard, unified under the principles laid out in this article. Ultimately, after being extended, the potential directions toward a standard presented here must be put to the test by making a concerted effort to build a real-world proof of concept around them. This will almost certainly require significant funding, either of an R&D nature or from an end customer.
Considering the seL4 ecosystem, one step towards defining a standardized VMM would be the creation of an RFC for community discussion and approval. Even though it is an organizational step, the RFC can start a deeper technical discussion among the seL4 community. It will be up to one or more members of the seL4 community to look for opportunities to take up this mantle and be a champion for this initiative. Also, such a standard VMM will only be successful if it is discussed within the seL4 community. Thus, spreading these ideas through the seL4 community communication channels is essential. In light of this, we started this discussion by presenting a talk and discussion at the seL4 Summit 2022 33. From our participation at the seL4 Summit 2022, we could extract potential topics of interest to the seL4 community regarding a standard VMM. Moreover, the creation of working groups within the seL4 community around topics of interest may be the best approach to advance such a standard VMM.

Conflicts of Interest:
The authors declare no conflict of interest.