3.1. Implementation
In the following, we discuss the implementation details of our Q8S prototype with examples taken from our OpenStack environment.
Depending on the configuration of a given OpenStack cloud, the default network masks may vary. In our OpenStack environment, all hosts are part of the 10.254.1.0/24 network range, and the QEMU VMs use 192.11.3.0/24. When assigning the IP of a QEMU VM, we configured Q8S to automatically mirror the host IP, i.e., to reuse its last octet in the VM subnet; e.g., the host with 10.254.1.3 internally runs a VM with 192.11.3.3. This also ensures that the QEMU VM IPs are unique across both networks.
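A minimal sketch of this mapping, assuming it simply reuses the host's last octet within the VM subnet (the helper function and constants below are ours for illustration, not part of the Q8S code base):

```python
import ipaddress

HOST_NET = ipaddress.ip_network("10.254.1.0/24")  # OpenStack host network
VM_NET = ipaddress.ip_network("192.11.3.0/24")    # QEMU VM network

def vm_ip_for_host(host_ip: str) -> str:
    """Derive the QEMU VM IP by reusing the host's offset within its /24 subnet."""
    host = ipaddress.ip_address(host_ip)
    if host not in HOST_NET:
        raise ValueError(f"{host_ip} is not in {HOST_NET}")
    offset = int(host) - int(HOST_NET.network_address)
    return str(VM_NET.network_address + offset)

print(vm_ip_for_host("10.254.1.3"))  # -> 192.11.3.3
```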
Figure 3 provides an overview of a cluster created by Q8S with the 10.254.1.0/24 IP range used for the hosts. The name of the default network interface of the hosts in the OpenStack network is ens3 here, but it may be different in other OpenStack clouds.
The starting instance in the top left of Figure 3 is the node on which the user started Q8S and which initialized the cluster as its first control plane node. Below it are the other control plane nodes, which do not install an additional virtualization layer. On the right, the worker nodes are depicted, numbered 1 to n. Note that Figure 3 shows two worker types: x86 and arm64.
The worker nodes deploy QEMU and connect its internal network via a bridge to the host network. From the perspective of the other nodes, the additional virtualization layer provided by QEMU is not visible on the network layer, as any requests sent to a worker node are forwarded by the iptables rules to the internal VM via NAT.
The internal networking of a worker host is depicted in detail in Figure 4, with the white box on the left listing the iptables rules. These include prerouting DNAT rules that forward any traffic sent to the host to the QEMU VM on the same port. The exceptions are port 22 of the host, which still provides regular SSH access to the host node, and port 2222, which is redirected to port 22 of the QEMU VM for SSH access. For outgoing traffic, SNAT is used to masquerade requests as the host node.
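Q8S installs these rules through its setup scripts; the listing below is only a hedged sketch of how the described forwarding could be expressed from Python. The interface name, host IP, and VM IP are example values from this environment, and the exact rule set used by Q8S may differ:

```python
import subprocess

HOST_IF = "ens3"        # host's interface in the OpenStack network (example name)
HOST_IP = "10.254.1.3"  # example host IP
VM_IP = "192.11.3.3"    # matching QEMU VM IP (see the IP mapping above)

# Rule order matters: the port-2222 redirect must precede the catch-all DNAT rules.
rules = [
    # SSH to host port 2222 is redirected to port 22 of the QEMU VM.
    ["-t", "nat", "-A", "PREROUTING", "-i", HOST_IF, "-p", "tcp", "--dport", "2222",
     "-j", "DNAT", "--to-destination", f"{VM_IP}:22"],
    # All other TCP traffic, except host SSH on port 22, is forwarded to the VM on the same port.
    ["-t", "nat", "-A", "PREROUTING", "-i", HOST_IF, "-p", "tcp", "!", "--dport", "22",
     "-j", "DNAT", "--to-destination", VM_IP],
    # UDP traffic is forwarded to the VM on the same port.
    ["-t", "nat", "-A", "PREROUTING", "-i", HOST_IF, "-p", "udp",
     "-j", "DNAT", "--to-destination", VM_IP],
    # Outgoing VM traffic is masqueraded as the host via SNAT.
    ["-t", "nat", "-A", "POSTROUTING", "-s", VM_IP, "-o", HOST_IF,
     "-j", "SNAT", "--to-source", HOST_IP],
]

for rule in rules:
    subprocess.run(["iptables", *rule], check=True)
```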
The right half of Figure 4 shows the network bridge provided by libvirt, labeled virbr0, which connects the host and QEMU VM networks. Inside the QEMU VM is the enp1s0 interface, which holds the IP of the overall QEMU VM. Inside the VM is the local network created by Flannel, which is used to assign IPs to the individual containers. Flannel is shown in Figure 4 to use the node's public-ip for inter-node communication, which is necessary for it to accept the requests forwarded via the iptables rules.
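How Q8S passes the public IP to Flannel is part of its setup scripts; purely as an illustration of one mechanism Flannel offers for this (not necessarily the one used by Q8S), the advertised public IP of a node can be overridden with a node annotation, e.g., via the Kubernetes Python client. Node name and IP below are placeholders:

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

node_name = "q8s-worker-1"   # hypothetical node name
host_ip = "10.254.1.3"       # the host IP that the iptables rules forward from

# Annotation understood by Flannel to overwrite the public IP it advertises for this node.
patch = {"metadata": {"annotations": {
    "flannel.alpha.coreos.com/public-ip-overwrite": host_ip,
}}}
v1.patch_node(node_name, patch)
```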
Q8S requires two configuration files at the start: the YAML cluster configuration file and a file with credentials for the OpenStack cloud. The official Python OpenStack SDK used by Q8S is able to process clouds.yaml files, which can be downloaded through the OpenStack Horizon web interface under application credentials. With these application credentials, Q8S is able to query the OpenStack API for the OpenStack project for which the credentials were created.
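As an illustration, opening a connection from such a clouds.yaml file with the OpenStack SDK can look as follows; the cloud entry name "openstack" is an assumption, and Q8S's actual entry point may differ:

```python
import openstack

# The SDK searches for clouds.yaml in ./, ~/.config/openstack/, and /etc/openstack/.
# "openstack" is the name of the cloud entry inside clouds.yaml (assumed here).
conn = openstack.connect(cloud="openstack")

# With valid application credentials, the project's resources can be queried, e.g.:
for server in conn.compute.servers():
    print(server.name, server.status)
```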
The cluster configuration file includes the user-provided settings for the cluster to be created. An example of such a file is given in Listing 1. The configuration includes the following fields, which the users are expected to adjust to their needs:
git_url: A URL pointing to a Q8S git repository, which is either public or accessible via embedded access tokens. This is required for the later installation stages to download the respective setup scripts on the new nodes.
private_network_id: The id of the internal network in which all hosts reside; it can be found in the OpenStack Horizon web interface under networks as the id of the private subnet. This is needed to request new VMs via the OpenStack API.
remote_ip_prefix: The IP mask of the OpenStack network. In our environment, this is 10.254.1.0/24.
default_image_name: The name of the OS image that should be used for the OpenStack VMs. Q8S expects this to be an Ubuntu image.
name_of_initial_instance: The name of the starting instance in OpenStack. This is needed to update its security groups.
security_groups: The list of OpenStack security groups that should be added to each node. This list must at least contain q8s-cluster, which is the group configured for internal communication of the Kubernetes cluster.
required_tcp_ports: The list of TCP ports that should be opened in the q8s-cluster security group for inter-node communication. The list given in the example in Listing 1 should be kept, but further ports may be added.
required_udp_ports: The same as for TCP ports above, but for UDP.
worker_port_range_min: This is the lower end of the port range that is to be opened in addition to the TCP ports specified above and used for Kubernetes container node ports.
worker_port_range_max: The upper end of the port range described above.
master_node_flavor: The flavor to be used by additional control plane nodes. The flavor in OpenStack specifies the number of CPUs and amount of system memory OpenStack should allocate from a project quota to a specific VM.
number_additional_master_nodes: The number of control plane nodes that Q8S should deploy in addition to the starting instance. The control plane IP is always set to the IP of the node running Q8S, and no failover mechanism is deployed. Therefore, even if the created cluster includes multiple control plane nodes, it is not a high-availability (HA) deployment.
worker: The vm_types specified in the next section can be used here to indicate how many instances of a given type should be deployed by Q8S.
The vm_types a user wishes to use should each be specified as a dictionary with the following fields:
architecture: System architecture of the emulated CPU. Our Q8S prototype supports x86_64 and ARM_64.
num_cpus: The number of emulated CPUs that should be available in the QEMU VM.
cpu_model: Specific CPU model that should be emulated by QEMU, which also determines the available CPU speed. The list of supported CPU models depends on QEMU and can be found in its documentation.
machine_model: Machine model requested through QEMU. This should be kept as virt.
ram: The amount of system memory to allocate for the QEMU VM in MB.
storage: Amount of storage to allocate for the QEMU VM in GB.
openstack_flavor: Flavor to use in OpenStack for the host. The flavor should include at least as many CPUs as the emulated node.
Listing 1. An example file for a cluster definition. The notation using '!', e.g., !ClusterData and !VmType, is used by Q8S to map the respective sections of the configuration to Python data classes.
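As a hedged illustration of that mapping, custom YAML tags can be bound to data classes with PyYAML constructors. The sketch below is simplified: the actual Q8S classes and loader setup may differ, the fields follow the descriptions above, and the layout of the worker mapping is an assumption:

```python
from dataclasses import dataclass
from typing import Dict, List

import yaml


@dataclass
class VmType:
    architecture: str
    num_cpus: int
    cpu_model: str
    machine_model: str
    ram: int          # MB
    storage: int      # GB
    openstack_flavor: str


@dataclass
class ClusterData:
    git_url: str
    private_network_id: str
    remote_ip_prefix: str
    default_image_name: str
    name_of_initial_instance: str
    security_groups: List[str]
    required_tcp_ports: List[int]
    required_udp_ports: List[int]
    worker_port_range_min: int
    worker_port_range_max: int
    master_node_flavor: str
    number_additional_master_nodes: int
    worker: Dict[str, int]  # vm_type name -> number of instances (layout assumed)


def _constructor(cls):
    def construct(loader, node):
        return cls(**loader.construct_mapping(node, deep=True))
    return construct


yaml.SafeLoader.add_constructor("!ClusterData", _constructor(ClusterData))
yaml.SafeLoader.add_constructor("!VmType", _constructor(VmType))

with open("cluster.yaml") as f:
    cluster = yaml.safe_load(f)
```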
Figure 2 gives a high-level overview of the Q8S workflow and the interactions between the components. To provide a more detailed insight into the steps involved, the following presents the concrete operations that Q8S executes in order:
Creation of the security group q8s-cluster if it does not exist and the configuration of the rules as specified in the settings file;
Creation of an SSH key pair, which is uploaded to OpenStack such that the new VMs are initialized with it and can later on be accessed;
Creation of the OpenStack VMs for the control plane and worker nodes via the OpenStack API (a sketch of this step is given after this list);
Waiting for all new OpenStack VMs to be reachable via SSH;
Installation of Kubernetes dependencies and system configurations required for Kubernetes on the instance running Q8S;
Initializing the Kubernetes cluster and installing Flannel;
Extracting the Kubernetes token for the joining of worker nodes and uploading the cluster certificates required for joining control plane nodes;
Installation of Kubernetes dependencies and system configurations on the additional control plane nodes;
Joining of the additional control plane nodes to the cluster;
Installation of QEMU and libvirt on the worker hosts;
Configuration of the QEMU network to ensure that the desired IP address matching the host IP is assigned to the QEMU VM;
Downloading of the Ubuntu Cloud-Images;
Preparation of the user-data and meta-data files for the Cloud-Image VM;
Preparation of the Cloud-Image for QEMU;
Creation of the QEMU VM from the Cloud-Image;
Installation of Kubernetes, its dependencies, and the required system configurations on the QEMU VMs, as triggered by cloud-init;
Waiting for all worker nodes to join the Kubernetes cluster.
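As an illustration of the VM creation step referenced in the list, a minimal sketch using the OpenStack SDK's cloud layer might look as follows; all concrete names and values are placeholders derived from the configuration fields described earlier, not the literal calls made by Q8S:

```python
import openstack

conn = openstack.connect(cloud="openstack")

# Values correspond to fields of the cluster configuration; concrete values are placeholders.
server = conn.create_server(
    name="q8s-worker-x86-1",
    image="Ubuntu 22.04",            # default_image_name
    flavor="m1.large",               # openstack_flavor of the vm_type
    network="private",               # the network referenced by private_network_id
    key_name="q8s-key",              # SSH key pair uploaded in an earlier step
    security_groups=["q8s-cluster"],
    wait=True,                       # block until OpenStack reports the VM as active
)
print(server.name, server.status)
```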
Once all these steps are completed, the cluster is ready to be used, and Q8S is no longer required. In the prototype implementation of Q8S, the above steps are executed sequentially and are still open to optimization through parallelization, for example, by letting the deployment process of the additional control plane nodes and the emulated worker nodes run in parallel.
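As a hedged sketch of such an optimization (not part of the current prototype; the setup routines below are placeholders for the existing sequential steps), the per-node setup could be dispatched through a thread pool:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def setup_control_plane_node(node):
    """Placeholder for the existing control plane setup routine."""
    ...

def setup_worker_host(node):
    """Placeholder for the worker setup: QEMU install, image preparation, VM creation."""
    ...

def deploy_in_parallel(control_plane_nodes, worker_nodes, max_workers=10):
    """Run the per-node setup routines concurrently instead of sequentially."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(setup_control_plane_node, n) for n in control_plane_nodes]
        futures += [pool.submit(setup_worker_host, n) for n in worker_nodes]
        for future in as_completed(futures):
            future.result()  # re-raise any setup error
```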
3.2. Evaluation
In this section, we discuss the measurements that were acquired according to the methods described in Section 2.2.
Table 2 shows the bandwidth measured through the benchmark for pod-to-pod and pod-to-service communication for emulated and non-emulated nodes via TCP and UDP. Except for UDP being slightly slower on the emulated nodes, the performance is almost identical despite the additional layer of networking that applies for the emulated nodes.
The difference in UDP speed could be related to the UDP packet size [34] being non-optimal for wrapping in the Flannel environment. As the difference is marginal, we consider the networking performance to be effectively identical.
We measured the following network latencies:
Worker-Node-to-Worker-Node round-trip latency: ≈3.1 ms;
Pod-to-Pod (different nodes) round-trip latency: ≈4.8 ms;
Pod-to-Pod (same node) round-trip latency: ≈1.1 ms;
Worker-Node-to-Control-Plane-Node round-trip latency: ≈1.5 ms;
Control-Plane-Node-to-Control-Plane-Node round-trip latency: ≈0.5 ms.
Between control plane nodes, no additional iptables rules are applied; between a worker node and a control plane node, one set of iptables rules is applied; and between worker nodes, the iptables rules are applied both when sending and when receiving.
Going from no iptables rules to one pass adds about 1 ms of latency (0.5 ms vs. 1.5 ms), and the second pass adds another 1.6 ms (1.5 ms vs. 3.1 ms). Pod-to-pod communication across two worker nodes is slower still (≈4.8 ms), as it additionally has to pass through the Kubernetes-internal networking.
Considering the low latency for direct pings between two hosts, these latencies created by the additional iptables and network virtualization layers seem high but are overall still relatively low and acceptable for the majority of use cases.
The CPU usage of the nodes during the throughput benchmark as well as in the idle state is given in Table 3. The benchmark consists of a client-and-server pairing, where the client sends requests to the server.
The CPU usage is significantly higher for the emulated nodes than for the OpenStack nodes. Even during the benchmark, the CPU usage of the OpenStack node barely goes up except for the UDP communication, where it is significantly increased for the client node but not the server node.
The reason for the increased CPU usage likely lies in the emulation, as, for each CPU instruction, the hypervisor has to perform additional operations to check and translate the instruction in addition to actually performing the instruction. This overhead factor even varies between client and server, such that we can assume that sending and receiving requests require different quantities of overhead operations by the hypervisor.
Regarding the increased client CPU usage for UDP, other benchmarks created with k8s-bench-suite [35] also show an increase in CPU usage for UDP clients. Our test setup features only 2 CPU cores; therefore, we can speculate that the performance differences would be less pronounced when using more cores, as in the external example.
The memory usage of the nodes is given in Table 4 and shows no significant changes during the benchmark. Notably, the memory usage of the emulated server is even slightly lower than that of the OpenStack server. Overall, the memory usage does not vary significantly, such that it appears to be handled well by the emulation layer.
Moreover, to verify that Q8S is capable of deploying larger clusters, we deployed a cluster with 50 emulated worker nodes, consisting of 30 emulated x86_64 nodes and 20 emulated ARM64 nodes, as well as two additional master nodes. The time to complete the deployment was 77 min, with the times taken for the individual stages shown in Table 5.
The first stage in Table 5 represents the time taken for OpenStack to provision the nodes and ends when OpenStack reports all nodes as ready. However, for OpenStack, a node appears ready once it has successfully booted, while it might still be starting up and launching services, such that access to it via SSH is not yet possible. In the next stage, Q8S distributes setup scripts and configuration files to the nodes; this takes only a few seconds per node but is performed sequentially in our implementation, and because nodes might not yet accept an SSH connection, the script has to wait.
The third major stage in Table 5 is the setup of the hosts, which involves installing QEMU as well as downloading and preparing the guest image for the emulated workers. This stage is processed in parallel. Afterwards, the cloud-init script has to run in all emulated workers to install Kubernetes and finally join the cluster. After joining the cluster, all nodes must report a ready status for the next stage, before the network setup is completed in the last stage.
The slowest operation by far was the setup of the emulated nodes, including running cloud-init and installing Kubernetes, which took about 23 min for the fastest node and 39 min for the slowest node. Moreover, all x86_64 nodes had completed this stage after 25 min, while it took the ARM64 nodes at least 35 min. Another longer stage was the initial setup of each of the hosts, i.e., the installation of QEMU and other dependencies, which took 24 min.
In this part, we discuss how the presented Q8S prototype aligns with the functional requirements that we defined in Section 2.
As we were able to perform the benchmarking on a Kubernetes cluster deployed through Q8S, we can consider FR5 to be completed. The benchmarking was performed on an OpenStack cluster for which we did not request any additional privileges, fulfilling FR1 and FR6. Specifying the node settings, including system architecture, CPU speed, core count, memory, and storage size, also worked, which completes FR2. Q8S was also able to deploy the desired number of nodes of each type and set up a working emulation for them, which matches FR3 and FR4.
Table 6 provides an overview of the status of the requirements.
However, while Q8S completes the goals we set out for it, we still see further features and improvements that should be made. These include support for limiting bandwidth and latency along with simulating network failure rates to emulate edge nodes. Moreover, Q8S is slow: a full deployment can take over an hour, highlighting the need to parallelize and optimize the process. Nevertheless, once the deployment is complete, the created cluster can be used for multiple rounds of experiments without having to rerun Q8S, making Q8S a valuable tool for emulating instead of simulating heterogeneous Kubernetes clusters.