1. Introduction
Service rate degradation arises in cloud nodes and other technical services as a consequence of resource contention. When a single system serves many customers, the processing of their requests can slow down because of shared CPU cores and caches, memory and disk usage, and similar factors. As resource utilization in a cloud node grows, the performance of a single task (a request, an operation, or a virtual machine, VM) decreases. This phenomenon can be modeled as a degradation of the service rates of the tasks executed in the node. Such degradation naturally depends on the number of tasks. The performance of a single task as the number of tasks in the node increases can be sketched as shown in
Figure 1. There are intervals where the performance decreases slightly or stays constant, and some points where it drops sharply. These points may correspond to the saturation points of some resource (see, for example, [
1,
2,
3]).
Methods of experimental data analysis of service rate degradation in cloud nodes are described in [
4], where the authors point out the negative effects of resource sharing and propose methods for simulating the node operation. The aim of their research is to predict the performance overhead of executing services on virtualized platforms. Experiments show that the number of VMs running in the node affects performance: service degrades as the number of VMs grows.
Ref. [
5] considers server consolidation as a factor of service rate degradation in a cloud node. The level of degradation is defined in terms of the server consolidation overhead, i.e., the extra workload that the system incurs to support consolidation, regardless of the hypervisor type. Experiments show that the overhead grows as the number of VMs increases.
Since the experimental data show service rate degradation, the question arises how to estimate the extent of this degradation. In paper [
6], the authors present a literature survey on performance evaluation for IaaS cloud nodes, which also exhibit service rate degradation. As in the previous reference, they review evaluation algorithms based on the overhead of VMs in cloud services.
Ref. [
7] focuses on measuring CPU, memory, I/O and the overall VM performance degradation caused by the performance interference of VMs sharing one physical machine. For measuring, Bayesian networks with hidden variables are used. In paper [
8], an approach to performance measurement is also proposed. The authors consider the relationship among the maximal number of customers, the minimal service resources and the highest level of services. The purpose of the study is to formulate advice on system design to meet QoS requirements. The proposed metrics can be used as criteria for the evaluation of service rate degradation. Paper [
9] is devoted to the real-time performance analysis of internet services suffering from service rate degradation. The authors propose an algorithm for evaluating the metrics on-the-fly to catch anomalies in providing service.
We have found only a few papers dedicated to performance degradation in cloud services. The problem is well known and many authors mention it, but models of systems with decreasing service rates are rare because the study of models with variable parameters is very complicated. The most popular type of model with a state-dependent service rate uses a stepwise decreasing rate. For instance, given some threshold on the number of VMs in the physical machine, the service rate takes one fixed value if the number of VMs is below the threshold and another value if it is above. This approach is used, for example, in papers [
10,
11]. In [
12,
13], degradation refers to aging of servers.
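The threshold-based approach described above can be illustrated with a minimal Python sketch. The function name, its parameters and the choice to treat a VM count equal to the threshold as "below" are assumptions made for illustration only; they are not taken from the cited papers.

```python
def stepwise_service_rate(num_vms, threshold, rate_below, rate_above):
    """Per-VM service rate that switches value once the VM count passes a threshold.

    Assumption: a VM count equal to the threshold still gets the higher rate.
    """
    if num_vms < 0:
        raise ValueError("num_vms must be non-negative")
    return rate_below if num_vms <= threshold else rate_above


# Hypothetical numbers: full speed up to 4 VMs, 40% of it beyond that.
rates = [stepwise_service_rate(n, threshold=4, rate_below=1.0, rate_above=0.4)
         for n in range(1, 8)]
```

Such a model captures a single saturation point; several thresholds would be needed to reproduce a curve with multiple drops.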
Another method is using modeling of service rates, arrival rates and waiting times (before service) as dependent random variables [
14,
15]. In [
16], the queueing system with workload-dependent service and arrival rates is considered.
Unfortunately, the literature on service rate degradation modeling is mostly limited to cases where the dependence on threshold values is considered. In our study, we propose to model the operation of a cloud node as a queueing system with an unlimited number of servers and degradation of the service rates dependent on the number of VMs operating in the node. In addition, we take into account the following fact: at each moment in time, some VMs work actively, consuming a lot of the node resources and contributing strongly to the performance degradation, while other VMs may be in a passive (waiting) regime, consuming a small amount of resources. Moreover, the VMs inside the node can switch between the regimes. We model this using the notion of a “phase of service” (or simply “phase”) and consider a model with two phases of service (active and passive). The number of service phases may be greater than two, but the study of such a model requires more complex analysis. We aim to perform it in the future, using the current study as the basis.
In the paper, we take into account only the number of VMs operating inside the node when modeling service rate degradation. Taking into account detailed information on consumed resources, as well as collecting statistical data for estimating the model parameters (including degradation functions), is outside the scope of this paper.
The salient features of the paper:
An infinite-server queue with two phases of service and service rate degradation is proposed as a new mathematical model of a cloud node;
Service rate degradation is introduced to account for the decrease in performance of processes in the cloud node caused by their contention for the node resources;
The method of asymptotic analysis is adapted for the model and the asymptotic solution of the global balance equation is derived;
Numerical experiments show good accuracy of the obtained approximation (asymptotic solution).
The rest of the paper is organized as follows. A detailed description of the mathematical model under study is presented in
Section 2. In
Section 3, we write a local balance equation and present its exact solution. There we establish a condition for the equivalence of the global and the local balance equations, which gives us the applicability area of the obtained analytical (exact) solution. In
Section 4, we apply the asymptotic analysis method [
17,
18,
19] to solve the global balance equation. As a result, we derive a two-dimensional Gaussian approximation of the probability distribution of the random process under study. A numerical analysis is presented in
Section 5. First, it includes a comparison of the asymptotic and the exact distributions to establish the accuracy of the approximation and its applicability area. In addition, we present an analysis of how the performance parameters depend on the system configuration parameters.
2. Mathematical Model
Consider a cloud node with virtual machines (VMs) operating inside it. We model it as an infinite-server queueing system, supposing that there is no queue for the VMs and all of them can work simultaneously. The term “server” in the model stands for one VM executed in the node. So, we may potentially have an unlimited number of “servers”, but the performance of each of them decreases as their number grows because the node has limited capabilities (the number of CPU cores) for their parallel execution. This effect is represented in the model by service rate degradation.
Some VMs inside the node can work in an active regime, consuming a large amount of the node resources; other machines may be in a waiting (passive) regime, requiring a minimal amount of resources. We model this situation as two phases of VM service. The duration of a phase reflects the time the VM needs to finish the current regime and switch to the other one. We suppose that during a specific regime, the VM should complete all tasks assigned to it. When the number of VMs grows, the performance of each of them decreases due to contention for the node resources. So, we present a service rate of the VM working in phase n (where ) in the form Here, represents nominal service rates for the n-th phase (for the case in which only the current VM is working in the entire node). Functions reflect the effect of service rate degradation: we suppose that they decrease while the number of VMs inside the node grows. Here, is the number of VMs working in the first phase, is the number of VMs working in the second phase. We call these functions the degradation functions. With this approach, we can take into account performance degradation as well as its differentiated dependence on the number of VMs working in different phases.
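As an illustration of the idea of degradation functions, the following Python sketch builds one hypothetical degradation function and applies it to a nominal rate. Everything here is an assumption made for the example: the multiplicative form (nominal rate times degradation function), the linear-with-floor shape, and the names `make_degradation` and `effective_rate`. The only property taken from the description is that the function equals 1 when a single VM runs in the node and does not increase as the number of VMs grows.

```python
def make_degradation(slope, floor):
    """Build a hypothetical degradation function of the VM counts (i1, i2).

    It depends only on the total i1 + i2, equals 1.0 when at most one VM is
    present, decreases linearly with each extra VM and never drops below `floor`.
    """
    def degradation(i1, i2):
        total = i1 + i2
        if total <= 1:
            return 1.0
        return max(floor, 1.0 - slope * (total - 1))
    return degradation


def effective_rate(nominal_rate, degradation, i1, i2):
    """Per-VM service rate, assuming the form nominal_rate * degradation(i1, i2)."""
    return nominal_rate * degradation(i1, i2)


d = make_degradation(slope=0.1, floor=0.2)
rate_with_five_vms = effective_rate(2.0, d, 3, 2)  # 3 VMs in phase 1, 2 in phase 2
```

Using a separate function per phase makes it possible to let the active phase degrade while the passive phase keeps its nominal rate.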
Sometimes new VMs can arrive in the node and some VMs working inside the node can leave it. We model the arrivals as a Poisson process with rate
. At an arrival moment, the VM begins its work in the
n-th phase with given probability
, where
. Upon completion of the current phase
n, the VM switches to phase
k (
) with probability
or leaves the node with probability
. Obviously, the following equalities are true:
The structure of the model under the study is shown in
Figure 2.
Let us denote the following:
is a vector of the probabilities of choosing the n-th phase at the arrival moment;
is a vector of the probabilities of leaving the system after the n-th phase;
is a matrix of the probabilities of transitions between phases. Here, we assume that
; therefore,
Denote the number of VMs in the n-th phase at instant t by . The goal of the study is to obtain the stationary probability distribution of the number of VMs in the phases, i.e., the stationary probability distribution of the two-dimensional stochastic process .
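The model described in this section can be explored by simulation before any analysis. The sketch below is a minimal event-driven (Gillespie-type) simulation in Python under stated assumptions: the per-VM service rate in phase n is taken as the nominal rate multiplied by a degradation function, the degradation functions are strictly positive, and all argument names (`arrival_rate`, `entry_probs`, `switch_probs`, and so on) are hypothetical. It is not the authors' method, only a way to estimate the mean number of VMs per phase numerically.

```python
import random


def simulate_node(arrival_rate, nominal_rates, entry_probs, switch_probs,
                  degradation, horizon, seed=0):
    """Simulate the two-phase node and return time-averaged VM counts per phase.

    switch_probs[n][k] is the probability that a VM finishing phase n moves to
    phase k; the remaining mass 1 - sum(switch_probs[n]) is the probability of
    leaving the node. degradation[n](i1, i2) scales the nominal rate of phase n.
    """
    rng = random.Random(seed)
    counts = [0, 0]          # VMs currently in phase 1 and phase 2
    area = [0.0, 0.0]        # integral of counts over time
    clock = 0.0
    while True:
        # Total completion rate per phase: number of VMs times degraded rate.
        phase_rates = [
            counts[n] * nominal_rates[n] * degradation[n](counts[0], counts[1])
            for n in range(2)
        ]
        total_rate = arrival_rate + sum(phase_rates)
        dwell = rng.expovariate(total_rate)
        if clock + dwell >= horizon:
            for n in range(2):
                area[n] += counts[n] * (horizon - clock)
            break
        for n in range(2):
            area[n] += counts[n] * dwell
        clock += dwell
        u = rng.random() * total_rate
        if u < arrival_rate:
            # Arrival: choose the starting phase.
            start = 0 if rng.random() < entry_probs[0] else 1
            counts[start] += 1
        else:
            # Phase completion: pick the phase, then switch or leave.
            n = 0 if u < arrival_rate + phase_rates[0] else 1
            counts[n] -= 1
            v = rng.random()
            if v < switch_probs[n][0]:
                counts[0] += 1
            elif v < switch_probs[n][0] + switch_probs[n][1]:
                counts[1] += 1
    return [a / horizon for a in area]
```

With constant degradation functions equal to 1, the simulation reduces to a classical infinite-server queue, which gives a convenient sanity check against known results.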
3. Balance Equations and Their Solution
Denote
as the probability distribution of the number of VMs working in the first and the second phases at moment
t. We derive the system of global balance equations for this distribution as follows (
):
In the steady state, Equation (
2) has the form
Since the direct solution of System (
3) seems problematic, we propose to consider local balance equations.
In
Figure 3, we show the graph of transitions between states of the two-dimensional process
. As we can see, each cycle of the graph consists of two subcycles with three nodes. So, a solution of the local balance equations coincides with the solution of System (
3) only when the following equalities are true:
Thus, we obtain two conditions
From Equation (
4), we can conclude that
Let us look at a couple of examples of the degradation functions satisfying Equation (
5). This condition is true for functions
and
dependent only on the sum of the arguments, i.e.,:
Another example of functions satisfying Equation (
5) is the following:
This means that we have service rate degradation in the first phase only and it does not depend on the number of VMs in the second phase.
Under conditions (
4), the solution of System (
3) is equal to the solution of the following local balance equations:
Note that we have two equivalent systems of the local balance equations for two subcycles in the transition graph (
Figure 3). From Equation (
6), it is easy to derive that
where
Finally, we obtain the solution of System (
6) in the following form:
where probability
can be evaluated from the normalization condition as follows:
The stability condition of the system is equivalent to the conditions of the series convergence (to obtain the condition for the series convergence, the d’Alembert test is used):
In this way, we have shown that the two-dimensional distribution of the number of VMs in the first and second phases of the queueing system with service rate degradation is not factorizable, in contrast to classical queueing models without degradation.
Total Number of Virtual Machines in the Node
Let us obtain the probability distribution of the total number of VMs
in the node. We derive the distribution using the following convolution:
where
is calculated from the normalization condition.
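Numerically, this convolution amounts to summing the joint probabilities over all pairs of phase counts with the same total. The Python sketch below assumes the joint distribution is available as a nested list `joint[i1][i2]` on a truncated grid; the function name and the final renormalization (which compensates for truncation) are illustrative assumptions.

```python
def total_distribution(joint):
    """Distribution of i1 + i2 from a joint table joint[i1][i2].

    Each anti-diagonal of the table collects the states with the same total
    number of VMs. The result is renormalized to sum to one.
    """
    size = len(joint) + max(len(row) for row in joint) - 1
    total = [0.0] * size
    for i1, row in enumerate(joint):
        for i2, prob in enumerate(row):
            total[i1 + i2] += prob
    norm = sum(total)
    return [p / norm for p in total]


# Toy joint table with at most one VM per phase.
example_total = total_distribution([[0.1, 0.2], [0.3, 0.4]])
```

For the toy table, the probability of exactly one VM in the node is the sum of the two off-diagonal entries.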
5. Numerical Examples
To demonstrate the results and to estimate the accuracy of the obtained approximation, we consider an example of the queueing system with a Poisson arrival process, an unlimited number of servers and two phases of service. Service rate degradation is present only in the first phase (i.e.,
) and depends on the number of VMs working in the first phase only (i.e.,
). Parameters for the example are the following:
where
T is variable (it is necessary for analysis of the applicability area of the asymptotic results).
We consider two examples of service rate degradation function in the first phase:
The forms of the degradation functions are presented in
Figure 4.
The results obtained in
Section 3 will serve as a reference on parameter sets satisfying condition (
4). Comparing the exact solution (
7) with the asymptotic approximation (
26) on such sets of parameters, we can determine the applicability area of the asymptotic method. Condition (
4) is true for parameters defined above.
First of all, we present the exact two-dimensional probability distribution of the number of VMs in each phase in cases where
and
.
Figure 5 shows the results for the example with degradation function (
29a).
In this case, exact one-dimensional distributions of the number of VMs in each phase and the total number of VMs in the system have the forms shown in
Figure 6. Means and variances of the distributions are presented in
Table 1.
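Means and variances such as those reported in the table follow directly from a one-dimensional probability mass function. A minimal Python sketch, assuming the distribution is stored as a list `pmf` where `pmf[k]` is the probability of k VMs (the function name is hypothetical):

```python
def mean_and_variance(pmf):
    """Mean and variance of a distribution on 0, 1, 2, ... given as a list."""
    mean = sum(k * p for k, p in enumerate(pmf))
    second_moment = sum(k * k * p for k, p in enumerate(pmf))
    return mean, second_moment - mean * mean


toy_mean, toy_variance = mean_and_variance([0.25, 0.5, 0.25])
```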
In
Figure 7, we show examples of the comparison of the asymptotic and the exact distributions for different values of
T. To draw conclusions about the precision of the asymptotic result and its applicability area, we compare the exact and the asymptotic distributions of the total number of VMs for different values of parameter
T for both types of degradation function (
29). We use the Kolmogorov distance as a measure of the difference between the distributions:
where
is the exact probability distribution;
is the corresponding asymptotic one. Values of the Kolmogorov distances are presented in
Table 2. Supposing that an error
is acceptable, we may conclude that the asymptotic formulas can be applied for values
. The advantage of the asymptotic method is that it imposes no conditions on the system parameters, so it can be applied over a wide range of parameter values.
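The comparison above can be reproduced with a few lines of Python. The sketch assumes the usual definition of the Kolmogorov distance for discrete distributions, namely the largest absolute difference between the two cumulative distribution functions; the function name and the list-based representation are illustrative assumptions.

```python
def kolmogorov_distance(exact, approx):
    """Largest absolute gap between the cumulative sums of two distributions.

    Both arguments are lists of probabilities on 0, 1, 2, ...; the shorter
    one is treated as zero beyond its last entry.
    """
    length = max(len(exact), len(approx))
    cdf_exact = 0.0
    cdf_approx = 0.0
    worst = 0.0
    for k in range(length):
        cdf_exact += exact[k] if k < len(exact) else 0.0
        cdf_approx += approx[k] if k < len(approx) else 0.0
        worst = max(worst, abs(cdf_exact - cdf_approx))
    return worst


gap = kolmogorov_distance([0.5, 0.5], [0.25, 0.75])
```

A continuous approximation such as a Gaussian density has to be discretized and normalized on the integer grid before it is passed to such a function.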
In addition, we show the dependency of the mean of the total number of VMs in the system on the following model parameters:
6. Discussion
In the paper, we have considered a queueing model with an unlimited number of servers and service rate degradation depending on the number of customers in the system. Such a model may be useful for systems where a growing number of customers leads to lower performance for each customer. For example, in a cloud node, when the number of executed virtual machines (VMs) grows, their individual performance decreases due to contention for shared resources (CPU cores, caches, memory, etc.). In addition, in the proposed model, customers may be served in two phases with different service parameters (including degradation functions) and may switch between the phases. This feature helps us to model the different behavior of VMs in different states as well as their different resource requirements in these states. For example, there is an active phase when a VM performs computations (processing requests) and a passive phase when the VM is in a sleep mode or waiting for requests.
Studying the model, we derived the system of global balance Equation (
3) for the steady-state regime. The system cannot be solved directly. Therefore, we tried to solve it using an equivalence Condition (
4) between global and local balance solutions. Solutions (
7) and (
8) were derived, but the equivalence condition is very restrictive and does not allow the solution to be applied to a wide class of systems.
Due to this fact, we applied the asymptotic analysis method to obtain an approximate solution of System (
3). The condition of growing service time is used for the asymptotic analysis. An approach similar to [
17,
18,
19] is used, but unlike the mentioned papers, here we performed a direct study of the probability distribution function instead of an analysis of characteristic functions. As a result, we derived a two-dimensional Gaussian approximation (
26) of the distribution of the number of VMs in the phases of service.
Theoretically, the obtained approximation should become more precise when the service time grows (the asymptotic condition). So, we need to establish whether it works in such a way and estimate the approximation precision for different parameters. These results are presented in
Section 5. There we performed a numerical analysis of the obtained results by comparing approximation (
26) with exact solution (
7) and (
8) for the case when Condition (
4) is satisfied. Using the Kolmogorov distance as an error estimation, we found that the error decreases when the average service time grows. The visual presentation of the distributions in
Figure 7 confirms this conclusion.
Future studies may be devoted to considering similar models with the number of service phases greater than 2. Moreover, the problem of deriving requirements for degradation functions which ensure the existence of only a single solution of system (
16) is important. This problem may become severe when the number of phases and the number of degradation functions and their arguments increase greatly.