Timely Reliability Analysis of Virtual Machines Considering Migration and Recovery in an Edge Server

For the edge computing network, whether the end-to-end delay satisfies the delay constraint of the task is critical, especially for delay-sensitive tasks. Virtual machine (VM) migration improves the robustness of the network, whereas it also causes service downtime and increases the end-to-end delay. To study the influence of failure, migration, and recovery of VMs, we define three states for the VMs in an edge server and build a continuous-time Markov chain (CTMC). Then, we develop a matrix-geometric method and a first passage time method to obtain the VMs timely reliability (VTR) and the end-to-end timely reliability (ETR). The numerical results are verified by simulation based on OMNeT++. Results show that VTR is a monotonic function of the migration rate and the number of VMs. However, in some cases, the increase in task VMs (TVMs) may conversely decrease VTR, since more TVMs also brings about more failures in a given time. Moreover, we find that there is a trade-off between TVMs and backup VMs (BVMs) when the total number of VMs is limited. Our findings may shed light on understanding the impact of VM migration on end-to-end delay and designing a more reliable edge computing network for delay-sensitive applications.


Introduction
With the development of the Internet of Things (IoT), delay-sensitive computing tasks are increasing in cloud computing networks. Because of the long distance between the user equipment and the cloud, dealing with the IoT data in the cloud will cause unacceptable latency, especially for stream or real-time IoT data. To satisfy the strict delay requirements of such IoT services, edge computing emerged. Edge computing provides computing and storage resources at the edge of the network, so that latency-sensitive data and private data can be handled nearby the users, such as applications, e.g., driverless cars, real-time traffic management, virtual reality (VR), augmented reality (AR), and healthcare IoT [1,2]. Regardless of the application scenario, the success of the task relies on the powerful and reliable processing capability of edge computing servers, especially for latency-sensitive applications.
To achieve scalability and robustness, computing and storage resources in cloud servers or edge servers are usually managed by virtualization technology [3][4][5][6]. Computing tasks are performed by independent virtual machines (VMs). Owing to virtualization technology, the network can facilitate load balancing and maintain servers without terminating the task by live migration of VM [7,8]. More importantly, if a VM crashes or is about to crash, the memory image of this VM can be migrated to an idle one and restart the processing, improving fault tolerance of the system. According to [9][10][11], many software failures are transient, which means the system can reboot to repair the problem. If the VM failure is transient, the failed VM can reboot to recover in a short time after the migration. The processing capability of the system is thus recovered. However, as for latency-sensitive tasks, the virtualized management also creates new reliability challenges, which are the

Network Delay Analysis
As mentioned before, the end-to-end delay in the computer network consists of processing delay, transmission delay, propagation delay, and queueing delay [19]. We investigate the literature of the network delay analysis, especially on the queueing delay, and provide a comparative table mainly focusing on the analysis method as below. The main models and methods are described in the table. In Table 1, "Fixed queues" means that the structure of the queue models of a task is fixed. For example, in [24], the queue model is a serial queue consists of the queue of edge computing, the queue of cloud gateway, and the queue of cloud computing; "Stochastic queues" means that the structure of the queue models is not fixed, but determined by some specific mechanisms with probabilities. For example, in [33], tasks might be processed at the IoT layer, fog layer, or cloud layer with different probabilities; "Homogeneous servers" means that the queue servers in the model are all homogeneous, i.e., each server has the same service rate [34]; "Heterogeneous servers" means that the queue servers in the model are heterogeneous [29].
As for simulation methods, (1) research focused on the resource management model, such as task offloading and resource allocation, often verifies their delay analysis models by semi-physical simulation [27,33], or Monte Carlo discrete event simulation using tools such as CloudSim [31] and, Arena [29,30]. Average metrics such as mean wait time and mean queue length are calculated to measure the system performance. The influence of VMs state transitions on the timely reliability is not modeled. (2) End-to-end delay models of a network are often verified by Monte Carlo discrete event simulation using Java Modeling Tools (JMS) [24], OPNET [18], OMNeT++, etc. Timely reliability has been analyzed [18,32], but the influence of VM state transitions on the timely reliability is still not modeled.

Delay Feature Description
Transmission delay -Transmission delay is the ratio of the data size and the transmission rate. Factors such as task length, data size, and bandwidth are considered to obtain the transmission delay [31,[35][36][37].
Propagation delay -Propagation delay is the ratio of the distance and the propagation speed and is often treated as a function of distance. Factors such as distance [31,33] considered to obtain the propagation delay.
Processing delay -Processing delay depends on the computing capability of the server and the computing complex of the tasks. It is usually modeled by exponential distribution [24,25,29,31,33,35] or constant [18], etc., and acquired by combining with queue delay.

Queueing delay Fixed queues Homogeneous servers M/M/c
Model the two-stage queues inside a server [25] or multi-stage queues [24,27] of the edge/cloud computing network, consisting of end equipment, edge nodes, and cloud, etc. Queue theory is employed to obtain average metrics such as mean queue length and mean delay, etc. Virtual machines are considered as queue servers [24] and the modeled virtual machine (VM) states only include busy and idle. Server failure of the queue is not considered, and the probability density function (PDF) of end-to-end delay is not obtained.

Non-M/M/c
Consider tasks arriving with an arbitrary probability distribution [27] or service time following arbitrary probability distribution [30]. Average metrics such as mean queue length and mean delay, are obtained. Virtual machines are considered as queue servers [30] and the modeled VM states only include busy and idle. Server failure of the queue is not considered, and the PDF of end-to-end delay is not obtained.

Heterogeneous servers
Different resource requests lead to heterogeneous queueing servers. The situation where the task requires VMs with different numbers of cores is analyzed [29,38]. The modeled VM states only include busy and idle. The number of jobs in waiting, the number of jobs under provisioning, the number of busy cores, and the number of jobs in service, etc., are employed to define the state space. CTMC is employed to calculate the mean delay. Server failure of the queue is not considered, and the PDF of end-to-end delay is not obtained.

Stochastic queues
Build the queueing models with probabilities according to different offloading mechanisms [31,33]. M/M/c queue model is employed to calculate the mean delay. Server failure of the queue is not considered, and the PDF of end-to-end delay is not obtained.

VMs Timely Reliability Analysis
In the delay analysis of a single node, the main delays include processing delay and queueing delay. Queueing theory is widely used to analyze node delay. Usually, the processes in the computing server are modeled with multiple servers, including homogeneous servers [24,25] and heterogeneous servers [29,39]. An admission manager or load balancer is often modeled as a single server ahead of the computing servers [25,27]. For the task/data arrival, most studies described the task/data flow by Poisson distribution. Researchers then build solutions by constructing and solving a CTMC. For example, Kafhali and Salah [24] modeled the edge server by M/M/s/K queueing model. Pereira et al. [25] proposed a queueing model for a Fog node containing a load balancer and several web servers to solve a closed-form solution. Liu et al. [27] analyzed the user equipment (UE) side delay and server-side delay, respectively, from a perspective of task offloading and resource allocation. However, the above studies only proposed metrics such as utilization of VMs, average throughput, mean response time, average queue length, and discard rate. Measuring the above quantities fails to manifest whether the low-latency constraints of delay-sensitive applications are satisfied. To solve this problem, the PDF of sojourn time at the edge server needs to be obtained. Moreover, the above analyses based on queueing theory are all under the assumption that the edge node serves perfectly. However, the failure of a server or a VM can reduce the processing capability of the edge node directly and increase latency [12,40,41]. To model the failure and repair of VMs, VMs can be treated as unreliable and repairable queue servers based on the multi-server queueing models with unreliable and repairable servers [20][21][22][23]26]. For instance, Ke et al. [26] and Chakravarthy et al. [23] analyzed multiserver queueing models considering unreliable servers and vacation policy. Although the unreliable and repairable servers in these queueing models are similar to VMs of the edge node, the present multi-server queueing models cannot be applied to model the edge computing servers directly. The reason is that the VM needs to migrate memory images and states before its recovery, resulting in down states for the migration and recovery that cannot be described by present models.
To investigate the influence of VM migration on node delay, some researchers focused on the performance evaluation of VM migration [12,13,40,[42][43][44][45][46][47] and latency-aware VM migration algorithms [14,[48][49][50][51]. In these studies, migration time and service downtime [13] were measured to evaluate the performance of VM migration, yet how the migration time and service downtime influenced the sojourn time at the edge node has not been studied. For the latency analysis of edge computing, most studies focused on the delay of specific activities, such as processing, task offloading, and resource allocation [52]. The total sojourn time at the edge node has not been well investigated. Liu et al. [53] analyzed the influence of migration, but they only studied the probability that the computing system can provide sufficient VMs. The delay at the server was not analyzed.
All this considered, present studies on VM delay did not investigate the influence of VM migration on the total delay at the VMs, for the queueing models of VMs ignored the failure and migration of VMs [24,25,27], and the VM migration delay was measured separately from the total delay [12][13][14]42,[44][45][46][47]53]. Meanwhile, due to the unique process of failure, migration, and recovery of VMs, present queueing models with unreliable servers [20][21][22][23]26] cannot be directly applied to analyze the VM delay. To analyze the timely reliability of the VMs and the network, we build a queueing model considering the whole process of failure, migration, and recovery of VMs and propose a method to obtain the PDF of the delay at the edge server.

End-to-End Timely Reliability Analysis
There are two types of serial queueing models for the analysis of end-to-end delay, which are analytic models and approximate models. By traversal of the end-to-end topology, the end-to-end delay is obtained based on the delay in each node. Some researchers analyzed the situation of two queuing nodes, i.e., tandem queue [54][55][56][57][58]. However, due to the mathematical complexity, the formulation of the networked queueing model is infeasible when the queueing nodes are more than three. Therefore, approximation methods such as simulation [18,59] and the upper bound method [60] are widely applied. Researchers also studied the end-to-end delay of queueing networks based on the departure process approximation. For instance, Xie and Haenggi [28] defined a parameter to measure the spatial correlation between interfacing queueing nodes based on the departure process approximation. Although Kafhali and Salah [24] proposed a serial M/M/s/K queueing model for an edge-cloud computing system based on Burke's theorem, they did not analyze the end-to-end delay in the model. In addition, Jie Shen et al. [61] proved that the end-toend delay distribution of a network system is the inverse Laplace transform of the transfer function of the signal flow graph and proposed an end-to-end delay analysis method based on frequency domain. The end-to-end delay analysis considering the structure of the edge computing network is still lacking.

Problem Description
As depicted by Figure 1, the edge computing network is composed of four partssensor nodes, the edge server, the cloud gateway, and the cloud server. The task data generated by sensors are transmitted to the edge server for necessary processing by VMs, including pre-processing, analysis, compression, and encryption, over Bluetooth Low Energy, or Wi-Fi. After the processing at the VMs of the edge server, a part of tasks will be transmitted to the cloud server by the cloud gateway for global storage, power computing, or running various applications. Usually, a task expects to be processed within an extremely short time at the edge server to guarantee the real-time application. Differently, the cloud server can process a task in hours or days. Therefore, the delay at the VMs in the edge server and the delay through the whole network need to be analyzed, respectively. For the edge service, a task only needs to queue at the edge server. Whereas, for the cloud service, a task needs to go through three queues, which are the queue at the edge server, the queue at the cloud gateway, and the queue at the cloud server successively. According to the delay analysis, the definition of timely reliability is given by [18]: where D is the total delay of a task, D max is the maximum allowable delay of the task, f (t) is the PDF of the delay.
PDF of the delay at the edge server.

End-to-End Timely Reliability Analysis
There are two types of serial queueing models for the analysis of end-to-end delay, which are analytic models and approximate models. By traversal of the end-to-end topology, the end-to-end delay is obtained based on the delay in each node. Some researchers analyzed the situation of two queuing nodes, i.e., tandem queue [54][55][56][57][58]. However, due to the mathematical complexity, the formulation of the networked queueing model is infeasible when the queueing nodes are more than three. Therefore, approximation methods such as simulation [18,59] and the upper bound method [60] are widely applied. Researchers also studied the end-to-end delay of queueing networks based on the departure process approximation. For instance, Xie and Haenggi [28] defined a parameter to measure the spatial correlation between interfacing queueing nodes based on the departure process approximation. Although Kafhali and Salah [24] proposed a serial M/M/s/K queueing model for an edge-cloud computing system based on Burke's theorem, they did not analyze the end-to-end delay in the model. In addition, Jie Shen et al. [61] proved that the end-to-end delay distribution of a network system is the inverse Laplace transform of the transfer function of the signal flow graph and proposed an end-to-end delay analysis method based on frequency domain. The end-to-end delay analysis considering the structure of the edge computing network is still lacking.

Problem Description
As depicted by Figure 1, the edge computing network is composed of four partssensor nodes, the edge server, the cloud gateway, and the cloud server. The task data generated by sensors are transmitted to the edge server for necessary processing by VMs, including pre-processing, analysis, compression, and encryption, over Bluetooth Low Energy, or Wi-Fi. After the processing at the VMs of the edge server, a part of tasks will be transmitted to the cloud server by the cloud gateway for global storage, power computing, or running various applications. Usually, a task expects to be processed within an extremely short time at the edge server to guarantee the real-time application. Differently, the cloud server can process a task in hours or days. Therefore, the delay at the VMs in the edge server and the delay through the whole network need to be analyzed, respectively. For the edge service, a task only needs to queue at the edge server. Whereas, for the cloud service, a task needs to go through three queues, which are the queue at the edge server, the queue at the cloud gateway, and the queue at the cloud server successively. According to the delay analysis, the definition of timely reliability is given by [18]: where is the total delay of a task, is the maximum allowable delay of the task, ( ) is the PDF of the delay.

Analysis of Delay at the VMs in the Edge Server
In an application, the task data need to be transmitted to the edge server and be processed by VMs within a delay constraint of D max1 . In the edge server, S VMs are assigned to process tasks, which are called task VMs (TVMs), and d idle VMs are assigned as backups, which are called backup VMs (BVM). All the VMs are homogeneous. Then, the total number of VMs is Z = S + d. The task flow is generated following a Poisson process at each sensor, where the arrival rate is λ. The service time of each VM follows an exponential distribution where the mean service time is 1/µ. If the S VMs are all occupied, the task can be stored in the buffer, whose capacity is K. Then, the maximum capacity of the edge server accommodating tasks is Y = S + d + K. If the buffer is full as well, a discard occurs. We assume that the discarded tasks all violate the delay constraints and do not consider the retrial queue.
When a VM fails, the task will be migrated to an idle VM and restart the processing, as shown in Figure 2. The migration downtime follows an exponential distribution where the mean service time is 1/α. Note that if all the other VMs are occupied or failed, the task will stay in the failed VM and wait for an idle VM to migrate. Extremely, when all the VMs are failed and not ready to reboot, to avoid deadlock of the VMs, the last failed VM can reboot without migration, reserving the un-migrated task. After the migration of the task, the failed VM can reboot and become an available VM again. If there are fewer than S VMs available (capable to process a task), the newly recovered VM will be assigned as a TVM; otherwise, it will be assigned as a BVM. The reboot time of VM follows an exponential distribution where the mean reboot time is 1/β. Then, each VM has three states: available, failed-unmigrated, and failed-migrated. Here, the available state refers to the state that the VM is capable to process a task. An available VM can be either idle or occupied. Transitions among the three states are depicted in Figure 3: (1) From 1 to 2 , the running VM fails, transiting from available to failed-unmigrated. (2) If there is at least one idle VM, the migration begins, or the failed VM needs to wait for an idle VM to migrate. When the migration completes, the edge server transits from 2 to 3 , where the failed VM is failed-migrated. (3) By rebooting, the failed-migrated VM can recover to be available again, edge server transiting from 3 to 1 . If there are fewer than S VMs running, the newly recovered VM will be assigned as a TVM, or it will be assigned as a BVM. (4) At 1 , when the TVMs are all occupied, the new arrival task will be waiting in the buffer, edge server transiting from 1 to 4 . If the buffer is full as well, a discard occurs. (5) At 4 , if a VM completes the processing of a task and becomes an idle VM, the task waiting in the buffer will get into the idle VM for processing, edge server transiting from 4 to 1 .
In the edge computing server described above, the processing of tasks can be treated as a revised M/M/s/K queue, where the VM migration and recovery are considered. Assumptions: • All the tasks are equal in data size; • The failures of VM are all transient failures, which can be recovered by rebooting in a short time; • Idle VMs do not fail; • The working times of VMs in the server are independent and identically distributed with the exponential distribution where 1/θ is the mean time to failure.

Analysis of End-to-End Delay
After the processing at the edge server, a part of task data needs to be transmitted to the cloud for further processing or storage within a delay constraint of . According to Burke's theorem [62], the steady-state departure process of a stable M/M/c queueing system is a Poisson process with the same rate as the arrival process [63]. Assume that the service time of the cloud gateway and the cloud server follow exponential distributions where the mean service time is 1 ⁄ and 1 , ⁄ respectively. Then, the queues at the cloud gateway and the cloud server can be modeled by the M/M/1/K queueing model [64]. As depicted in Figure 4, the tasks are processed at the edge server firstly. Then, a portion of tasks is transmitted to a cloud server by a cloud gateway for further processing. The delay is caused by the queueing at the edge server, cloud gateway, and cloud server successively.

Analysis of End-to-End Delay
After the processing at the edge server, a part of task data needs to be transmitted to the cloud for further processing or storage within a delay constraint of D max2 . According to Burke's theorem [62], the steady-state departure process of a stable M/M/c queueing system is a Poisson process with the same rate as the arrival process [63]. Assume that the service time of the cloud gateway and the cloud server follow exponential distributions where the mean service time is 1/µ lg and 1/µ cs , respectively. Then, the queues at the cloud gateway and the cloud server can be modeled by the M/M/1/K queueing model [64]. As depicted in Figure 4, the tasks are processed at the edge server firstly. Then, a portion p of tasks is transmitted to a cloud server by a cloud gateway for further processing. The delay is caused by the queueing at the edge server, cloud gateway, and cloud server successively.  In the problem, we consider different delay constraints to obtain VTR and ETR of the edge computing network, according to the parameters of the network shown in Table 2.  In the problem, we consider different delay constraints to obtain VTR and ETR of the edge computing network, according to the parameters of the network shown in Table 2. The w-th block of the CTMC L w Set of all the states in row w of the CTMC

Queueing Model of the Edge Server
To model the queueing at the VMs in the edge server described in the previous section, we denote the state of the edge server by (x 1 , x 2 , x 3 ), where integer x 1 ∈ [0, Z] denotes the number of available VMs, integer x 2 ∈ [0, Y] denotes the number of tasks in the edge server, and integer x 3 ∈ [0, min(x 2 , Z)] denotes the number of failed-unmigrated VMs. Since x 1 , x 2 , and x 3 are all limited, the number of states is finite. According to Section 3.1, x 1 , x 3 , and (x 1 + x 3 ) shall all be less than Z. In addition, the number of tasks x 2 shall not be more than the present capacity of the edge server [min(x 1 , S) + x 3 + K]. Furthermore, the number of tasks x 3 reaches its maximum when all the tasks stay in failed-unmigrated VMs, where the maximum value is x 2 . According to the analysis above, the constraints of the three state variables are: where Z denotes the total number of VMs, S denotes the number of TVMs, K denotes the capacity of the buffer of the edge server. We assume that the states violating Equation (2) exist in the CTMC and name them as pseudo-states. The transitions of pseudo-states are the same as the real states. Furthermore, to study the state transitions and express the transition rate diagram conveniently, we divide the states into a few blocks in terms of x 2 , i.e., the states that have the same value of x 2 belong to the same block. Then the total number of blocks is (Y + 1), for x 2 ranges from 0 to Y. The j-th block B j : The number of columns of B j . is: Let b j denote the total number of columns from block B 1 to block B j . Then, b j is given by: Consequently, the transitions inside a block are all triggered by state transitions of VM (failure, migration, and reboot). At the same time, transitions between blocks are triggered by the arrival or departure of tasks. Figure 6 presents the transitions between blocks when = − + 1. The presented two states in each block are the first two states of the block. When a task arrives, the -th state of block transits to the -th state of block with a transition rate of ; when a task departs, the -th state of block transits to the -th state of block with a transition rate of . Note that the transitions inside the blocks and the transitions between the blocks all exist in the CTMC of the edge server. We detach the transitions into two types for a concise and explicit description. Figure 7 presents the states of a case where = 1, = 1, = 1. (1) Transitions caused by failures of VMs. The number of VMs about to fail equals the number of running VMs, which is the minimum of the number of available TVMs min(x 1 , S) and the number of live tasks (x 2 − x 3 ). Thus, from state (x 1 , x 2 , x 3 ) to state (x 1 − 1, x 2 , x 3 + 1), the transition rate is gθ, where g is the number of about-to-fail VMs (or the number of running VMs), given by: Notably, the transition of migration does not exist when x 1 = 0, since none VM is available.
(2) Transitions caused by migrations of failed VMs. The number of migrating VMs is the minimum of the number of failed-unmigrated VMs x 3 and the number of idle VMs (x 1 − g). Thus, from state (x 1 , x 2 , x 3 ) to state (x 1 , x 2 , x 3 − 1), the transition rate is hα, where h is the number of migrating VMs, given by: (3) Transitions caused by the reboot of failed-migrated VMs. The number of rebooting VMs is the number of failed-migrated VMs At the same time, transitions between blocks are triggered by the arrival or departure of tasks. Figure 6 presents the transitions between blocks when x 1 = Z − i + 1. The pre-sented two states in each block are the first two states of the block. When a task arrives, the k-th state of block B j transits to the k-th state of block B j+1 with a transition rate of λ; when a task departs, the k-th state of block B j transits to the k-th state of block B j−1 with a transition rate of gµ. At the same time, transitions between blocks are triggered by the arrival or departure of tasks. Figure 6 presents the transitions between blocks when = − + 1. The presented two states in each block are the first two states of the block. When a task arrives, the -th state of block transits to the -th state of block with a transition rate of ; when a task departs, the -th state of block transits to the -th state of block with a transition rate of . Note that the transitions inside the blocks and the transitions between the blocks all exist in the CTMC of the edge server. We detach the transitions into two types for a concise and explicit description. Figure 7 presents the states of a case where = 1, = 1, = 1.  Note that the transitions inside the blocks and the transitions between the blocks all exist in the CTMC of the edge server. We detach the transitions into two types for a concise and explicit description. Figure 7 presents the states of a case where S = 1, d = 1, K = 1. At the same time, transitions between blocks are triggered by the arrival or departure of tasks. Figure 6 presents the transitions between blocks when = − + 1. The presented two states in each block are the first two states of the block. When a task arrives, the -th state of block transits to the -th state of block with a transition rate of ; when a task departs, the -th state of block transits to the -th state of block with a transition rate of . Note that the transitions inside the blocks and the transitions between the blocks all exist in the CTMC of the edge server. We detach the transitions into two types for a concise and explicit description. Figure 7 presents the states of a case where = 1, = 1, = 1.  Given the partitioned configuration of the CTMC, to locate a state more easily, we define the coordinate of a state as (i, j, k), where i denotes the row number, j denotes the block number, and k denotes the column number inside the block. The derivation between a state and its coordinate is where x 1 denotes the number of available VMs, x 2 denotes the number of tasks in the edge server, x 3 denotes the number of failed-unmigrated VMs. According to Equations (2) and (8), the constraints of the three coordinate variables are: To avoid misunderstanding, we define the set of all the states in row i ∈ [1, Z + 1] of the CTMC as L i : Then, in L i , the number of states (i, j, k) is c = b j − a j + k, where b j denotes the total number of columns from block B 1 to block B j , a j denotes the number of columns of the block B j . Since the number of states is numerous, it is hard to give the expression of transition rate for each state if we directly construct the generator matrix according to the sequence of states. Therefore, we construct the generator matrix of the CTMC based on the matrix-geometric method. Specifically, we obtain the generator matrices of transition inside a row L i (transitions caused by arrival and departure of tasks and migration of VMs) and generator matrices of transitions between neighboring rows (transitions caused by failure and reboot of VMs), respectively. In this way, we can express the generator matrix as a partitioned matrix with quasi-birth-and-death feature: where matrix A xy (b Y+1 ×b Y+1 ) denotes the generator matrix from L x to L y . Then, the element A xy (c 1 , c 2 ) in row c 1 and column c 2 of matrix A xy denotes the transition rate from the c 1 -th state of L x to the c 2 -th state of L y . The general form of A xy is given by: where g is the number of running VMs, h is the number of migrating VMs. Identify the pseudo-states according to Equation (2). Then, delete the rows and columns of the pseudo-states in A and name the remaining l × l matrix as Q. After that, revise the diagonal elements of Q to subject to: The matrix Q is the final form of the generator matrix of the CTMC.
To obtain the PDF of sojourn time at the edge server, we employ the first passage time method. Among all the states of the edge computing server, the states in block 1 represent that the edge server is empty, for the numbers of the tasks are 0. Delete the states in block 1. Then, the sojourn time at the edge server approximately equals the time that the edge server transits from the non-empty states to the empty states when no new task arrives, which can be obtained by the first passage time method. Obtaining the PDF of sojourn time at the edge server follows the steps below: 1. Solve Equation (14) to obtain the steady-state probability vector π of the CTMC: 2. Delete the empty states in the transition rate matrix A, and retain the rest states. The new matrix is named as M.
3. Delete the probability of empty states in the stable probability vector π. The new vector is named as γ. Let δ = γ/ ∑ γ.
5. Then PDF of the sojourn time D is:

Timely Reliability Model
As analyzed in Section 3, the queueing at the cloud gateway and cloud server can be modeled by the M/M/1/C model. The tasks go through the queue of the edge server, the queue of cloud gateway, and the queue of the cloud server successively. Then, the end-to-end delay is determined by the joint distribution of the successive delays of a task traversing edge server, cloud gateway, and cloud server. The PDF of delay at the VMs in the edge server has been analyzed in Section 4.1. In the rest of Section 4.2, we present the analysis method of end-to-end delay and timely reliability, including VTR and ETR.

VMs Timely Reliability
VMs timely reliability (VTR) is defined as the probability that the VMs in an edge server are capable of processing a task within a required time. For tasks requiring edge service, the service delay equals the sojourn time at the edge server. According to Equation (1), VTR is: where D max1 is the maximum allowable delay at the edge server.

End-to-End Timely Reliability
End-to-end timely reliability (ETR) is defined as the probability that a group of end-toend nodes in the edge computing network is capable of processing a task within a required time. For tasks requiring cloud service, the service delay is the sum of delay at the edge server, cloud gateway, and cloud. Since only a portion p of tasks is transmitted to the cloud server, the arrival rate of tasks at the cloud gateway and the cloud server is pNλ. Departing from the edge server, a task arrives at the cloud gateway finding w tasks (0 ≤ w ≤ C lg ) already there with a probability [24]: where ρ lg = pNλ/µ lg is the traffic intensity at the cloud gateway. The processing time of a task at the cloud gateway follows exponential distribution: Then, the sojourn time of a task is the joint distribution of the processing time of all the (w + 1) tasks (the former w tasks and itself). Since the sum of identically and exponentially distributed variables follow a Gamma distribution, the PDF of the sojourn time of a task at the cloud gateway if it arrives at the cloud gateway finding w tasks already there is: Therefore, the PDF of the sojourn time at the cloud gateway: Similarly, departing from the cloud gateway, a task arrives at the cloud server finding w tasks (0 ≤ w ≤ C cs ) already there with a probability: where ρ cs = pNλ/µ cs is the traffic intensity at the cloud server. The sojourn time of a task at the cloud server if it arrives at the cloud server finding w tasks already there is: The PDF of the sojourn time at the cloud server: According to the above analysis, PDF of the end-to-end delay of the edge computing network can be obtained by: where t 1 , t 2 , and t 3 are the sojourn time at the edge server, the total time to go through the edge server, and the cloud gateway, and the total time to go through the edge server, the cloud gateway, and the cloud server, respectively. According to Equation (1), ETR can be obtained by:

Algorithm
To calculate VTR, the key is to solve the CTMC with a state-space of high dimension and obtain the PDF of the sojourn time at the edge server. For this purpose, we need to propose an algorithm to formulate the partitioned generator matrix and obtain the PDF of the sojourn time at the edge server based on the first passage time method. The obtaining of PDF of the sojourn time at the edge server is described in Algorithm 1. After that, VTR and ETR can be calculated according to Section 4.2.
Algorithm 1 Algorithm for obtaining PDF of the sojourn time at the edge server

Input:
Arrival rate of tasks: PDF of the sojourn time at the edge server: f es (t) 1. initialize A 2. vector a = min(j, Z), 1 ≤ j ≤ Y + 1 % The number of columns of block j 3. vector b = ∑ j u=1 a u , 1 ≤ j ≤ Y + 1 % The number of total columns from block 1 to block j 4. for i = 1: Z + 1 do % Transitions caused by arrival or departure of tasks 5.
end for 13. end for 14. for i = 1: Z + 1 do % Transitions caused by migration of VMs 15.
end for 28. end for 29. end for 30. for j = 1: Y + 1 do % Transitions caused by reboot of VMs 31.
for i = 2: Z + 1 do 33. x end for 36. end for 37. end for 38. for j = Z + 1: Y + 1 do % Transitions caused by reboot without migration when all the VMs are failed-unmigrated 39.
for i = 1: replace row x of Q with 0 50.
replace column x of Q with 0 51. end if 52. end for 53. end for 54. end for 55. replace the rows and columns of absorbing states with 0 56. record the row number (or column number) of the states whose value is 0 57. revise the diagonal elements of Q to ensure the sum of each row equal to 0 58. delete the rows and columns whose value is 0 59. the steady-state probability vector: π = Q/e %e is a column vector whose last element is 1, and the other elements are 0 60. M = A 61. set the arrival rate to 0 in M 62. revise the diagonal elements of M to ensure the sum of each row equals to 0 63. delete the rows and columns of the states of block 1 % The states of block 1 represent that the edge server is empty 64. delete the rows and columns that equals to 0 65. delete the probability of empty states in the stable probability vector π and name the new vector as γ 66. let δ = γ/ ∑ γ 67. F es (t) = 1 − δ· exp(Mt)·1 68. calculate the PDF of sojourn time at the edge server: f es (t) = F es (t) 69. return f es (t)

Results and Discussion
In this section, we provide a case study of traffic monitoring for the proposed timely reliability evaluation method. Simulation results are obtained to verify our model and numerical results. We also investigate the influence of different factors on timely reliability, which helps design a better edge computing network.

Experimental Setup
At a crossroads, the edge computing network for traffic monitoring consists of several sensors, an edge server, a cloud gateway, and a cloud server. The architecture of this network is depicted in Figure 1. To monitor the traffic at this crossroads, the sensors collect task data in real-time and transmit the tasks to the edge server, where the tasks are processed by VMs. In addition, a part of processed tasks is transmitted to the cloud server through the cloud gateway for further process or storage. The parameters used in this case are shown in Table 3. The data on sensor data refer to [65,66]. The data on VM migration refer to [40]. The data on system failure refer to [10]. Maximum allowable delay at the edge server 0.04 s D max2 Maximum allowable end-to-end delay 0.04 s

Simulation Experiments
The numerical results are verified by simulation based on OMNeT++ (V5.5.1), a C++ network simulation library and framework. We choose OMNeT++ for its open source and extensible features. Figure 8 shows the simulation model in OMNeT++. In the simulation model, the sensors are modeled by the Wireless Host Module; the buffer of the edge server is modeled by the Passive Queue Module; the VMs of the edge server are modeled by the Server Module; the cloud gateway is modeled by the Passive Queue Module; the cloud server is modeled by the Cloud Module. Particularly, to model the migration time of the VM migration, a Downtime Generator (DG), modeled by the Passive Queue Module, is employed to connect the edge servers and VMs. Once a running VM fails, the task on this VM will be migrated to an idle VM through the DG, where a migration downtime of an exponential distribution is needed. Notably, the tasks spend no time to pass the DG unless it is on the way of migration. As soon as the migrating task arrives at the destination VM, the failed VM becomes failed-migrated, and its rebooting can begin. The values of the parameters are the same as those in Table 3.

Simulation Experiments
The numerical results are verified by simulation based on OMNeT++ (V5.5.1), a C++ network simulation library and framework. We choose OMNeT++ for its open source and extensible features. Figure 8 shows the simulation model in OMNeT++. In the simulation model, the sensors are modeled by the Wireless Host Module; the buffer of the edge server is modeled by the Passive Queue Module; the VMs of the edge server are modeled by the Server Module; the cloud gateway is modeled by the Passive Queue Module; the cloud server is modeled by the Cloud Module. Particularly, to model the migration time of the VM migration, a Downtime Generator (DG), modeled by the Passive Queue Module, is employed to connect the edge servers and VMs. Once a running VM fails, the task on this VM will be migrated to an idle VM through the DG, where a migration downtime of an exponential distribution is needed. Notably, the tasks spend no time to pass the DG unless it is on the way of migration. As soon as the migrating task arrives at the destination VM, the failed VM becomes failed-migrated, and its rebooting can begin. The values of the parameters are the same as those in Table 3. In the simulation, the end-to-end delay converges to a steady-state value as the simulation time goes by, as shown in Figure 9. We collect the delay values of each 1000 tasks to calculate ETR. When the variation of ETR is less than 1%, we take the final value as the stationary simulation result. Let the Boolean variable denote if the end-to-end delay of a task satisfies the task requirement: = 0, the delay does not satisfy the requirement 1, the delay satisfies the requirement .
Then ETR can be obtained by: In the simulation, it takes 5 min for the end-to-end delay to converge. To obtain the steady-state value of ETR, the simulation takes more than 10 min. We compare the simulation results and the numerical results of ETR when the failure rate equals to 0.0002, 0.0006, and 0.001 with different numbers of sensors. The comparisons are shown in Figure  10, where the bars' heights represent the relative errors of the numerical results. The relative errors are all less than 2%. The simulation results verify that the analytical model In the simulation, the end-to-end delay converges to a steady-state value as the simulation time goes by, as shown in Figure 9. We collect the delay values of each 1000 tasks to calculate ETR. When the variation of ETR is less than 1%, we take the final value as the stationary simulation result. Let the Boolean variable F i denote if the end-to-end delay of a task satisfies the task requirement: F i = 0, the delay does not satisfy the requirement D max2 1, the delay satisfies the requirement D max2 .
Then ETR can be obtained by: In the simulation, it takes 5 min for the end-to-end delay to converge. To obtain the steady-state value of ETR, the simulation takes more than 10 min. We compare the simulation results and the numerical results of ETR when the failure rate θ equals to 0.0002, 0.0006, and 0.001 with different numbers of sensors. The comparisons are shown in Figure 10, where the bars' heights represent the relative errors of the numerical results. The relative errors are all less than 2%. The simulation results verify that the analytical model proposed in this paper is valid and applicable for timely reliability analysis of a computing network considering failure, migration, and reboot of edge server VMs. proposed in this paper is valid and applicable for timely reliability analysis of a computing network considering failure, migration, and reboot of edge server VMs.

Timely Reliability Analysis
To investigate the impacts of different factors on the timely reliability of the VMs, we compare VTR under different conditions based on the analysis methods we proposed.

Analysis of Failure Rate, Migration Rate, and Reboot Rate
It is observed in Figures 11 and 12 that VTR is a monotonic function of the failure rate, migration rate, and reboot rate of VM. Comparing Figure 11a with Figure 11b, we find VTR is more sensitive to the failure rate when the reboot rate and migration rate are smaller since the VMs spend more time to migrate and recover when the reboot rate and migration rate are smaller. Thus, if the migration time and reboot time are quite short, the designers do not need to be very rigid on the reliability of VMs. Conversely, if the migration time and reboot time are relatively long, designers can improve VTR of the edge computing network effectively by reducing the failure rate of VMs. proposed in this paper is valid and applicable for timely reliability analysis of a computing network considering failure, migration, and reboot of edge server VMs.

Timely Reliability Analysis
To investigate the impacts of different factors on the timely reliability of the VMs, we compare VTR under different conditions based on the analysis methods we proposed.

Analysis of Failure Rate, Migration Rate, and Reboot Rate
It is observed in Figures 11 and 12 that VTR is a monotonic function of the failure rate, migration rate, and reboot rate of VM. Comparing Figure 11a with Figure 11b, we find VTR is more sensitive to the failure rate when the reboot rate and migration rate are smaller since the VMs spend more time to migrate and recover when the reboot rate and migration rate are smaller. Thus, if the migration time and reboot time are quite short, the designers do not need to be very rigid on the reliability of VMs. Conversely, if the migration time and reboot time are relatively long, designers can improve VTR of the edge computing network effectively by reducing the failure rate of VMs.

Timely Reliability Analysis
To investigate the impacts of different factors on the timely reliability of the VMs, we compare VTR under different conditions based on the analysis methods we proposed.

Analysis of Failure Rate, Migration Rate, and Reboot Rate
It is observed in Figures 11 and 12 that VTR is a monotonic function of the failure rate, migration rate, and reboot rate of VM. Comparing Figure 11a with Figure 11b, we find VTR is more sensitive to the failure rate when the reboot rate and migration rate are smaller since the VMs spend more time to migrate and recover when the reboot rate and migration rate are smaller. Thus, if the migration time and reboot time are quite short, the designers do not need to be very rigid on the reliability of VMs. Conversely, if the migration time and reboot time are relatively long, designers can improve VTR of the edge computing network effectively by reducing the failure rate of VMs.

Analysis of the Number of VMs
The number of TVMs and BVMs are also influential factors of VTR. Results in Figure  13a show that VTR is higher with the increase in the number of TVMs when the failure rate is high. The reason is that more VMs mean the server can process more tasks at the same time. However, when the failure rate is low, we have an interesting finding that VTR monotonically decreases as the number of TVMs grows, such as the cases when the failure rate equals 0.0002 or 0.0004 in Figure 13a. In fact, the increase in the number of TVMs influences VTR by two means, i.e., (1) reducing the end-to-end delay by processing more tasks at the same time and (2) increasing the end-to-end delay by bringing about more failure, migration, and reboot of VMs in a given time. The second impact is more significant when the failure rate is high. Whereas when the failure rate is low, the first impact is more significant. Similar results are shown by Figure 13b, where with no BVM or extremely few BVMs, VTR monotonically increases as the number of TVMs grows, indicating that the first impact is more significant. However, with more BVMs, VTR monotonically decreases as the number of TVMs grows, indicating that the second impact is more significant. Therefore, the increase in TVMs is not always a good choice for improving VTR.
According to the above findings, the resource manager can allocate an appropriate number of TVMs according to the above-mentioned analysis, which can be a guidance for the initial configuration of VMs. As for the number of BVMs, Figure 14 shows that VTR monotonically increases as the number of BVM grows.    Table 3.

Analysis of the Number of VMs
The number of TVMs and BVMs are also influential factors of VTR. Results in Figure  13a show that VTR is higher with the increase in the number of TVMs when the failure rate is high. The reason is that more VMs mean the server can process more tasks at the same time. However, when the failure rate is low, we have an interesting finding that VTR monotonically decreases as the number of TVMs grows, such as the cases when the failure rate equals 0.0002 or 0.0004 in Figure 13a. In fact, the increase in the number of TVMs influences VTR by two means, i.e., (1) reducing the end-to-end delay by processing more tasks at the same time and (2) increasing the end-to-end delay by bringing about more failure, migration, and reboot of VMs in a given time. The second impact is more significant when the failure rate is high. Whereas when the failure rate is low, the first impact is more significant. Similar results are shown by Figure 13b, where with no BVM or extremely few BVMs, VTR monotonically increases as the number of TVMs grows, indicating that the first impact is more significant. However, with more BVMs, VTR monotonically decreases as the number of TVMs grows, indicating that the second impact is more significant. Therefore, the increase in TVMs is not always a good choice for improving VTR.
According to the above findings, the resource manager can allocate an appropriate number of TVMs according to the above-mentioned analysis, which can be a guidance for the initial configuration of VMs. As for the number of BVMs, Figure 14 shows that VTR monotonically increases as the number of BVM grows.    Table 3.

Analysis of the Number of VMs
The number of TVMs and BVMs are also influential factors of VTR. Results in Figure  13a show that VTR is higher with the increase in the number of TVMs when the failure rate is high. The reason is that more VMs mean the server can process more tasks at the same time. However, when the failure rate is low, we have an interesting finding that VTR monotonically decreases as the number of TVMs grows, such as the cases when the failure rate equals 0.0002 or 0.0004 in Figure 13a. In fact, the increase in the number of TVMs influences VTR by two means, i.e., (1) reducing the end-to-end delay by processing more tasks at the same time and (2) increasing the end-to-end delay by bringing about more failure, migration, and reboot of VMs in a given time. The second impact is more significant when the failure rate is high. Whereas when the failure rate is low, the first impact is more significant. Similar results are shown by Figure 13b, where with no BVM or extremely few BVMs, VTR monotonically increases as the number of TVMs grows, indicating that the first impact is more significant. However, with more BVMs, VTR monotonically decreases as the number of TVMs grows, indicating that the second impact is more significant. Therefore, the increase in TVMs is not always a good choice for improving VTR.   Table 3.

Management of VMs
Since the increase in the number of TVMs and the number of BVMs both improve VTR generally, to achieve the highest VTR, there is a trade-off between the number of TVMs and the number of BVMs when the total number of VMs is limited. Figure 15a reveals that when the failure rate of VM is low, only assigning one VM as TVM is the best solution. Figure 15b shows that when the failure rate is high, more VMs should be assigned as TVMs to achieve the highest VTR. Therefore, to achieve the highest VTR, based on the analysis method of this paper, the resource manager can find the optimal allocation of VMs according to the related parameters of the edge computing network and the tasks.    Table 3.
According to the above findings, the resource manager can allocate an appropriate number of TVMs according to the above-mentioned analysis, which can be a guidance for the initial configuration of VMs. As for the number of BVMs, Figure 14 shows that VTR monotonically increases as the number of BVM grows.   Table 3.

Management of VMs
Since the increase in the number of TVMs and the number of BVMs both improve VTR generally, to achieve the highest VTR, there is a trade-off between the number of TVMs and the number of BVMs when the total number of VMs is limited. Figure 15a reveals that when the failure rate of VM is low, only assigning one VM as TVM is the best solution. Figure 15b shows that when the failure rate is high, more VMs should be assigned as TVMs to achieve the highest VTR. Therefore, to achieve the highest VTR, based on the analysis method of this paper, the resource manager can find the optimal allocation of VMs according to the related parameters of the edge computing network and the tasks.    Table 3.

Management of VMs
Since the increase in the number of TVMs and the number of BVMs both improve VTR generally, to achieve the highest VTR, there is a trade-off between the number of TVMs and the number of BVMs when the total number of VMs is limited. Figure 15a reveals that when the failure rate of VM is low, only assigning one VM as TVM is the best solution. Figure 15b shows that when the failure rate is high, more VMs should be assigned as TVMs to achieve the highest VTR. Therefore, to achieve the highest VTR, based on the analysis method of this paper, the resource manager can find the optimal allocation of VMs according to the related parameters of the edge computing network and the tasks.  Table 3.

Section Summary
In this section, we provide a case study of traffic monitoring for the proposed timely reliability evaluation method and introduce the simulation method based on OMNeT++. Simulation results verify that the analytical model proposed in this paper is valid and applicable for timely reliability analysis of VMs of an edge server, considering failure, migration, and reboot of VMs.
We also investigate the influence of different factors on timely reliability. Results show that: (1) VTR is a monotonic function of the failure rate, migration rate, and reboot rate of VM and is more sensitive to the failure rate when the reboot rate and migration rate are smaller. (2) VTR monotonically increases as the number of BVM grows, but the increase in TVMs does not always improve VTR. (3) To achieve the highest VTR, there is a trade-off between the number of TVMs and the number of BVMs when the total number of VMs is limited.

Conclusions
In this paper, we analyzed the VTR and the ETR of the edge computing network, considering the failure, migration, and reboot of VMs. To model the state transitions of VMs, we define three states of VMs (available, fail-unmigrated, fail-migrated) and construct a partitioned CTMC for the edge server. We express the generator matrix as a partitioned matrix based on the matrix-geometric method. An algorithm is proposed to resolve the CTMC and obtain the sojourn time at the edge server as well as VTR. Moreover, to analyze ETR of the network, we build the delay analysis model based on Burke's theorem and propose an end-to-end delay analysis method. The proposed method is more eligible to model the multi-state system of VMs. When the dimension of CTMC states is high, the constructing method of the generator matrix proposed in this paper can effectively reduce the difficulty of algorithm designing.
We investigate VTR of the edge computing network under different conditions. The numerical results are verified to be correct by simulation based on OMNeT++. Results show that VTR is a monotonic function of the migration rate, reboot rate of VM, the number of TVMs, and the number of BVMs. However, in some cases, the increase in TVMs may conversely decrease VTR, since more TVMs also bring about more failures in a given time. Moreover, we find that there is a trade-off between the number of TVMs and the number of BVMs when the total number of VMs is limited. The resource manager can find the optimal allocation of VMs according to the migration rate, reboot rate, failure rate, etc., of the edge computing network and the arrive rate, computation complexity, etc., of the tasks. Based on the trade-off of TVMs and BVMs, in future research, we will further study   Table 3.

Section Summary
In this section, we provide a case study of traffic monitoring for the proposed timely reliability evaluation method and introduce the simulation method based on OMNeT++. Simulation results verify that the analytical model proposed in this paper is valid and applicable for timely reliability analysis of VMs of an edge server, considering failure, migration, and reboot of VMs.
We also investigate the influence of different factors on timely reliability. Results show that: (1) VTR is a monotonic function of the failure rate, migration rate, and reboot rate of VM and is more sensitive to the failure rate when the reboot rate and migration rate are smaller. (2) VTR monotonically increases as the number of BVM grows, but the increase in TVMs does not always improve VTR. (3) To achieve the highest VTR, there is a trade-off between the number of TVMs and the number of BVMs when the total number of VMs is limited.

Conclusions
In this paper, we analyzed the VTR and the ETR of the edge computing network, considering the failure, migration, and reboot of VMs. To model the state transitions of VMs, we define three states of VMs (available, fail-unmigrated, fail-migrated) and construct a partitioned CTMC for the edge server. We express the generator matrix as a partitioned matrix based on the matrix-geometric method. An algorithm is proposed to resolve the CTMC and obtain the sojourn time at the edge server as well as VTR. Moreover, to analyze ETR of the network, we build the delay analysis model based on Burke's theorem and propose an end-to-end delay analysis method. The proposed method is more eligible to model the multi-state system of VMs. When the dimension of CTMC states is high, the constructing method of the generator matrix proposed in this paper can effectively reduce the difficulty of algorithm designing.
We investigate VTR of the edge computing network under different conditions. The numerical results are verified to be correct by simulation based on OMNeT++. Results show that VTR is a monotonic function of the migration rate, reboot rate of VM, the number of TVMs, and the number of BVMs. However, in some cases, the increase in TVMs may conversely decrease VTR, since more TVMs also bring about more failures in a given time. Moreover, we find that there is a trade-off between the number of TVMs and the number of BVMs when the total number of VMs is limited. The resource manager can find the optimal allocation of VMs according to the migration rate, reboot rate, failure rate, etc., of the edge computing network and the arrive rate, computation complexity, etc., of the tasks. Based on the trade-off of TVMs and BVMs, in future research, we will further study the optimization of computing resources to obtain a better performance of the edge server. We will also study the case when the sizes of task data are different in future work.