4.1. Hybrid Cloud
A hybrid cloud combines an on-premises infrastructure, based on a private cloud, with resources acquired as-a-service from a public cloud provider. Among the factors driving the diffusion of these architectures are high security, controlled performance, large scalability, fast adoption of new technologies, and cost savings.
The problem of scalability in hybrid clouds is typically addressed through the dynamic provisioning of resources from the public cloud. The model presented in the following case study addresses this problem by implementing an algorithm that dynamically routes the requests to the public cloud when the load on the private component exceeds a threshold value.
4.1.1. Description of the Problem
This case study concerns a model of an IT infrastructure based on a hybrid cloud. More precisely, it focuses on modeling the process of dynamic resource provisioning. This mechanism acquires VMs on-demand from the public cloud when the requests exceed the threshold capacity set on the private cloud. The multi-formalism model implemented consists of elements of Petri nets and queueing networks. The hybrid cloud scenario considered is represented in Figure 13.
The incoming requests are processed by the user interface of the application and, after some formal and security checks, are sent to the load controller module. To satisfy the performance requirements, the software architecture of the app has been designed assuming that a dedicated VM of the local cloud is assigned to each request in execution. Since several highly fluctuating workloads share the resources of the private cloud, a limit is set on the maximum number of VMs dedicated to this application. When this threshold is reached, the new VMs are acquired on-demand from a public cloud.
The impact of this threshold on the global response time and on the throughputs of both clouds needs to be investigated. The objective of the study is the identification of the computational capacity, in terms of number of cores and power, of the servers of the private cloud that is required to satisfy the performance target while saving costs. In fact, VMs provided by the public cloud with the same computational power as the private ones are much more expensive. Therefore, to save costs, the VMs provided by the public cloud are much less powerful than the private ones. Thus, the split of VMs between the private and public clouds involves a tradeoff between performance and infrastructure costs.
4.1.2. Model Description
The layout of the model is shown in Figure 14. The workload consists of two classes of customers: the incoming requests, representing the user demands of computation time, and the VMs (the tokens), representing the number of VMs available in the private cloud.
The JoinPrivate transition is enabled when a request arrives in the Arriving place and there is at least one token available in the MaxVM place. Each time the transition fires, a VM of the private cloud is assigned to the request and the number of tokens in MaxVM is decreased by one. When this value is zero, the inhibitor arc from the MaxVM place to the JoinPublic transition enables the latter, and the request is routed to the public cloud. When a request has been completely executed in the private cloud, the Rel transition routes it to Sink1 and a token is sent back to the MaxVM place, incrementing the number of available VMs.
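The token mechanics of the JoinPrivate/JoinPublic/Rel transitions can be sketched in plain Python. This is an illustrative simplification of the Petri-net semantics, not the actual model implementation; the names mirror the places and transitions described above:

```python
class ProvisioningController:
    """Sketch of the threshold-based routing implemented by the Petri net.

    max_vm mirrors the initial marking of the MaxVM place: the number of
    private-cloud VMs still available for this application.
    """

    def __init__(self, max_vm):
        self.available_private_vms = max_vm  # tokens in the MaxVM place

    def join(self):
        """Route an arriving request: 'private' (JoinPrivate) or 'public' (JoinPublic)."""
        if self.available_private_vms > 0:
            self.available_private_vms -= 1   # JoinPrivate consumes a token
            return "private"
        return "public"                        # inhibitor arc enables JoinPublic

    def release(self):
        """Rel transition: a private-cloud request completes, freeing its VM."""
        self.available_private_vms += 1


ctrl = ProvisioningController(max_vm=2)
assert ctrl.join() == "private"
assert ctrl.join() == "private"
assert ctrl.join() == "public"   # threshold reached: routed to the public cloud
ctrl.release()                   # one private-cloud request completes
assert ctrl.join() == "private"  # the freed VM is reused
```

The inhibitor arc corresponds to the `else` branch: JoinPublic can fire only when MaxVM holds no tokens.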
The two clouds are represented by two delay stations since there is no competition for an available VM in either cloud. The arrival rate of the requests considered in the study is λ = 50 req/s. This value was assigned by the application designers since it is representative of a medium/high load that, according to the business plan, should be reached in a year. The fluctuations of arrivals have been modeled with a hyper-exponential distribution of inter-arrival times with a high coefficient of variation. The mean service demand of the private cloud is 2.5 s, while that of the public cloud is 7.5 s. The high variability of service times was captured by assuming hyper-exponential distributions with a high coefficient of variation. The times required to process a request in the other infrastructure components, such as the user interface and the load controller, are negligible compared to the service demands of the VMs; they have therefore been accounted for as small increases in those demands.
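High-variability times of this kind are typically generated with a two-phase hyper-exponential distribution. The sketch below uses illustrative parameters (not those of the study) chosen so that the mean matches the 2.5 s private-cloud service demand while the coefficient of variation exceeds 1:

```python
import random

def hyperexp_sample(p, rate1, rate2, rng=random):
    """Two-phase hyper-exponential sample: with probability p draw from
    Exp(rate1), otherwise from Exp(rate2). Mixing exponentials with
    different rates yields a coefficient of variation greater than 1."""
    rate = rate1 if rng.random() < p else rate2
    return rng.expovariate(rate)

# Illustrative parameters: mean = 0.9/0.72 + 0.1/0.08 = 2.5 s.
random.seed(42)
samples = [hyperexp_sample(p=0.9, rate1=0.72, rate2=0.08) for _ in range(200_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
cv = var ** 0.5 / mean
print(round(mean, 2), round(cv, 2))  # mean close to 2.5 s, cv well above 1
```

The second phase (rare but slow) produces the occasional very long sample that drives the variability, which is what the hyper-exponential assumption captures in the model.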
4.1.3. Model Results
The behavior of the algorithm for dynamic provisioning of virtual machines is highlighted in Figure 15, which shows the trend of the number of VMs in execution in the two clouds, private (Figure 15a) and public (Figure 15b), in the interval 0–50 s.
The arrival rate is λ = 50 req/s and the threshold of VMs in the private cloud is 96. As can be seen in Figure 15a, when the number of requests in execution in the private cloud is greater than 96, the provisioning of new VMs is dynamically routed to the public cloud (see, e.g., the interval 45–50 s in Figure 15).
The number of requests in execution in the two clouds as a function of the maximum number of VMs MaxVM that can be provisioned in the private cloud is shown in Figure 16. The range of MaxVM evaluated is 16–128. The mean response time R of the system as a function of MaxVM is depicted in Figure 17.
With 96 VMs, the average response time is close to 4 s, a value that is considered acceptable as a performance target given the associated costs. As can be seen in Figure 16, with MaxVM = 96, 87 VMs are in execution in the private cloud and 133 in the public cloud.
The costs of the infrastructure can be evaluated considering the throughput of the two clouds as a function of MaxVM (see Figure 18).
4.2. Batching in IoT-Based Healthcare
The proliferation of IoT in the healthcare scenario has introduced new problems that must be faced to exploit its potential effectively. Important benefits can be obtained in all areas of e-Health, in particular in those that use IoT integrated into information infrastructures enabling ubiquitous computing technologies. Patients can be monitored anytime and anywhere, either in special hospital wards or remotely, through the use of wearable sensors and smart medical devices.
Sensors may detect a variety of patient physiological signals, such as temperature, pulse, oxygen saturation, blood pressure and glucose, ECG, and EEG, as well as other body-motion-related variables that help to accurately monitor patient movements. Among the potential benefits that body sensors, and more generally IoT smart devices, can bring to e-Health monitoring are a high rate of data transmission and the minimization of end-to-end data delivery time. The interconnections among the various components of the networks, e.g., IoT devices, intelligent medical devices, edge and fog systems, hospital and cloud servers, patients, and medical staff, are implemented through cabled or wireless networks with low-power communication protocols.
The following case study focuses on body sensor networks, and more specifically on the study of the trade-off that exists between performance of the network (data delivery time) and the energy consumed by the data exchange (the cost of transmission).
The implemented model is derived from a more complex version of the one in [32], which considers a completely different scenario: the smart monitoring of fog computing infrastructures. The key feature of these models is the dynamic management of the buffer of requests based on the intensity of arrivals and the expiration of a periodic trigger. With multi-formalism models, it is possible to implement algorithms whose behavior changes dynamically as a function of the workload characteristics.
4.2.1. Description of the Problem
Figure 19 shows the target e-Health scenario considered. The data collected from body sensors are transmitted through wireless or wired connections to the closest edge nodes, where they are pre-processed and then sent to the fog nodes (if any) or to the hospital servers for complete processing.
The data arriving at the hospital servers are subject to fluctuations generated by the different types of physiological signals detected and by the health conditions of the patients. Indeed, different types of measured variables require different frequencies at which a detected signal is available for transmission. For example, for body temperature, the sampling rate can be once per minute; for pulse oxygen monitoring, once per second; and, for other variables, such as ECG or EEG, of the order of several hundred samples per second. In addition, when a patient’s health condition is assessed as critical, new sensors are activated and the detection rate of other monitored variables can be increased under the control of edge or fog nodes. Among the many problems that need to be addressed, this case study concerns the following:
identification of the amount of data that must be considered in each transmission to hospital servers in order to satisfy the performance requirement in terms of end-to-end data delivery time and minimize the energy consumption of the operations; and
identification of potential critical health conditions of patients that need urgent investigation, i.e., fast response time.
The former problem requires studying the trade-off between the time required to deliver the detected signal to the servers in the upper layer of the medical infrastructure and the cost associated with the transmission operation. The immediate transmission of a detected signal minimizes its end-to-end response time at either the hospital servers or the cloud. However, the setup costs of the connection cannot then be shared with other signals. The technique of batching the data of several signals into a single transmission operation is used to approach this problem. The impact of different batch sizes on end-to-end delivery time, and thus on the number of operations required to transmit the signals detected by a set of sensors, must be studied. Knowing the number of sensors connected to an edge system and the type of signals detected, it is possible to derive the arrival rate of the requests to the hospital servers. Then, once pre-processing is complete, the data are stored in a buffer until they are ready to be transmitted. The management of this buffer is crucial to achieve the two objectives described above.
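The derivation of the arrival rate from the sensor population can be sketched as follows. The sensor mix and per-signal rates are hypothetical, chosen only to match the orders of magnitude mentioned earlier (temperature once per minute, pulse oximetry once per second, ECG several hundred samples per second):

```python
# Hypothetical signals monitored for each patient, with detection rates
# in signals per second (illustrative values, not from the study).
SIGNALS_PER_PATIENT = {
    "temperature": 1 / 60.0,   # once per minute
    "pulse_oximetry": 1.0,     # once per second
    "ecg": 250.0,              # several hundred samples per second
}

def edge_arrival_rate(num_patients, signals=SIGNALS_PER_PATIENT):
    """Aggregate arrival rate (signals/s) offered to one edge system,
    assuming every patient is monitored with the same set of signals."""
    return num_patients * sum(signals.values())

print(edge_arrival_rate(8))  # rate for, e.g., 8 monitored patients
```

The same aggregation, applied to all edge systems, gives the global rate offered to the hospital servers once pre-processing is complete.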
The implemented algorithm considers the number and types of signals detected by the sensors connected to the edge nodes, the fluctuations of arriving traffic considered regular, and the arrival patterns that must be transmitted urgently, as they can be associated with a patient in critical condition. The most important elements of the implemented model simulating this algorithm are described in the next section.
4.2.2. The Model
Multi-formalism models allow the exploitation of queueing network and Petri net primitives to represent each concept with the most appropriate technique. The PN primitives were used to describe the dynamic behavior of the batching algorithm, while the QN primitives were used to represent the other components of the e-Health infrastructure. The layout of the model is shown in Figure 20.
The workload consists of two classes: the signals detected by sensors (referred to as requests) and a control token, needed to model the periodic/triggered management of the requests.
The key feature of the model is the algorithm that manages the transmission requests batch. To ensure that the model and the presentation of the algorithm remain as simple as possible, we adopted several assumptions that have a minimal impact on the performance but that greatly simplify the description of other parts of the model.
The set of physiological signals detected for each patient is the same, and each edge system monitors several patients. The computational power of the hospital servers has been oversized compared to the processing time required by the signals. The fog systems are not explicitly represented in the model since their processing time per request is negligible compared to the service time required by the Edge nodes; instead, they were accounted for as small increases in the service times of the Edge nodes.
The global arrival rate of data generated by the sensors is modeled by the source Sensors as a single aggregated Poisson process of rate λ. This flow is evenly distributed among the edge systems, modeled with Edge queueing nodes. The times required to process the data of a signal, i.e., the service times of a visit to an edge node, are exponentially distributed. At the end of this processing phase, the requests are buffered, i.e., routed to the place Buffer, ready to be transmitted. They are then transmitted to the hospital servers (or the cloud servers), which must perform their complete analysis, with exponentially distributed service times. Requests follow two paths: one for Regular requests and one for Urgent requests. The requests in the buffer are managed according to two different policies:
The buffer is emptied (i.e., the requests that are in the buffer are transmitted) periodically with a period defined according to the number and type of signals detected by all sensors. Requests are assumed to belong to patients under Regular conditions and are sent at the end of the period.
The buffer is emptied when the number of requests in the buffer reaches a threshold value, i.e., the maximum batch size. In this case, such a high arrival rate is assumed to indicate the presence of a critical condition for one or more patients. Therefore, the requests in the buffer are considered Urgent and must be sent immediately, without waiting for the end of the emptying period.
The periodic transmission of Regular requests is modeled by the loop among the places and transitions Waiting, Periodic, Transmitting, and Reset. The deterministic firing time of transition Periodic represents the duration of the clock for the transmission of the requests that arrived in Buffer. This value is computed by analyzing the detection rate of the sensors in normal operating conditions. According to the configuration analyzed, we considered 15 s as the constant firing time of the transition Periodic (i.e., as the buffer-emptying cycle time). As soon as the cycle expires, a token is transferred to place Transmitting, where two alternatives are possible. If there are requests in the buffer, the immediate transition Regular is enabled and transfers them to the transmission channel. When the buffer is empty, either because all requests have been transferred or because no requests arrived in the period, the immediate transition Reset fires, thanks to an inhibitor arc that connects it to Buffer, and restarts the timer.
The Urgent requests are managed by the immediate transition Urgent, which is connected to place Buffer with an input arc whose weight equals the threshold value. When the threshold is reached, the batch of requests in Buffer is immediately transmitted to the hospital server. Note that the arc that exits the transition Urgent has the same weight, since the entire batch of requests is sent to the server.
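The two emptying policies can be summarized with a small event-driven sketch. This is an illustrative abstraction of the Petri-net behavior, not the actual simulation model; method names mirror the places and transitions of the net:

```python
class BatchingBuffer:
    """Sketch of the edge-node buffer policy: requests accumulate until
    either the periodic timer expires (Regular batch) or the buffer reaches
    the urgency threshold (Urgent batch, sent immediately)."""

    def __init__(self, threshold, period=15.0):
        self.threshold = threshold  # maximum batch size (Urgent trigger)
        self.period = period        # buffer-emptying cycle, 15 s in the study
        self.buffer = []

    def arrive(self, request):
        """A pre-processed request joins the buffer (place Buffer)."""
        self.buffer.append(request)
        if len(self.buffer) >= self.threshold:
            return self._flush("Urgent")   # transition Urgent fires
        return None

    def timer_expired(self):
        """The deterministic transition Periodic fires: send whatever is buffered."""
        if self.buffer:
            return self._flush("Regular")  # transition Regular fires
        return None                        # transition Reset: just restart the timer

    def _flush(self, kind):
        batch, self.buffer = self.buffer, []
        return kind, batch


buf = BatchingBuffer(threshold=3)
assert buf.arrive("r1") is None
assert buf.arrive("r2") is None
kind, batch = buf.arrive("r3")       # threshold reached: whole batch sent
assert kind == "Urgent" and len(batch) == 3
buf.arrive("r4")
kind, batch = buf.timer_expired()    # period expires: Regular batch sent
assert kind == "Regular" and batch == ["r4"]
```

Flushing the whole buffer in `_flush` corresponds to the equal weights of the input and output arcs of transition Urgent: the entire batch leaves in a single transmission.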
4.2.3. Model Results
We considered several configurations of the system by modifying the global arrival rate λ, the threshold of Urgent requests, and the buffer-emptying cycle time. In this section, we limit the description of the results to those that emphasize the impact of the dynamic management of the buffer of requests on the performance of the system and the related costs.
Several arrival rates λ of the requests were considered in the study, with exponentially distributed inter-arrival times. These values were assumed to be representative of the potential number of patients in an emergency ward of a small- or medium-sized hospital.
Figure 21 shows the behavior of the System Response Time R, i.e., the end-to-end time required by a signal from its detection to the completion of its processing by the hospital server, for the complete range of arrival rates. The family of curves refers to different values of the threshold, from 2 to 20, i.e., the batch size at which Urgent batches of requests are identified and transmitted immediately.
An increase of λ models an increase in the number of sensors or in the detection rate, and the workload managed by the Edge nodes grows accordingly. For small threshold values (2 and 5), there is an initial decrease of R. This is because, with such small batch sizes, nearly all batches are considered Urgent and therefore most requests are transmitted almost immediately after they join the Buffer. With larger threshold values (10, 15, and 20), this initial decrease of R as λ increases is not present: when λ becomes greater than 0.1, the value of R starts to increase from the beginning. With these batch sizes, R begins to decrease only for larger λ values than those observed with smaller thresholds. The motivation for this behavior is that, as λ increases, the number of requests in the buffer increases as well; if the threshold is high, it is reached with lower probability, and more batches are transmitted when the period expires. However, with further increases of λ, the threshold is reached more easily, and therefore the waiting time of the requests in the buffer is shorter, i.e., the Urgent requests increase. When λ is greater than 1.5 req/s, R increases for all thresholds, since the response time of the highly utilized hospital server becomes the dominant part of its value.
According to the objectives of the study, the R values should be analyzed together with the cost (energy consumption) of transmissions. It is assumed that the cost is directly proportional to the number of times a batch is transmitted, i.e., a buffer is emptied. Indeed, the smaller the number of transmissions and the larger the batches transmitted in a single operation, the better the energy efficiency of the system. In Figure 22, the transmissions per second are shown for all the considered threshold values.
As expected, the number of transmissions per second is highest with the smallest batch size and lowest with the largest one. However, to obtain a meaningful result, these values should be considered together with the system response times. To this end, the metric system power, introduced in [33] and combining the throughput X of a system with its response time R, is considered. This metric is the ratio X/R of throughput and response time, and captures the level of efficiency in executing a workload. The maximum power corresponds to the optimal operating point of the system, i.e., the point at which the throughput is maximized with the minimum response time. In our system, we considered the ratio of transmissions/s and system response time. Figure 23 shows the ratio of the two metrics of Figure 22 and Figure 21. The optimal operating points of the system are clearly identified as a function of the batch size for the Urgent requests and the global arrival rate of signals.
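Locating the optimal operating point via the power metric can be sketched as follows. The measurement values are purely illustrative (they are not taken from Figures 21 and 22); only the X/R criterion reflects the text:

```python
# Hypothetical measurements at a fixed arrival rate: for each Urgent-batch
# threshold, the transmissions/s (throughput X) and system response time R.
measurements = {
    # threshold: (X, R)
    2:  (0.90, 9.0),
    5:  (0.42, 3.0),
    10: (0.25, 3.9),
    15: (0.20, 4.4),
    20: (0.17, 5.2),
}

def power(x, r):
    """System power as defined in the text: throughput over response time."""
    return x / r

# The optimal operating point maximizes X/R: high throughput, low response time.
best = max(measurements, key=lambda k: power(*measurements[k]))
print(best)
```

Repeating this selection for each arrival rate traces the family of optimal operating points described above.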