Offloading Data through Unmanned Aerial Vehicles: A Dependability Evaluation

Abstract: Applications in the Internet of Things (IoT) context continuously generate large amounts of data. The data must be processed and monitored to allow rapid decision making. However, the wireless connection that links such devices to remote servers can lead to data loss; thus, new forms of connection must be explored to ensure the system's availability and reliability as a whole. This work proposes stochastic Petri nets (SPNs) and reliability block diagrams (RBDs) to evaluate a system's availability and reliability. Among such methodologies, stochastic Petri nets provide models that represent complex systems with different characteristics. In the evaluated architecture, UAVs route data from IoT devices to the edge or the cloud through a base station: the base station receives data from UAVs and retransmits them to the cloud, where the data are processed and the responses are returned to the IoT devices. A sensitivity analysis through Design of Experiments identified key points of improvement for the base model, and a numerical analysis revealed the components with the most significant impact on availability. For example, the cloud proved to be a very relevant component for the availability of the architecture. The final results demonstrate the effectiveness of the improvements to the base model. The present work can help system architects develop distributed UAV-aided architectures with low evaluation costs.


Introduction
The number of applications for mobile devices has been increasing rapidly. These applications include video streaming, augmented reality, audio/video conferencing, collaborative editing, etc. [1]. They usually require a lot of computational resources but run on devices with limited resources. Alternatively, instead of performing such a task on a mobile device, a request can be made for the execution to take place on a remote computer; this is called offloading [2]. According to Statista (a German online portal for statistics), the number of mobile devices will reach around 17.72 billion. The main contributions of this work are as follows:

• Two SPN models (base and extended), which represent and evaluate an architecture based on IoT sensors, UAVs, and remote resources, aided by the cloud. The models are highly configurable, as it is possible to calibrate ten timed transitions in the extended model. The models enable system designers to evaluate the studied system according to availability and reliability aspects.
• A sensitivity analysis, which identifies the critical points of the architecture, as well as ways to improve it.
• Case studies that provide a practical guide for analyzing the dependability of the proposed architecture. The first case study investigates distinct scenarios for evaluating availability, and the second one focuses on reliability.
The remainder of the paper is divided as follows. Section 2 discusses related works. Section 3 provides background on Petri nets and other supporting concepts. Section 4 presents the evaluated architecture. Section 5 reveals the availability and reliability models for the proposed architecture. Section 6 presents the results of a case study.
Finally, Section 7 outlines some conclusions and future directions of this work.

Related Works
In this section, some works related to this paper are presented. Table 1 compares the related works under six aspects: context, metrics, model type, offloading application, use of cloud/edge, and sensitivity analysis. The first comparison criterion is context. For the context, there is a wide range of subjects; only our work explores UAVs for offloading. Refs. [13,21] presented the most similar proposals and focused on ensuring that UAVs work with a certain level of security, being more focused on hardware and software, respectively. Gonçalves et al. [13] performed an analysis that assesses the safety of UAVs in order to facilitate the airworthiness certification process of UAVs. In this way, the work helps manufacturers to more easily identify critical points in a UAV during development. Zhou et al. [21] generated a model capable of identifying components that do not comply with the established safety requirements, in addition to an algorithm capable of detecting inconsistent components that violate the safety standard. The study of Sharma et al. [18] can be considered the closest to our context, with similarities in measuring reliability and seeking to detect system flaws. However, that work focuses more on evaluating the embedded system that controls devices such as a UAV than on the UAVs themselves.
The second criterion is metrics. The metrics adopted by the related works differ considerably, and some works pursue a very different goal. This is the case of the metrics used by [19,23,24], which have a general objective related to performance. For example, Sharma et al. [19] brought forth several metrics that seek to measure the efficiency of a UAV, instead of the more usual metrics of response time and utilization. Cheng et al. [23], on the other hand, conducted an analysis aimed at measuring computational delay in a data offloading scenario using UAVs and satellites. Finally, Faraci et al. [24] made a very thorough performance analysis of a scenario similar to the one in this article; that work uses both key performance metrics and efficiency metrics, which distinguishes it from this article, which measures system availability and reliability. While the other works present metrics that can be loosely associated with the system's dependability, our work is the only one centered on the availability metric.
The third criterion is the type of model chosen to evaluate the system. Most papers chose Petri nets [25] and/or hierarchical models [26] to evaluate their systems. The variation of Petri net types is considerable, ranging from colored Petri nets to Hierarchical Context-Aware Aspect-Oriented Petri nets. However, our work is the only one to use stochastic Petri nets. Some papers also chose to use the Markov decision process to model the system. Chen et al. [22], for example, used the Markov decision process to meet the needs of a practical application scenario and provide a flexible and effective offloading mode.
The fourth and fifth criteria, the offloading application and the use of cloud/edge, are related; as their values coincide, they are analyzed together. Refs. [22–24] were the only ones that approached the proposal presented here. Chen et al. [22] presented a strategy of data retransmission and edge computing in which UAVs simultaneously perform data processing and offloading. Ref. [23] proposed a flexible method of joint computing that provides full cloud/edge computing services to remote IoT users. Finally, Ref. [24] aimed to extend the capabilities of a 5G network to ensure ultra-low latency in processing data streams generated by connected devices. The last criterion is the sensitivity analysis. Our work is the only one that used a sensitivity analysis to check the importance of the components.

Background
This section presents the essential concepts needed to understand the proposals.

Reliability Block Diagram
The reliability block diagram (RBD) is a graphical analysis technique that expresses the system as connected components, according to its logical relationship of reliability [27]. In an RBD, the system components and their relations are represented by connections and blocks. The blocks represent the smallest entities in the system, which cannot be further divided: system components or groups of components [28].
In summary, we have a block model in which the connections between blocks represent a specific system behavior. The connections can express two types of behavior for the components: serial and parallel. Components connected in series must all be working for the system to work. When connected in parallel, at least one of the components must be working for the system to be available [29–31]. Figure 1 shows an example of an RBD model composed of Components 1 to 5. At the beginning of the model, we have Begin; it represents the start of the system, usually followed by the lowest parts of the evaluated system. At the end of the model, we have End; it represents the end of the system, usually closer to the highest parts of the evaluated system, which can be several things, such as an operating system, software, a virtual machine, etc. The model contains series connections and one parallel connection. In this RBD example, the system is operational if Components 1, 4, and 5 are working, and Component 2 or 3 is active. An RBD model's analysis depends on the mean time to repair (MTTR) and mean time to failure (MTTF) of each component in the system.
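As an illustration, the series/parallel composition described above can be computed directly from per-component MTTF and MTTR values, using the steady-state availability A = MTTF/(MTTF + MTTR). The sketch below reproduces the Figure 1 topology; all numeric values are hypothetical and not taken from this paper.

```python
# Sketch: steady-state availability of the RBD in Figure 1
# (Components 1, 4, 5 in series with a parallel block of 2 and 3).
# The MTTF/MTTR values below are illustrative, not from the paper.

def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability of a single repairable component."""
    return mttf / (mttf + mttr)

def series(*avails: float) -> float:
    """All components must work: availabilities multiply."""
    a = 1.0
    for x in avails:
        a *= x
    return a

def parallel(*avails: float) -> float:
    """At least one component must work: complement of all failing."""
    u = 1.0
    for x in avails:
        u *= (1.0 - x)
    return 1.0 - u

# Hypothetical component data: MTTF = 1000 h, MTTR = 2 h for all blocks.
comps = {i: availability(1000.0, 2.0) for i in (1, 2, 3, 4, 5)}

system = series(comps[1], parallel(comps[2], comps[3]), comps[4], comps[5])
print(f"System availability: {system:.6f}")
```

Note how the parallel block of Components 2 and 3 is more available than either component alone, while each series link lowers the overall result.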

Stochastic Petri Net
SPNs [32–38] can be identified as a type of directed graph divided into two parts, populated by three types of objects: places, transitions, and directed arcs that connect places to transitions and transitions to places. Figure 2a illustrates the two types of transitions (timed and immediate). A timed transition follows a stochastic behavior given by a probability distribution function. An immediate transition fires as soon as it is enabled, without waiting for any period. The white circles symbolize places. Arcs connect places to transitions; inhibitor arcs block or allow the passage of tokens from one place to another. Moreover, tokens are assigned to specific places. In SPN models, availability is assessed through the concept of active or inactive components. Figure 2b presents a small availability model with two components (A and B). Both have mean time to failure (MTTF) and mean time to repair (MTTR). Component A, for example, is active when it has a token in place A_U and is inactive when it has a token in place A_D. In this example, for component B to be active, A must also be active. The inhibitor arc ensures that if component A changes from the up state to the down state, the T0 transition is triggered, and component B is then also moved to the down state.
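The dependency enforced by the inhibitor arc in Figure 2b can be approximated with a small Monte Carlo simulation, shown below as a sketch. It assumes exponentially distributed failure and repair times (which makes resampling the clocks at each event valid by memorylessness) and illustrative MTTF/MTTR values; the actual SPN semantics are richer than this simplification.

```python
# Sketch: Monte Carlo estimate of the Figure 2b behavior, where
# component B can only be up while A is up (the inhibitor arc and the
# immediate transition T0 force B down whenever A fails).
import random

def simulate(mttf_a, mttr_a, mttf_b, mttr_b, horizon=1_000_000.0, seed=42):
    rng = random.Random(seed)
    t = 0.0
    up_a = up_b = True
    both_up_time = 0.0
    while t < horizon:
        # Exponential clocks are memoryless, so resampling both at each
        # event is statistically equivalent to keeping residual times.
        dt_a = rng.expovariate(1.0 / (mttf_a if up_a else mttr_a))
        dt_b = rng.expovariate(1.0 / (mttf_b if up_b else mttr_b))
        dt = min(dt_a, dt_b)
        if up_a and up_b:
            both_up_time += dt
        t += dt
        if dt_a <= dt_b:
            up_a = not up_a
            if not up_a:
                up_b = False          # T0: A's failure takes B down too
        else:
            up_b = not up_b
            if up_b and not up_a:
                up_b = False          # T0: B cannot stay up while A is down
    return both_up_time / horizon

# Hypothetical parameters (hours): A fails every ~1000 h, repairs in ~2 h.
print(f"Estimated availability: {simulate(1000, 2, 800, 4):.4f}")
```

The returned fraction estimates the probability of a token in both A_U and B_U, i.e., the availability of the two-component subsystem.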

Sensitivity Analysis with DoE
Sensitivity analysis measures the effect of given input data on the output data, aiming to outline the weak links of computer systems and, from there, to adopt a set of techniques to improve these systems in different scenarios [39]. In a way, the sensitivity analysis can provide the necessary confidence and frame the results from the system administrators' perspective. In this work, we apply a sensitivity analysis with DoE.
The Design of Experiments (DoE) corresponds to a collection of statistical techniques that deepen the knowledge about the product or process under study [40]. It can also be defined by a series of tests in which the researcher changes the set of variables or input factors to observe and identify the reasons for changes in the output response.
The parameters to be changed are defined using an experiment plan. The goal is to generate the most significant amount of information with the fewest possible experiments.
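A minimal sketch of the experiment-plan idea: a two-level full factorial design enumerates every combination of factor levels, and the main effect of a factor is the difference between the mean response at its high and low levels. The factor names and the toy response function below are illustrative stand-ins for actual runs of the SPN model.

```python
# Sketch: a two-level full factorial design table and main effects.
from itertools import product

factors = ["CLOUD_MTTF", "CLOUD_MTTR", "UAV_MTTF"]   # hypothetical factor set
levels = {"CLOUD_MTTF": (4000.0, 8000.0),            # (low, high), in hours
          "CLOUD_MTTR": (1.0, 2.0),
          "UAV_MTTF": (500.0, 1000.0)}

def response(cfg):
    # Toy availability-like response; a real study would run the SPN model.
    a_cloud = cfg["CLOUD_MTTF"] / (cfg["CLOUD_MTTF"] + cfg["CLOUD_MTTR"])
    a_uav = cfg["UAV_MTTF"] / (cfg["UAV_MTTF"] + 2.0)
    return a_cloud * a_uav

# Full factorial plan: every combination of low (0) and high (1) levels.
runs = []
for combo in product((0, 1), repeat=len(factors)):
    cfg = {f: levels[f][i] for f, i in zip(factors, combo)}
    runs.append((combo, response(cfg)))

# Main effect = mean response at the high level minus mean at the low level.
effects = {}
for j, f in enumerate(factors):
    hi = [y for c, y in runs if c[j] == 1]
    lo = [y for c, y in runs if c[j] == 0]
    effects[f] = sum(hi) / len(hi) - sum(lo) / len(lo)
    print(f"{f}: main effect = {effects[f]:+.6f}")
```

Sorting the absolute effects in descending order yields exactly the ordering displayed by a Pareto chart.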
The behavior of the system based on parameter changes can be observed using sets of outputs. In the literature, there are three categories of graphs usually adopted for experiments with DoE:
• The Pareto chart is represented by bars in descending order. The higher the bar, the greater the impact. Each bar represents the influence of each factor on the dependent variable.
• Main effects graphs are used to examine the differences between the level means for one or more factors, graphing the mean response for each factor level connected by a line. They can be applied when comparing the relative strength of the effects of various factors. The mean response value expresses the sign and magnitude of the main effect, and the magnitude expresses the strength of the effect. The higher the slope of the line, the greater the magnitude of the main effect. A horizontal line means that there is no main effect, that is, each level affects the response in the same way.
• Interaction graphs are responsible for identifying interactions between factors. An interaction occurs when the influence of a given factor on the result is altered (amplified or reduced) by the difference in another factor's level. If the lines on the graph are parallel, there is no interaction between the factors; if they are not parallel, there is an interaction between the factors.

Architecture
Figure 3 shows the base architecture modeled in this work. The architecture consists of a distributed system composed of IoT devices that generate requests to remote servers. There is a chain of UAVs that communicate with a base station that receives such data. The collected data are processed by a cloud or edge server, depending on the demand. Thus, communication takes place in three stages:

1. IoT devices are the customers of the application, requesting services and sending data. UAVs are controlled via a wireless network (5G, for example). UAVs fly over the communication area, receiving data from mobile devices.
2. The base station receives data from UAVs and retransmits them to the cloud.
3. The data are processed in the cloud, and the responses are sent back to the IoT devices.
A possible limitation is that the service may become unavailable depending on the data demand. In a critical application, a single UAV failure can significantly decrease the availability of the system. Another possible limitation of the base architecture is that if the cloud goes down, the entire system stops working. Figure 4 presents a second proposal, a redundant architecture in which a server at the edge of the network is added. The goal is to improve availability because, depending on the type of request, processing can be performed at the edge. Both servers are always connected. However, as there are two servers, the load is divided. If the cloud goes down, the edge server takes over the processing and vice versa. The battery and recharge time of the UAVs are not taken into account. In Figure 4, the cloud server (3.a) and the edge server (3.b) are each a server mounted on a single physical machine.

SPN Models
This section presents the proposed models generated for representing the architecture behavior shown in the previous section.

Base Model
This section presents the SPN base model. The components in the model correspond to the same components shown in the previous section. Figure 5 shows the model with the following functions: (i) UAVs are responsible for collecting data for offloading; (ii) the network is responsible for receiving requests from UAVs and forwarding them to the cloud; and (iii) the cloud is responsible for processing the data. Each component has its respective MTTF and MTTR. The network is modeled, taking into account the dependency between the components. When a component fails, the immediate transition (T0) makes the next component, which depends directly or indirectly on it, also fail.

The ND marking in place UAV_U corresponds to the number of available UAVs. The UAVs are working when the ND tokens are in the UAV_U place. The evaluator can define (with this ND marking) how many UAVs must be active for the system to be working. A UAV component is not working when it has a token in the UAV_D place. The change between the active and inactive states is caused by the transitions UAV_MTTF and UAV_MTTR.
The network is up and running when it has tokens in the NETWORK_U place. The network component is not working when it has a token in one of the following places: BS_D or NETWORK_D. The base station is working if it has a token in the place BS_U; it is inactive when it has a token in the place BS_D. The change between the active and inactive state is caused by the following transitions: BS_MTTF and NETWORK_MTTF for the mean time to failure; and BS_MTTR and NETWORK_MTTR for the mean time to repair.
The cloud is working when it has tokens in the CLOUD_U place. We consider that the cloud component is not working when it has a token in the CLOUD_D place. The change between the active and inactive status is caused by the transitions CLOUD_MTTF and CLOUD_MTTR for the mean time to failure and repair, respectively. Especially for the cloud, an RBD model is designed to obtain the failure and recovery times. The RBD model and its data for simulation are based on the model presented by [41]. Figure 6 shows the RBD model used to obtain data for the cloud component. We consider a private cloud. The components adopted are the same as those of platforms such as Eucalyptus or OpenStack. The main components that form a cloud are the frontend and the nodes. In order to obtain the failure and recovery values of these components, they are subdivided. The node is formed by the following components: hardware (HW), operating system (OS), hypervisor, instance manager (IM), and virtual machine (VM). The frontend is formed by the following components: hardware (HW), operating system (OS), platform manager (PM), cluster manager (CM), block-based storage (BBS), and file-based storage (FBS). For the cloud to be operational, each of these components is required to be working. It is worth mentioning that, in order to have a more realistic simulation, the guard condition #BS_U > 0 is used in the transition NETWORK_MTTR so that the network component can only be recovered when the base station is working.
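Assuming the Figure 6 blocks form a pure series structure (one frontend and one node, with every subcomponent required), the cloud's steady-state availability can be sketched as below. The per-subcomponent MTTF/MTTR values are placeholders, not the values of [41].

```python
# Sketch: deriving the cloud availability from the Figure 6 RBD,
# assuming one frontend and one node in series. All MTTF/MTTR values
# (hours) are illustrative placeholders.

def avail(mttf, mttr):
    return mttf / (mttf + mttr)

frontend_parts = {  # HW, OS, PM, CM, BBS, FBS
    "HW": (8760.0, 1.67), "OS": (2893.0, 0.25), "PM": (788.4, 1.0),
    "CM": (788.4, 1.0), "BBS": (788.4, 1.0), "FBS": (788.4, 1.0),
}
node_parts = {      # HW, OS, hypervisor, IM, VM
    "HW": (8760.0, 1.67), "OS": (2893.0, 0.25), "Hypervisor": (2880.0, 1.0),
    "IM": (788.4, 1.0), "VM": (2880.0, 1.0),
}

def series_avail(parts):
    # Series RBD: the product of the subcomponent availabilities.
    a = 1.0
    for mttf, mttr in parts.values():
        a *= avail(mttf, mttr)
    return a

a_cloud = series_avail(frontend_parts) * series_avail(node_parts)
print(f"Cloud availability (frontend + node in series): {a_cloud:.6f}")
```

The resulting availability can then be converted back into equivalent MTTF/MTTR figures to feed the CLOUD_MTTF and CLOUD_MTTR transitions of the SPN.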

Sensitivity Analysis with DoE
The first step in conducting a sensitivity analysis with DoE is to define which factors and levels are to be considered. In this work, the base model transitions are adopted as factors, varying each factor at two levels. Table 2 shows the factors and respective levels. Table 3 shows the combinations of DoE scenarios generated from the design table.

Figure 7 shows the Pareto graph with the level of significance for each of the factors in the model. The term H is the most significant for the availability variable. This fact corroborates the hypothesis that the cloud is the main element for the base architecture's availability. It is observed that the term G rises to almost the same level as the term H. In summary, both the recovery and the failure of the cloud are essential factors for maintaining the system.

Figure 8 shows the level of interaction between the factors CLOUD_MTTR and CLOUD_MTTF. Such factors were the most significant in the study of the impacts on the Pareto chart. The non-parallel lines indicate that there is an interaction between the factors: the result of changing one factor influences the result of changing the second factor. With a lower CLOUD_MTTF, the variation in CLOUD_MTTR has a more significant influence on availability.

Figure 9 shows a contour graph of the interaction between the MTTF and MTTR of the cloud. The contour graph shows the same information as the interaction graph, but differently, through heat zones. The higher the MTTF and the lower the MTTR, the greater the overall availability of the system. Assuming that the recovery time is over 1.2, the system never reaches more than 99.2% availability.

Extended Model
Figure 10 presents an extended SPN model for the UAV architecture, with two processing possibilities (edge and cloud), a network, and a UAV component. To satisfy these conditions, two new transitions are added, EDGE_MTTF and EDGE_MTTR, which represent the failure and recovery times of the edge server.
The places EDGE_U and EDGE_D represent the active and inactive states of the edge server, respectively.

Figure 10. Extended SPN model (when an edge server is added to the system to improve the dependability metrics).
The system's condition to be fully active is the same as described in the base proposal, with the addition of the edge component. However, as the edge represents redundancy for the cloud, it is enough that one of them is active for the system to be available. The availability of the model can be measured in two ways. The equation A = P{(#UAV_U = ND) AND ((#CLOUD_U > 0) OR (#EDGE_U > 0)) AND (#NETWORK_U > 0)} calculates the system availability of the extended model when 100% of the UAVs are required for the system to be working. The equation A = P{(#UAV_U >= MND) AND ((#CLOUD_U > 0) OR (#EDGE_U > 0)) AND (#NETWORK_U > 0)} calculates the system availability of the extended model when a minimum number of UAVs (MND) is required for the system to be working.
It is worth mentioning that the MTTF transitions are of the infinite server type, and the MTTR transitions are of the single server type, both in the base and in the extended model. In this model, the guard condition #BS_U > 0 is also adopted in the transition NETWORK_MTTR so that the network component can only be recovered when the base station is working.
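Under the simplifying assumption of independent components, the two availability expressions above can be approximated in closed form: the ND-UAV condition becomes a power of the single-UAV availability, and the MND condition becomes a binomial tail probability. This is only a sketch with illustrative MTTF/MTTR values; the SPN captures dependencies (e.g., the base-station guard condition) that this approximation ignores.

```python
# Sketch: closed-form approximation of the two availability equations
# of the extended model, assuming independent components.
from math import comb

def avail(mttf, mttr):
    return mttf / (mttf + mttr)

ND, MND = 8, 6                   # hypothetical UAV counts
a_uav = avail(500.0, 2.0)        # illustrative MTTF/MTTR values (hours)
a_net = avail(2000.0, 1.0)
# Cloud and edge are redundant: the processing tier fails only if both fail.
a_proc = 1 - (1 - avail(4000.0, 1.5)) * (1 - avail(3000.0, 1.5))

# Equation 1: all ND UAVs must be up.
a_all = (a_uav ** ND) * a_net * a_proc

# Equation 2: at least MND of the ND UAVs must be up (binomial tail).
p_uavs = sum(comb(ND, k) * a_uav**k * (1 - a_uav)**(ND - k)
             for k in range(MND, ND + 1))
a_min = p_uavs * a_net * a_proc

print(f"A (all {ND} UAVs required):   {a_all:.6f}")
print(f"A (at least {MND} of {ND} UAVs): {a_min:.6f}")
```

As expected, relaxing the requirement from all ND UAVs to at least MND strictly increases the computed availability.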

Reliability Model
The reliability of a system is becoming an increasingly decisive factor for a product to be well accepted by its consumers. Reliability is the conditional probability that a system remains operational during a time interval [0, t], given that it is operational at t = 0. Three aspects must be taken into account when the reliability of a system or component is analyzed. First, an unambiguous definition of possible system failures must be made. Second, the time unit must be identified, which can be given in hours, days, or cycles of system operation. For example, an operating cycle could be the activation of the engine ignition in an automotive system. Third, the system must be observed under natural operating conditions, subject to real physical conditions; observing the system under manipulated conditions can generate biased reliability data [42]. Figure 11 shows the reliability model, adapted from the base availability model by removing the MTTR transitions.

Figure 11. Reliability model based on the base model without recovery.
The results for this metric are presented over time in the next section, based on a transient analysis.
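For a series system with exponentially distributed failure times and no repair, as in Figure 11, the failure rates of the components add, giving R(t) = exp(-λ_total · t). The sketch below uses this closed form; the number of UAVs and the MTTF values (hours) are illustrative assumptions, not the paper's parameters.

```python
# Sketch: transient reliability of the Figure 11 series system
# (UAVs, network, cloud, no repair) under exponential failure times.
import math

ND = 8                                        # hypothetical UAV count
mttf_uav, mttf_net, mttf_cloud = 500.0, 2000.0, 4000.0  # illustrative

def reliability(t: float) -> float:
    # Series of independent exponential components: failure rates add.
    rate = ND / mttf_uav + 1.0 / mttf_net + 1.0 / mttf_cloud
    return math.exp(-rate * t)

for t in (0, 10, 50, 100):
    print(f"R({t:>3} h) = {reliability(t):.4f}")
```

The curve starts at R(0) = 1 and decays exponentially, matching the qualitative behavior reported in the transient analysis.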

Case Study
This section presents a case study with the models previously presented. Initially, the parameters used in the simulations are presented. The MTTF and MTTR values for each component are extracted from [43][44][45]. Table 4 shows the input values of the model components that are adopted for conducting the numerical analysis. Table 5 presents the parameters, extracted from [41], adopted for the RBD analysis.

Results for Availability
This section presents the availability analysis for the two models already presented. Six scenarios are defined, varying the number of available UAVs (8, 16, 24, 32, 40, and 48 UAVs). These scenarios are defined based on the sensitivity analysis of the models. The UAV component appears as the second most important in the base model and as the most important in the extended model. The metric used in the analysis requires 100% of the UAVs to be active. These scenarios are generated to compare the two models' availability and the impact that the number of UAVs has on each of the architectures.
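The expected trend can be sketched analytically: requiring all n UAVs raises the UAV tier's failure probability as n grows, while cloud/edge redundancy softens the drop. All availability figures below are hypothetical placeholders, not the paper's results.

```python
# Sketch: base (cloud only) versus extended (cloud OR edge) availability
# as UAVs are added, with annual downtime for comparison.
HOURS_PER_YEAR = 8760.0
a_uav, a_net = 0.999, 0.9995          # hypothetical per-component values
a_cloud, a_edge = 0.998, 0.998

results = []
for n in (8, 16, 24, 32, 40, 48):
    base = (a_uav ** n) * a_net * a_cloud                            # cloud only
    ext = (a_uav ** n) * a_net * (1 - (1 - a_cloud) * (1 - a_edge))  # cloud OR edge
    results.append((n, base, ext))
    print(f"{n:2d} UAVs: base A={base:.4f} "
          f"({(1 - base) * HOURS_PER_YEAR:6.1f} h/yr down), "
          f"extended A={ext:.4f} ({(1 - ext) * HOURS_PER_YEAR:6.1f} h/yr down)")
```

Both columns fall as n grows, but the extended column stays higher at every n, mirroring the behavior observed in Figure 12.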
The results calculated by the stationary analysis with the Mercury [46] tool are presented below. Figure 12 shows the availability calculated in the analysis. The availability variation shows that availability drops as UAVs are added to the system. The extended model's availability tends to fall more gently than that of the base model. We believe that this occurs due to the redundancy implemented in the extended model: even if we add UAVs, there will be a redundant component to cover the cloud's eventual failures. These results are generally expected, since the more components we have in a system, the greater the chances of one of them failing; and the more they fail, the more time is spent on repairs. As our metric considers the system to be functioning only when all UAVs are working, a higher failure rate negatively impacts availability. This behavior is repeated even in the extended model. However, in the extended model, this occurs in a much less impactful way because there is a redundancy mechanism on the server side. Figure 14 shows the reliability of the system over time. For the reliability analysis, three parameter levels are proposed: the CLOUD_MTTF of the base model, CLOUD_MTTF increased by 50%, and CLOUD_MTTF decreased by 50%. The X-axis represents the time sampling in hours. The Y-axis represents the probability that the system remains operational. The reliability levels are very different from each other. At all levels, the lines follow a pattern of decreasing probability as the operating time progresses. Each curve starts with a high probability and tends to decrease exponentially over time. As the failure time decreases, the curve becomes steeper. Such behavior indicates that, with less time between failures, the system tends to fail earlier, decreasing its reliability. With more time between failures, the system tends to fail less frequently during the runtime.
Although it may seem acceptable, this behavior further shows that the base model does not favor the recovery and maintenance of its elements. The second conclusion is that cloud failures are harmful, confirming that the cloud is the model's key element.

Discussions
The input parameters are mostly adopted from previous studies of general-purpose UAVs. The values are very similar to those from industry (DJI, Parrot, etc.), as we surveyed publicly available data for general-purpose UAVs. Thus, we believe that the considered UAVs are representative of general-purpose types in industry. Furthermore, one may adopt the models proposed in this work and feed in the configurations and parameters of their own UAVs to investigate the dependability metrics of interest.
It would be very interesting to consider the operational conditions of a UAV fleet (e.g., horizontal extent, altitude, and weather). However, the elaboration of operational conditions is beyond our main focus and outside the scope of this research, in which we investigate the impact of the number of UAVs and the operational capacity of components/subsystems on the overall data offloading service dependability (e.g., reliability, availability). We believe the elaboration of different operational conditions could provide a fruitful research avenue for future works.

Conclusions
This paper proposed two SPN models to represent and evaluate the dependability of a cloud data offloading system aided by UAVs. In this work, UAVs were adopted as connection bridges to remote processing resources. The models are highly configurable; the extended model, for example, permits calibrating ten parameters regarding timed transitions. Several scenarios were evaluated by varying the number of UAVs in the system. The availability variation showed that availability dropped as UAVs were added to the system. The extended model's availability tended to fall more gently than that of the base model. We believe that this occurred due to the redundancy implemented in the extended model. By adding redundancy to the model, it was possible to see a considerable increase in availability (about three times) and a large decrease in downtime (about seven times). The model analysis identified the most relevant factor in the architecture: the cloud's MTTR was the factor with the greatest impact on availability. The analysis also showed the availability behavior for different recovery and failure times. The reliability model demonstrated the importance of investing in cloud equipment to increase the mean time to failure. As future work, we intend to further extend the model by adding other components, such as distinct internet connections and varied hardware configurations. Other metrics can also be explored, such as security, the drop rate, and the mean response time.

Conflicts of Interest:
The authors declare no conflict of interest.