Evaluation of Epidemic-Based Information Dissemination in a Wireless Network Testbed

Abstract: Information dissemination is an integral part of modern networking environments, such as Wireless Sensor Networks (WSNs). Probabilistic flooding, a common epidemic-based approach, is used as an efficient alternative to traditional blind flooding, as it minimizes redundant transmissions and energy consumption. It shares some similarities with the Susceptible-Infected-Recovered (SIR) epidemic model, in the sense that the dissemination process and the epidemic thresholds, which achieve maximum coverage with the minimum required transmissions, have been found to be common in certain cases. In this paper, some of these similarities between probabilistic flooding and the SIR epidemic model are identified, particularly with respect to the epidemic thresholds. Both epidemic algorithms are experimentally evaluated on a university campus testbed, where a low-cost WSN consisting of 25 nodes is deployed. Both algorithm implementations are shown to be efficient at covering a large portion of the network's nodes, with probabilistic flooding behaving largely in accordance with the considered epidemic thresholds. On the other hand, the implementation of the SIR epidemic model behaves quite unexpectedly, as the epidemic thresholds underestimate the reproduction number required for sufficient network coverage, a fact that can be attributed to implementation limitations.


Introduction
Modern technological environments are overloaded with large volumes of data, simultaneously requiring high speeds for the transfer and exchange of information while maximizing energy savings [1][2][3]. It is in this context that the Internet of Things (IoT) [4] has spread, continuously pervading everyday life and harmonizing with human needs and requirements. Due to technological advances in micro-electro-mechanical systems and wireless communications, including the IoT, the development of Wireless Sensor Networks (WSNs), which monitor certain environmental parameters and intelligently organize the collected data in central locations, has become a reality [5][6][7]. This class of networks usually involves a large number of small, low-powered wireless devices that require simple information dissemination algorithms in order to achieve global outreach. With regard to their spontaneous formation and wireless nature, WSNs bear many similarities to wireless ad hoc networks.
Information dissemination aims to spread information to multiple destinations, typically through the use of broadcast techniques. It is an integral part of basic network operations, including routing, network discovery, and optimum-path finding, and it has been studied extensively. In this work, both probabilistic flooding and the SIR epidemic model were implemented and experimentally evaluated, with multiple runs per parameter selection to capture the models' probabilistic nature. The acquired results indicate that the models perform similarly, with the performance of probabilistic flooding being in accordance with previous analytical models, while the SIR epidemic model differs somewhat. The considered thresholds for probabilistic flooding successfully predict the network's coverage, whereas the thresholds for the SIR epidemic model seem to overestimate the network's coverage for a given set of parameters. This fact can mainly be attributed to the selected SIR implementation, whereby a transition to the recovered state may take place prior to any transmission from an infected node to another. Therefore, despite commonalities between the thresholds for the two models, the resulting coverage differs for the considered implementations.
The remainder of this paper is structured as follows: In Section 2, an overview of related work is provided focusing on the discussion of the algorithms' epidemic thresholds. Section 3 describes the relevant models and epidemic thresholds used in the paper, and in Section 4, the system used for the experimentation in the real wireless network environment is described, along with the implementation of the two information dissemination approaches. Section 5 thoroughly describes the conducted experiments and presents the results, while in Section 6, conclusions are drawn and future work is outlined.

Past Related Work
A large number of studies have focused on the efficiency of flooding. The broadcast storm problem pointed out by Ni et al. [10] holds flooding responsible for causing a significant amount of redundant network traffic; they proposed various alternative approaches to eliminate the problem. Williams et al. conducted a study [19] comparing different broadcast algorithms in ad hoc networks, one of which was probabilistic flooding. They showed that flooding was inadequate for congested networks, as the increased number of nodes in static networks leads to performance loss due to the large number of failed message re-transmissions.
Efforts have been made to optimize probabilistic flooding's efficiency. Fixed probabilistic flooding has mainly been studied in random networks, using percolation theory and the phase transition phenomenon [20][21][22]. Crisóstomo et al. [23] derived values for the threshold probability, considering various stochastic topologies. Additionally, they randomly sampled the graph's total node set, forming an induced subgraph, based on the forwarding probability of probabilistic flooding. They proved that the event of this subgraph being a connected dominating set is probabilistically equal to the event of global outreach by probabilistic flooding. In [24], Oikonomou et al. derived asymptotic expressions for probabilistic flooding with underlying networks that are random graphs. Later, in [14], Koufoudakis et al. studied probabilistic flooding using elements of algebraic graph theory, and an algorithm was introduced to estimate the threshold probability, whereas in [25], a lower bound of the forwarding probability was proposed, allowing for global network outreach. It was shown that for large values of time, a forwarding probability sufficient to provide global outreach is inversely proportional to the average number of neighbors in the network.
However, results observed for random graphs are not always applicable to ad hoc networks. Consequently, it is a hard task to determine an optimum forwarding probability, as it may depend on various network parameters, such as density, distance among nodes, and speed. An alternative approach to fixed probabilistic flooding is to use adaptive probabilistic schemes [26], where the forwarding probability is determined according to the aforementioned parameters.
Probabilistic flooding shares many of its features with epidemic-based dissemination approaches with respect to the network's coverage [17]. The rise of theoretical epidemiology began in the early 20th Century with the mathematical modeling of infectious diseases and has mainly been a subject of biologists and health physicians, such as Anderson-May [27] and Ross [28]. Their main purpose was to predict the spreading properties of an infectious disease, and they have conventionally relied on the use of compartmental models. As early as the 1980s, the mathematics underlying the propagation of epidemic approaches was shown to be related to communication protocols [29]. In the years following, epidemic models and their applications have been widely investigated in communication systems [12,30]. More recent approaches include epidemic information dissemination for mobile molecular communication systems [31] and WSNs [32], the optimal control study of information epidemics in social networks for campaigning strategies [33], and even the evaluation of information dissemination in urban scenarios with different crowd densities and renewal rates [34].
Cohen's pioneering work [35] on the subject of computer viruses suggested that the spread of an infection follows the transitive closure of information flow in a system. He was also the first to point out that there is no algorithm that can perfectly detect all possible viruses. Later, he and Murray [36] indicated a link between the spread of computer viruses and biological epidemiology; however, they did not develop an explicit model linking the two together. The first serious attempt at adapting mathematical epidemiology to computer virus replication within computing systems was presented by Kephart and White [37]. They showed that the epidemic threshold is related to the average node degree of the network (the average number of neighbors) and stated that the absolute prevention of a computer virus is indeed impossible; yet, controlling the rate at which a detected virus is removed from the system may be sufficient for the prevention of an epidemic.
Epidemic thresholds have been a subject of research for various network topologies. Pastor-Satorras and Vespignani [38][39][40][41] focused on epidemic spreading in scale-free networks, proposing an analytical model for power-law graphs and deriving the corresponding epidemic threshold. Scale-free networks exhibit a highly skewed connectivity distribution, making them more susceptible to viral spreading, as computer viruses are more likely to propagate among the nodes of such high connectivity variance networks, thus making an infection harder to extinguish. Eguiluz and Klemm [42] proposed an even tighter epidemic threshold for real graphs and provided confirming results on several Internet graphs. Serrano and Boguñá [43] derived the epidemic threshold for clustered networks.
Later, Boguñá et al. [44] studied epidemic spreading in correlated complex networks, where the connectivity of a particular individual depends on its neighbors' connectivity. Moreno et al. [45] also studied epidemic modeling in Watts-Strogatz and Barabasi-Albert complex networks and related the epidemic threshold to the moments of the connectivity distribution. They also confirmed the absence of any finite threshold for particular types of connectivity distributions. Wang et al. [16] designed a general epidemic spreading model suitable for any network and proposed an epidemic threshold condition that applies to any arbitrary graph. This threshold is inversely related to the largest eigenvalue of the network's adjacency matrix; the number of infected nodes in a network decays exponentially for values below this epidemic threshold.
In [14,17], Oikonomou et al. and Koufoudakis et al. studied probabilistic flooding in the light of epidemic-based dissemination, revealing the close relation of the two approaches regarding the epidemic threshold. The network's coverage was expressed as a coverage polynomial whose largest root provided a lower bound for the threshold probability, one tighter than the epidemic thresholds proposed by Kephart-White and Wang et al. [16,37].

Models' Definition
The exploration of the models takes place in a network topology, which is modeled by an undirected graph G(N, L), with N and L being the sets of nodes and links (or edges), respectively. Furthermore, A is the |N| × |N| adjacency matrix of G(N, L), with element A_{i,j} being one if nodes i and j are adjacent, and zero otherwise. The largest eigenvalue of matrix A is denoted by λ_1. It is also assumed that the implemented algorithms' termination takes place at t = T, where t is a given point in time and T is referred to as the termination time.

Probabilistic Flooding
Information dissemination in probabilistic flooding begins from a particular node, namely the source node of the network. A node that receives the information message for the first time re-transmits it based on a predefined forwarding probability q. Any messages received by the nodes after the first time are ignored. The fraction of the nodes covered by the propagated information under probabilistic flooding is referred to as the network's coverage and is denoted by C_PF(t) for time t.
The network's coverage in probabilistic flooding depends on the value of q. The smaller the forwarding probability, the more difficult it becomes for a message to travel across the network, and thus the greater the uncertainty regarding the network's global outreach. The optimum forwarding probability that allows high network coverage while minimizing message transmissions is referred to as the threshold probability q̂.
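The mechanism described above can be sketched in a few lines. The following simulation-style illustration is an assumption for illustration purposes (not the testbed code): a graph is given as an adjacency list, the source always transmits, each covered node makes a per-neighbor forwarding decision with probability q, and duplicate receptions are ignored. The returned value corresponds to the coverage C_PF.

```python
import random

def probabilistic_flooding(adj, source, q, rng=random.random):
    """Simulate probabilistic flooding; return the fraction of covered nodes."""
    covered = {source}
    frontier = list(adj[source])      # the source always transmits to its neighbors
    for v in frontier:
        covered.add(v)
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                # per-neighbor forwarding decision with probability q;
                # later duplicate receptions are ignored
                if v not in covered and rng() < q:
                    covered.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(covered) / len(adj)

# 4-node ring: with q = 1, probabilistic flooding degenerates to blind
# flooding and covers the whole network.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(probabilistic_flooding(ring, 0, 1.0))  # -> 1.0
```

For small q, coverage collapses toward the source's immediate neighborhood, which mirrors the behavior observed in the experiments for q = 0.2.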

SIR Epidemic Model
In a typical SIR epidemic model, which includes the aspect of recovering from the infected state, the spread of an infection from an infected individual takes place based on a fixed infection rate β, and a node recovers from an infection at a recovery rate γ. The dynamics of an infection's transmission are defined by the basic reproduction number τ = β/γ. Thus, τ represents the average number of cases generated by a primary infection case in a susceptible population throughout its duration [46].
Based on the SIR epidemic model, a susceptible individual can, upon meeting an infectious individual, become infectious, and after some time, the infectious individual recovers. It is assumed that the total population satisfies |S_t| + |I_t| + |R_t| = |N|, where S_t, I_t, and R_t are the sets of susceptible, infected, and recovered nodes at time t, respectively. In a network of |N| nodes, each node represents an individual subject, and the edges L specify the interactions between the different individuals. The spreading of the infection starts from a source node, and the rate at which an infected node i can infect a susceptible neighbor j is defined as the infection rate β. The recovery process that transfers a node from the infected state to its following state (recovered) is an internal Markovian (memory-less) process, which takes place at a constant rate γ. An individual becomes fully immune to a particular disease after recovering from its infected state and therefore cannot be reinfected or re-transmit the disease to its neighbors.
An important assumption that has to be made regarding the spreading of the disease to a susceptible individual is that the infection occurs with a fixed probability β, for each link between a susceptible and an infected individual. Similarly, transitioning from the infected to the recovered state occurs based on a constant probability γ per unit time. These assumptions are introduced to simplify the mathematical modeling, especially when it comes to differential equations, while also simplifying the implementation, as discussed in Section 4.2.
The critical value of τ that determines whether an infection will die out or will be able to spread through a population is referred to as the epidemic threshold and is denoted by τ̂. This means that for values of τ ≥ τ̂, the infection rate is high enough that the recovery rate is insufficient and the disease spreads throughout an entire susceptible population, whereas for values of τ below this threshold, either the generated outbreak is insignificant or the disease does not occur at all. The value of this threshold helps to develop measures for an epidemic's prevention and control [47]. The fraction of the nodes covered by the SIR infection message is referred to as the network's infection coverage and is denoted by C_SIR(t) for time t, that is, C_SIR(t) = (|I_t| + |R_t|)/|N|. This formulation is selected due to the fact that information dissemination is the topic at hand in the current work.
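The SIR dynamics can be sketched as a discrete-time simulation. The following is an illustrative assumption (not the testbed implementation): at each step, every infected node first checks recovery with probability γ and, if still infected, tries to infect each susceptible neighbor with probability β. The returned value corresponds to the infection coverage C_SIR, counting every node that was ever infected.

```python
import random

def sir_spread(adj, source, beta, gamma, steps, rng=random.random):
    """Discrete-time SIR sketch; return the infection coverage after `steps`."""
    S = set(adj) - {source}           # susceptible
    I = {source}                      # infected
    R = set()                         # recovered
    for _ in range(steps):
        newly_infected, newly_recovered = set(), set()
        for u in I:
            if rng() < gamma:         # recovery checked first
                newly_recovered.add(u)
                continue
            for v in adj[u]:
                if v in S and rng() < beta:
                    newly_infected.add(v)
        S -= newly_infected
        I = (I | newly_infected) - newly_recovered
        R |= newly_recovered
    # C_SIR: fraction of nodes ever reached by the infection message
    return (len(I) + len(R)) / len(adj)

# 4-node ring; beta = 1, gamma = 0 always infects and never recovers.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(sir_spread(ring, 0, 1.0, 0.0, 4))  # -> 1.0
```

Conversely, a high γ relative to β lets nodes recover before spreading the infection, a behavior relevant to the experimental results discussed later.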

Epidemic Thresholds
Both probabilistic flooding and the SIR epidemic model function in a relatively similar manner. Probabilistic flooding arises if each node operates according to the SIR epidemic model and becomes recovered after attempting to infect its neighbors exactly once. Much work has focused on identifying the epidemic thresholds for both of these models, some of which share a few similarities. In this work, due to the fact that the focus is on information dissemination, an infection is considered to be an epidemic if it has affected most of the network. The network's average degree d̄ and the largest eigenvalue λ_1 of its adjacency matrix feature most prominently in epidemic thresholds.
The epidemic thresholds are values for which sufficient coverage is achieved. For probabilistic flooding, an epidemic threshold q̂ is such that C_PF(T) is expected to be close to one for q ≥ q̂. Similarly, for the SIR epidemic model, τ̂ is an epidemic threshold if C_SIR(T) is, again, expected to be close to one when τ = β/γ ≥ τ̂.
When it comes to probabilistic flooding, the relevant thresholds focused on in the current study include q̂_1 = 1/d̄ and q̂_2 = 1/λ_1. It was also argued in [25] that q̂_3 = 4/d̄ and q̂_4 = 4/λ_1 are tighter bounds for sufficient coverage. In general, the thresholds that involve eigenvalues are regarded as more precise, since λ_1 ≥ d̄, but they can be harder to estimate in real scenarios, as they require knowledge of the entire topology.
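For a concrete topology, the four thresholds can be computed directly from the adjacency matrix. The sketch below is an assumed helper (not from the paper), estimating λ_1 with a plain power iteration, which is valid for connected graphs:

```python
def flooding_thresholds(A, iterations=200):
    """Compute q1 = 1/d_bar, q2 = 1/lambda_1, q3 = 4/d_bar, q4 = 4/lambda_1."""
    n = len(A)
    d_bar = sum(sum(row) for row in A) / n            # average degree
    # power iteration for the largest eigenvalue of A (connected graph assumed)
    x = [1.0] * n
    lam = 1.0
    for _ in range(iterations):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = max(y)
        x = [v / lam for v in y]
    return {"q1": 1 / d_bar, "q2": 1 / lam, "q3": 4 / d_bar, "q4": 4 / lam}

# Complete graph on 5 nodes: d_bar = lambda_1 = 4, so q1 = q2 = 0.25.
K5 = [[0 if i == j else 1 for j in range(5)] for i in range(5)]
print(flooding_thresholds(K5)["q1"])  # -> 0.25
```

On irregular graphs, λ_1 > d̄ holds, so the eigenvalue-based thresholds q̂_2 and q̂_4 are the smaller (tighter) of each pair.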
Regarding the SIR epidemic model, identifying tight epidemic thresholds is much more challenging, with multiple ones having been proposed in the literature. Interestingly, a few of these thresholds are identical to the ones detailed for probabilistic flooding. In particular, τ̂_1 = 1/d̄ and τ̂_2 = 1/λ_1 once again come up for some networks, although they are based on somewhat "naive" expectations [15].
Therefore, probabilistic flooding and the SIR epidemic model exhibit various similarities, both in terms of functionality and epidemic thresholds. In order to investigate how similar the capabilities of both mechanisms are at disseminating information in a real networking environment, this paper focuses on their experimental evaluation.

Implementation Description
In this section, the implementation of the system that is used for the algorithms' evaluation is discussed with respect to its hardware and software components. In Section 4.1, the hardware constituting the nodes in the utilized WSN is described, and in Section 4.2, the implementation description of the probabilistic flooding and the SIR epidemic algorithms takes place.

Hardware
The proposed algorithms were implemented and experimentally evaluated on the CAmpus TestBed of the Ionian University (CaBIUs) [18], where the network nodes comprised open-source Arduino Mega microcontroller boards [48] equipped with an XBee S2C ZigBee module [49]. A fully-fledged node was realized with the help of a wireless SD shield [50], which allowed the placement of the module onto the Arduino microcontroller board and created an interface between the two components. Such a node is shown in Figure 1. Arduino Mega boards were chosen for this specific application due to their high storage capabilities, given the large volume of measurements that needed to be collected.
Regarding the data transmission medium, the XBee S2C ZigBee module was selected, as its high functionality specifications were considered vital for the efficiency of the network. It supports device types such as the ZigBee coordinator, which starts the network and is responsible for its overall management, and the ZigBee router, of which a network can contain one or more, depending on its size and topology [51]. The XBee devices are also capable of operating in two modes, transparent and Application Programming Interface (API) [52]. For the purposes of this paper, the API mode was considered, which transmits data in well-defined frames carrying additional information related to either the configuration or the communication required for transmission between multiple devices, and also supports a programming interface for application implementation.

Software
In the sequel, the software developed for this study is described, consisting of the algorithms' implementation, which was uploaded to each of the network's nodes (microcontrollers).
The realistic characteristics of the setup presented various difficulties related to environmental factors, such as weather conditions, which made it difficult to establish signal stability. Additionally, the XBee S2C ZigBee module and the protocols it features do not provide a reliable means of establishing and acquiring links between nodes, as outlined in [18]. Therefore, conducting the experiments required following a certain procedure to establish the network's topology. This procedure needed to ensure that each of the network's links was symmetric, which was achieved by transmitting messages from each node to all of its one-hop neighbors and waiting for the corresponding replies. This exchange was repeated three times, after which the nodes communicated the number of successful transmissions to each other. Based on the sum of the successful transmissions, the decision was made whether a node was listed as a neighbor or not. At the end of this procedure, the coordinator node assembled the network's topology by combining the network nodes' replies into an adjacency matrix. The replies were collected through a simple echo algorithm at the end of a predefined time period, which was considered sufficient for all of the algorithm's message exchanges to have completed.
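The link-establishment decision above can be sketched as follows. Note that the acceptance rule (`min_total`, and requiring at least one success in each direction) is an assumption for illustration; the paper only states that the decision was based on the sum of successful transmissions over the three exchange rounds.

```python
def build_adjacency(probe_results, n, min_total=4):
    """Build a symmetric adjacency matrix from bidirectional probe counts.

    probe_results[(i, j)] = number of probes from node i successfully
    answered by node j, out of three rounds (assumed acceptance rule).
    """
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            ab = probe_results.get((i, j), 0)
            ba = probe_results.get((j, i), 0)
            # a link is listed only if it works in both directions,
            # keeping the resulting adjacency matrix symmetric
            if ab > 0 and ba > 0 and ab + ba >= min_total:
                A[i][j] = A[j][i] = 1
    return A

# Pair (0, 1) succeeded in all rounds; pair (0, 2) was too unreliable.
A = build_adjacency({(0, 1): 3, (1, 0): 3, (0, 2): 1, (2, 0): 1}, 3)
print(A[0][1], A[0][2])  # -> 1 0
```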
Since it was important to be able to correlate the specific time interval at which an event occurred, the dissemination message contained a value that acted as a step (hop) counter, which was initialized by the source node and increased with each transmission. In addition, the first step in both the probabilistic and the epidemic algorithm included a consensus clock synchronization [53]. The initiator node, which corresponded to the source node and the coordinator node according to the model's and ZigBee's description, respectively, broadcast a message to the entire network declaring its clock's value. The nodes then saved the time difference between their own clock and that of the initiator. In this way, when they replied with their time measurement value, they could adapt it to the clock of the initiator. This method did not take into account the transmission delay between the network's nodes, as it was assumed that it did not drastically affect the measurements.
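The offset bookkeeping described above can be sketched as follows (class and method names are assumptions for illustration): a node records the difference between its clock and the initiator's broadcast value, and later maps its local timestamps onto the initiator's time base.

```python
class ClockSync:
    """Minimal sketch of per-node clock-offset bookkeeping (assumed names)."""

    def __init__(self):
        self.offset = 0

    def on_sync_broadcast(self, initiator_clock, local_clock):
        # how far ahead (or behind) this node's clock runs
        self.offset = local_clock - initiator_clock

    def to_initiator_time(self, local_timestamp):
        # adapt a local measurement to the initiator's clock
        return local_timestamp - self.offset

sync = ClockSync()
sync.on_sync_broadcast(initiator_clock=1000, local_clock=1030)
print(sync.to_initiator_time(1090))  # -> 1060
```

As in the testbed, this ignores message propagation delay; the error it introduces is bounded by the broadcast's travel time.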

Probabilistic Flooding Implementation
To achieve probabilistic flooding across the network, a flooding message specifying the forwarding probability for the current experiment round was sent by the initiator node to its neighbor nodes. Upon receiving the flooding message for the first time, the nodes of the network saved their time of coverage and added a unit to the flooding message's total number of hops. They then sent a result message to the initiator node, in which they specified the time and message hops at the moment of their coverage.
The nodes then attempted to forward the flooding message to their neighbors, based on the probability carried in the received flooding message. The attempt was made independently for each of the node's neighbors, succeeding when rand() mod 100 < a, where rand() is a built-in Arduino random number generator and a is a positive integer set to q * 100. The expression rand() mod 100 yields a random number between zero and 99, so the probability of it being less than a is equal to q. If a node received the flooding message more than once during the same run, it disregarded it.
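The forwarding decision can be transcribed into Python as follows (the helper name is an assumption; the decision rule is the one given in the text). Since a is an integer, q is effectively quantized to hundredths, and the success probability is approximately q:

```python
import random

def forward_decision(q, rand=lambda: random.randrange(0, 32768)):
    """Return True when the node forwards: rand() mod 100 < a, a = q * 100."""
    a = int(q * 100)          # positive integer for q > 0
    return rand() % 100 < a

# Deterministic checks at the extremes: q = 1 always forwards, q = 0 never.
print(forward_decision(1.0), forward_decision(0.0))  # -> True False
```

Injecting a deterministic `rand` makes the boundary behavior easy to verify, e.g. for q = 0.5 the draw 49 forwards while 50 does not.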
After the reception of any result message, the initiator node saved its content in its internal memory. The flooding process terminated after a predetermined amount of time, that is the maximum time required for a blind flooding to reach all of the network's nodes.

Epidemic Implementation
Similar to probabilistic flooding, the SIR epidemic model implementation was initiated when the initiator node sent the infection message to its neighbor nodes. Initially, every node was in the susceptible state, with the exception of the initiator node, which was infected. As stated in Section 3.2, recovery from an infection was based on a constant recovery probability per unit time. This recovery probability γ was specified in the infection message, along with the infection probability β.
The analytical expressions describing the behavior of the dissemination algorithms presented in Section 3 implied the presence of discrete time. In order to approximate the notion of time-steps, each node maintained a local countdown, starting from when an infection message was received. Thus, these "time-steps" were counted differently for each node; they were not globally synchronized. Once the countdown had ended, it reset, and the node first checked whether a recovery occurred. If a recovery did not occur, the node proceeded to infect some of its neighbors; infections and recovery were determined by the probabilities β and γ, respectively, and were computed identically to the probabilistic flooding mechanism. The countdown length, corresponding to a time-step, was set to be significantly larger than the synchronization inaccuracy (five seconds), in order to minimize the latter's effects on the acquired results.
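The per-node event handling just described can be sketched as follows (the function structure is an assumption mirroring the stated order of checks): on countdown expiry, an infected node first checks recovery with probability γ, and only if it remains infected does it attempt, per neighbor, an infection with probability β.

```python
import random

def on_countdown_expired(state, neighbors, beta, gamma, rng=random.random):
    """One local time-step of a node; returns (new_state, neighbors_to_infect)."""
    if state != "infected":
        return state, []                  # susceptible/recovered nodes do nothing
    if rng() < gamma:
        return "recovered", []            # recovery is checked before infecting
    infect = [v for v in neighbors if rng() < beta]
    return "infected", infect

# With gamma = 1 the node recovers before infecting anyone -- the ordering
# that can keep coverage below what the thresholds predict.
print(on_countdown_expired("infected", [1, 2], beta=1.0, gamma=1.0))  # -> ('recovered', [])
```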
Upon receiving the first infection message, the susceptible nodes of the network entered the infectious state. They saved their time of coverage and added a unit to the infection message's total number of hops. They then sent a result message to the initiator node, in which they specified the time and message hops at the moment of their infection. An additional result message, containing the time and time-step of the recovery, was sent to the initiator node when an infected node decided to become recovered. After the reception of any infected or recovered message, the initiator node saved its content in its internal memory. The dissemination process terminated after a predetermined amount of time, which was the maximum time required for an infection to reach all of the network's nodes.

Experimental Evaluation
The contribution of the work presented in this paper was the experimental evaluation of the dynamics of both the probabilistic flooding and the SIR epidemic model. In particular, the experiments aimed to evaluate the information dissemination capabilities of both algorithms under real conditions, with respect to the presented epidemic thresholds, as well as to examine similarities between them.
The experiments were conducted on CaBIUs, a university campus testbed, using a 25 node WSN, as described in Section 4. Across all experiments, a total of 10 runs were conducted for any given selection of parameter values regarding q, β, and γ, with mean values and 95% confidence intervals being reported where appropriate. It should be noted that the initiator node was located at the same position across all experiments.
Based on previous experiments on CaBIUs [18], the value of d̄ was expected to be approximately equal to 5.76. Hence, the corresponding thresholds q̂_1 and τ̂_1 were expected to be equal to 1/5.76 ≈ 0.17. Due to the nodes' limited computational capabilities and the fact that the network's links changed dynamically, eigenvalues and average degrees were not computed during runtime. A set of predefined probabilities was used instead, with the probability q and the reproduction number τ being set at the start of each experiment.
The total coverage for different values of q is depicted for the case of probabilistic flooding in Figure 2a. For q = 0.2, the total coverage C_PF(T) was exceedingly low, with less than 10% of the network being covered by the information. It was, therefore, evident that thresholds q̂_1 and q̂_2 severely underestimated the probability q required to cover most of the network's nodes. On the other hand, for q = 0.6, a much larger percentage of the network (85.2%) was covered by the disseminated information. Thus, the acquired results were more in accordance with thresholds q̂_3 and q̂_4. Similarly, Figure 2b showcases the total coverage C_SIR(T) for the SIR epidemic model. The lowest value of τ for which experiments were conducted, 0.5, was considerably higher than the one indicated by thresholds τ̂_1 and τ̂_2. However, the network coverage even for such comparatively high values was only around 20%. In contrast, the highest coverage (80.4%) was achieved for τ = 1.6 (β = 0.8, γ = 0.5). This highlighted a significant difference between the theoretical model and the implementation investigated here, and it might be attributed to the fact that, according to the implementation, a node may recover before ever infecting any of its neighbors.
Termination times in terms of hops are displayed in Figure 3 with respect to probabilities q and reproduction numbers τ. In the case of probabilistic flooding, the lowest values for the termination time occurred for q = 0.2, as the message being disseminated did not travel very far. For high values of q, probabilistic flooding functioned similarly to blind flooding, so the termination time should be closer to the source node's eccentricity (the maximum distance between the source node and any other node of the network). For intermediate values of q, messages did not always reach their destination through the most direct path, so the termination time could be higher. A similar, although less stable, behavior was observed for the SIR epidemic model, with termination times being lower in most cases. Due to the overall less stable behavior evident in both the coverage and termination time of the SIR epidemic model, it could be argued that probabilistic flooding is more suitable in applications that do not favor high variance in the information dissemination process.
The network's coverage with respect to time in hops is depicted in Figure 4. For probabilistic flooding, the behavior of which is shown in Figure 4a, a large portion of the network's nodes were covered within a couple of time steps (e.g., in the range of t ∈ [2, 4]), which is often referred to as the transition phase. This range also seemed to be slightly wider the higher the value of q was. For the SIR epidemic model, on the other hand, the transition phases did not seem to occur at the same time or last as long for different values of τ, as can be observed in Figure 4b. It is worth noting here that in a few cases, coverage did not monotonically increase with τ.
This unexpected behavior could be attributed to a number of factors, such as the size of the network used for the evaluation and the difficulties encountered in networks operating under the conditions of a real environment (topology changes across experiments, transmission failures).
Figure 4. Comparison of coverage with respect to time t in hops between probabilistic flooding and the SIR epidemic model: (a) probabilistic flooding coverage C_PF(t); (b) SIR epidemic model coverage C_SIR(t).

Conclusions and Future Work
Progress in the development of the IoT has ignited scientific interest over the last decade. The success of the IoT is determined by the effectiveness of the WSNs on which it is based, in the service of human comfort. As a result, a primary role is played by the dissemination of information across such sensors, a process that has been extensively studied and discussed and that can be implemented with a variety of techniques.
In this paper, the dissemination of information was achieved via probabilistic flooding and the SIR epidemic model by implementing the corresponding distributed algorithms. The similarities between probabilistic flooding and the SIR epidemic model were pinpointed, regarding both their behavior and their epidemic thresholds. The main contribution concerned the experimental evaluation of their similarities in CaBIUs, a university campus testbed. This network consisted of low-cost wireless sensor nodes that were installed across the campus of the Ionian University.
The experimental evaluation showed that the implementation was generally in agreement with previous analytical models. Especially in the case of probabilistic flooding, some of the considered epidemic thresholds could accurately predict the sufficient coverage of the network's nodes. For the implementation of the SIR epidemic model, the considered thresholds were shown to overestimate the network's coverage for a given set of parameters. This might be attributed to the fact that, according to the implementation, a node may enter the recovered state before infecting any of its neighbors. Therefore, despite commonalities between the thresholds of probabilistic flooding and the SIR epidemic model, the resulting coverage differed for the implementations described in this paper.
While the acquired results were promising, future work could aid in establishing their validity. More specifically: (i) a different implementation of SIR could be examined, to investigate whether recovery checks should occur only after an infection check has taken place; (ii) more thresholds could be considered in additional experiments; (iii) different compartmental models, such as susceptible-infected-susceptible (SIS), could be compared; and (iv) a larger number of nodes could be considered for experimentation, to capture the probabilistic nature of the two algorithms more reliably.
Author Contributions: All authors contributed to this project equally. All authors read and agreed to the published version of the manuscript.
Funding: This research received no external funding.