Evaluating Forwarding Protocols in Opportunistic Networks: Trends, Advances, Challenges and Best Practices

Abstract: A variety of applications and forwarding protocols have been proposed for opportunistic networks (OppNets) in the literature. However, the methodology for evaluating, testing and comparing these forwarding protocols is not yet standardized, which leads to considerable ambiguity in performance evaluation studies. Performance results depend largely on the evaluation environment and on the parameters and models used. More comparability in evaluation scenarios and methodologies would also greatly improve the availability of protocols and the repeatability of studies, and would thus accelerate the development of this research topic. In this survey paper, we focus on how various OppNets data forwarding protocols are evaluated rather than on what they actually achieve. We explore the models, parameters and evaluation environments and make observations about their scalability, realism and comparability. Finally, we deduce some best practices on how to achieve the largest impact of future evaluation studies of OppNets data dissemination/forwarding protocols.


Introduction
Networks in which devices connect and exchange data whenever they come into contact are called opportunistic networks (OppNets). As the name suggests, these networks form whenever the opportunity arises. Since an OppNet is formed when devices are within connection range of each other, device-to-device communication technologies such as Bluetooth, ZigBee and WiFi-direct are used for communication in opportunistic networks.
Although it seems straightforward to deploy OppNets, real testbeds to study and evaluate OppNets are quite uncommon. A typical testbed for OppNets consists of real users with their mobility patterns, traffic generation, buffer management in each device, a data dissemination protocol and post-processing methods to obtain the necessary metrics. Although these modules can be developed and integrated in testbeds, the main challenge is the repeatability of the experimental setup. Simply put, the dynamics of the network and, especially, the real users' mobility make it impossible to repeat the experiment in the same manner. Thus, simulators are the most common way of evaluating OppNets, as they offer flexibility and ease of implementation.
One approach to bridging the gap between real-world and simulation-based evaluations is to capture the properties of a real-world scenario precisely and reproduce them in a simulator. This is often done by capturing various types of real-world traces, e.g., mobility or application traces.

Applications of Opportunistic Networking Protocols
In this section, we first describe some of the notable application scenarios for OppNets. Then, we analyze them in terms of their node densities and sizes to extract relevant scenarios for evaluating forwarding protocols.
OppNets applications have already been studied several times, e.g., recently in [3]. There, several applications have been proposed and discussed, which are also included here. The applications are taken from the papers in this survey; however, the papers do not mention specific parameters for the applications they use. Our main contribution is that we identify their realistic density properties.
The following list of applications is not exhaustive. On the one hand, it cannot be exhaustive, because OppNets have not been leveraged broadly by real applications yet. Who was able to predict applications such as Twitter or Facebook before cheap, affordable Internet connections emerged? On the other hand, we attempt to cover the most prominent and promising scenarios here. Along with the descriptions of the applications, we also discuss their estimated sizes, which are further summarized in Figure 1. We have considered the information in the publications to decide the density scale. All these publications state the applications their protocols are relevant for, but do not necessarily identify the application-specific parameters. We take this a step further by trying to identify the realistic density properties that can serve these application areas. Our judgment on realism is based on our observations from the publications and on expectations in the respective application area.
Future Internet 2019, 11, 113
Group monitoring: This is probably the simplest and most straightforward application for OppNets. It refers to a group of students or tourists who stroll together through a city or museum. There are two main application goals: to identify whether somebody is missing (no longer connected to the group) and to send information in real time to all members. This application is in fact a step between mobile ad-hoc networks and opportunistic networks. It was prototyped by Saloni et al. [10]. This application scenario can realistically scale up to 250 people at most, over a small geographical area. Even if the group moves, the relative area it covers remains small.
Neighborhood/special interests: This scenario unites two different applications. First, neighbors would like to exchange information about local events, things to share, problems, discussions, etc. In this case, a well-defined group of people living close to each other (but not always connected) exchange data. In the second version of this scenario, another well-defined group of people, such as the members of a sporting club, exchange data with each other. Compared to the neighbors, the members of these groups are more spread out, but still in relative proximity to each other. Since the application assumes that people know each other at least coarsely, the number of people, even in a large city, would not exceed 10,000 over an area the size of a city such as Paris. We assume that, if larger cities are targeted, the application would rather split into several geographically isolated areas.
Special professions/rangers: This application resembles the previous one, but assumes that the people are brought together by their special profession or role, e.g., firefighters, policemen or rangers. The number of such people is not random and depends on the population or area. Thus, the node density tends to be more stable or inflexible. This can be clearly seen in Figure 1 (red area). Another typical property of this scenario is its rather sparse node density.
Campus/office: In this scenario, people working together exchange information, e.g., in an office or campus environment. Here, too, we can observe some dependency between area and number of nodes, but the general node density is higher than in the rangers scenario.
Concert/crowded event/disaster: Here, a crowded event such as a soccer game or concert is assumed. Such scenarios can be very dense, e.g., more than 50,000 people can fit into a large stadium. Of course, space is limited and some dependency between the area and the number of people/nodes inside can be observed. This scenario also includes the disaster case, as usually many people in small areas are affected and panic can increase the density further.
Smart City: Here, people and devices in a city are connected, e.g., street lights, traffic lights, parking lots, vehicles and pedestrians. These scenarios are relevant when the relative density is high enough to cover the complete city. From experience, smart city concepts are applied mostly to bigger cities with larger populations rather than to smaller towns.
Smart County: This scenario extends the Smart City application to even larger areas, such as the San Francisco Bay Area or the Berlin/Brandenburg Metropolitan Area. The relative density is similar to or lower than that of Smart Cities, as the county also includes less-populated areas. However, the absolute area and number of devices/people are greater.
Industry/Internet of Things: Another significant scenario for OppNets is the Internet of Things or Industry 4.0. Here, millions of devices are installed in relatively small areas (e.g., factories or warehouses) to fully control industrial processes. In this scenario, fewer people and more devices are present, which makes the relative density very high.
Disaster alert: This scenario is probably the one referenced most often in OppNets. Here, potentially many people over large areas need to be warned about an upcoming danger. The areas can be as big as Kruger park (e.g., a volcano eruption, earthquake or big storm) or as small as a neighborhood (car accident). The number of people is also unbounded, ranging from a few people around an accident to the whole population of California. This application is not shown in Figure 1, since it would cover the complete area.
Take-home message: Similar to the conclusions in [3], we summarize that there are many different applications with great potential in OppNets, which cover different absolute areas, numbers of people/devices and relative densities. However, there are also some areas in Figure 1 that we consider non-realistic, i.e., very high densities (over 100 million devices per square kilometer) and very low densities (fewer than one device per square kilometer). While such scenarios are interesting from a scientific point of view, experiments also need to consider the more realistic ones.
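The density bounds named in the take-home message can serve as a quick sanity check for a planned simulation scenario. The following is a minimal sketch in Python: only the two thresholds come from the text, while the function names and the example scenarios (including their areas) are hypothetical and chosen purely for illustration.

```python
# Thresholds taken from the text: densities outside this band are
# considered non-realistic.
MIN_DENSITY = 1.0              # devices per square kilometer
MAX_DENSITY = 100_000_000.0    # devices per square kilometer


def node_density(num_nodes: int, area_km2: float) -> float:
    """Return the node density in devices per square kilometer."""
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    return num_nodes / area_km2


def is_realistic(num_nodes: int, area_km2: float) -> bool:
    """Check whether a scenario lies inside the realistic density band."""
    density = node_density(num_nodes, area_km2)
    return MIN_DENSITY <= density <= MAX_DENSITY


if __name__ == "__main__":
    # Hypothetical scenarios; node counts and areas are assumptions.
    scenarios = {
        "stadium": (50_000, 0.05),
        "sparse rangers": (20, 1_000.0),
    }
    for name, (nodes, area) in scenarios.items():
        print(name, node_density(nodes, area), is_realistic(nodes, area))
```

Such a check does not replace a careful scenario design, but it flags parameter combinations that fall into the non-realistic corners of the density space before a simulation campaign is started.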

Methodology of the Survey
In this survey, we focus mainly on evaluation methodologies for OppNet protocols rather than on their details or taxonomies. We are interested in how they have been evaluated and compared to other protocols and whether any trends can be observed over the last five years.
The selection of the papers in this study is based on two criteria: (i) The papers should be indexed in Scopus to ensure that they are peer-reviewed research works. (ii) The papers should describe pure opportunistic forwarding approaches. To be precise, we did not consider ad-hoc networks or protocols involving a combination of ad-hoc and infrastructure networks. The papers span the years 2000 to 2018. Our list of papers is presented in Table 1 and the data from the papers used in this survey are presented in Appendix B. We do not claim to be exhaustive in our survey, simply because there are many works in this area and not all of them clearly position themselves in the opportunistic networks field.
The OppNets evaluation methodology typically includes the following components:
• Simulation environment or frameworks (Section 4)
• Comparative studies: which protocol has been compared to which others (Section 5)
• Network densities (Section 6)
• Mobility models (Section 7)
• Cache sizes and traffic (Section 8)
We consider these components the most important ones: on the one hand, they dictate the quality and realism of the simulation studies; on the other hand, they allow the protocols to be compared against each other. We do not consider theoretical studies or real-world studies, as those are rather rare in the area of OppNets.

Protocol Year of Publication Concise Description
Epidemic Routing [11] 2000 Flooding of all the messages to all the neighbors.
Fresh [12] 2003 Nodes having the most recent encounters with destination are chosen as forwarders.
SEPR [13] 2003 Nodes with the shortest expected path length to the destination are selected as forwarders.
Seek and Focus [14] 2004 Randomized forwarding and Utility-based forwarding are used based on encounter rates.
Spray and Wait [15] 2005 Controlled-replication. Spraying L copies of message M to relays and holding the final copy until the destination node is met.
MobySpace [16] 2005 Nodes having similar mobility pattern to the destination node.
Maxprop [17] 2006 Estimating highest delivery likelihood of neighbors to the destination based on frequency of encounters with the destination.
Spray and Focus [18] 2007 Spraying message copies to the neighbors and utility forwarding.
Prioritized ER [19] 2007 Estimation of routing cost to destination and prioritization of packet bundles to transmit based on routing cost.
HiBop [20] 2007 Contact history and stored context information about neighbors are used for selection of forwarders.
Geo-Opps [21] 2007 Using geographic location of the destination and prior known route schedules of destination and encountered neighbors.
Propicman [22] 2007 Estimating the probability of meeting the destination by sending probes to two hop neighbors.
Utility-based Spraying [23] 2007 Nodes choose between different utilities such as most mobile first, most social first, etc. for selection of relays.
Rapid [24] 2007 Adaptive replication of copies based on a chosen utility metric such as minimizing average delivery latency, minimizing maximum latency, etc.
ORWAR [25] 2008 Controlled-replication such as spray and wait routing combined with prioritization of messages by choosing required utility metrics.
EBR [26] 2009 Replication-based and selection of forwarders based on the number of encounters with the destination.
CAR [27] 2009 Evaluating and predicting context information such as mobility locally and by sending updates to neighborhood, using delivery predictability for relay selection.
SimBetTs [28] 2009 Relay selection based on social analysis on betweenness centrality, similarity index and strength of ties between nodes.
FairRoute [29] 2009 Tie strength and social status of nodes to assist in forwarding decisions.
Prophet+ [30] 2010 Enhancements to delivery predictability by considering node's buffer, power, location, popularity along with the delivery predictability obtained from prophet.
Prophet v2 [31] 2011 Fine-grained encounter rates by taking into account the unsuccessful intermittent connections, thus increasing the reliability of delivery predictability.
BubbleRap [32] 2011 Social and structural properties such as centrality and community metrics are used to select forwarders.
R3 [33] 2011 Estimates end-to-end delays of different paths to the destination and selects the path with best replication gain, uses adaptive replication.
3R [34] 2011 Estimates encounter probability by prediction of fine-grained regular encounters pertaining to a time window in a given day.
dLife [35] 2012 Daily routines of users are used to increase the accuracy of predicting future social contacts.
Sprint [36] 2013 Predicting the future contacts over a given time by analyzing the mobility patterns and using additional information on social contacts.
SGBR [37] 2013 Identifying social groups based on frequent and longer contact durations, social properties used to route packets between different communities.
Scorp [38] 2013 Prediction of probability of encounters having content-specific interests among the neighbors with similar social interests and daily routines.
HBPR [39] 2013 Exchange of location updates between nodes and predicting the direction of future movements to find shortest paths to the destination.
Onside [40] 2014 Community identification based on similar social interests and exchange of interests table between nodes to choose forwarders.
JDER [41] 2014 Targets high reachability by giving preference to selecting nodes connected to multiple communities (cut-nodes) as forwarders.
GAER [42] 2014 Genetic algorithm based next hop selection, distance between mean of home locations of neighbors to destination along with a fitness function is used to select forwarders.
PRoWait [43] 2015 Uses delivery predictability of Prophet to select forwarders combined with spray and wait routing.
Pathsampling [44] 2016 Learns network topology by using probes sent along with the beacons and selects forwarders based on estimated end-to-end delivery probability.
CGrAnt [45] 2016 Local information, situational information and domain information is used to select forwarders.
ABCON [46] 2016 Relay nodes are selected by the number of encountered neighbors.
EER [47] 2016 Calculates expected encounter value for every neighbor by using contact durations and frequent contacts which determines the forwarders and the number of copies.
RPC [48] 2016 Estimates reachability probability computed considering centrality measure and encounters rates, which is used in selecting forwarders.
EDR [49] 2016 Encounter rates with the destination and estimated distance to the destination are used to select forwarders.
GSAF [50] 2016 Destination dependent identifier is used to spray messages to forwarders. After the message is in the locality of the destination, it is flooded to all neighbors.
HPR [51] 2016 Delivery predictability of nodes with spraying is the forwarding strategy.
E2FA [52] 2016 Delivery predictability of nodes and buffer utility are used for choosing forwarders.
Multi-S&W Routing [53] 2017 Spray and wait routing with next hop selection based on weighted sum of betweenness centrality, friendship index and similarity index.
FGAR [54] 2017 Adaptive replication and forwarding based on contact prediction and success probability of delivery between contacts in a given time interval.
IBR [55] 2017 Effects of interactions between nodes in terms of popularity without detection of communities is the forwarding strategy.
FC-DFCM [56] 2017 Relationship strength of node pairs and contact durations are used in selection of forwarders.
EIMCT [57] 2017 Social network based contact durations are considered for forwarding decisions.
SAPR [58] 2017 Social characteristics of nodes and their mobility are the factors for making forwarding decisions.
CPR [59] 2017 Prediction-based routing decision considering statistical contact information, contact transitivity and instant contact information.
TCCB [60] 2017 Temporal social contact patterns and temporal centrality prediction are used to select forwarders.
FARS [61] 2018 Fairness based routing involving weighted factors of contact duration, residual buffer and historical amount of delivered data.
Predict and Forward [63] 2018 Next hop selection based on node profiles and attributes along with historical encounters to calculate delivery probability.
IoR [64] 2018 Average time since the message generation, average distance travelled in hops and delivery predictability of ProPhet are used to select forwarders.
CAOF [65] 2018 Relay selection is based on node's activeness to meet nodes in its own community and different communities within a bounded time.
CAF [66] 2018 Adaptive weighted combination of friendship index, similarity index, centrality, contact strength and trust to find the suitable relays.
CbR [67] 2018 Nodes belonging to clusters with good delivery capability are preferred over nodes selected by their utility value alone.
RBES [68] 2018 Congestion level at a node and contact history of nodes are together used for forwarding strategy.
kROp [69] 2018 Partition of neighbors to clusters and next hop is determined by an evaluation function to find the cluster with optimal delivery capability.
PBQ [70] 2018 Delivery probability is calculated by Poisson based distribution along with consideration of node's daily routines and mobility patterns.
MLProph [71] 2018 Enhanced delivery predictability of ProPhet with machine learning by considering node's popularity, power consumption, speed, location and frequently encountered nodes.
EPSoc [72] 2018 Social-based epidemic routing where degree centrality of nodes is used for next hop selection.
CoSim [73] 2018 Cosine similarity of the data packets between nodes is used to find the similarity of the nodes, which in turn is used to forward messages.
CGR [74] 2018 Replication based on scheduled contact patterns with predetermined network prediction.

Simulator Environments
The first step in evaluating the performance of an OppNet protocol is the selection of the simulation platform. In general, there are two main options: custom-built simulators and standard simulators. The standard simulators usually used for OppNets are ONE [75], OMNeT++ [76], ns-2/ns-3 [77], MobEmu [78], and Adyton [79]. For both custom-built and standard simulators, authors can additionally choose whether or not to provide the original code for reproducibility purposes.
Figure 2 shows which network simulators were used in all explored studies and in the recent ones since 2013. A tendency can be observed that recent studies use more standard simulators, especially the ONE, and fewer custom-built ones. This is surely a positive trend, as it helps towards the reuse of models and reproducibility. There are various arguments for and against custom-built simulators. One possible argument for them is speed and optimization. Standard simulators are typically general-purpose and tend to be slower. This potentially limits the scale of the OppNet simulation. Figure 3 shows the correlation between the number of nodes (simulation scale) and whether a standard or a custom simulator has been used. This figure shows that the largest simulations, with 12,000 and 64,000 nodes, have in fact been conducted on custom simulators. However, these are rather outliers and most of the custom-built simulators have been used for much smaller network sizes. In addition, standard simulators have been used for studies with 1000 and 2000 nodes.

Another argument for implementing custom-built simulators has been the learning experience and the deep understanding of the built system. However, self-built solutions tend to be simpler and ignore many important properties. Thus, an argument we find in various references [80] and from speaking directly to the authors is that, while a deep understanding and mastery of the environment used is surely of great importance, learning from well-designed and carefully tested existing simulators is probably the better choice. An important question is which simulator to choose. We compared the scalability and availability of simulation models in an earlier work [8], where we concluded that the ONE has the largest variety of simulation models, while C++-based simulators such as OMNeT++ or Adyton are much faster. Our current work is partially focused on further improving the scalability of our own OMNeT++-based OppNets framework OPS [81] and increasing the number of available models there. Ultimately, it does not matter which simulator is selected, as long as the original code is made available to the research community and the evaluation methodology is carefully designed.
Unfortunately, only one of the studies points to where its source code can be found: CGR [74]. Certainly, some of the protocols have found their way into the ONE or other simulators (e.g., MaxProp, Epidemic, BubbleRap, SimBet, and ProPhet), but this is rather the exception. Reproducibility is highly important for fostering research in general, and providing the source code used would be an important first step.
Take home message: Standard simulation environments are to be preferred over self-built ones, and source code should be made available to the research community.

Comparative Studies
One of the most important decisions when designing an evaluation study is whether to compare the new protocol against existing ones. Researchers potentially weaken their protocol and its chances of being published if no such comparison is provided. However, comparing against many different protocols is also impractical and costs a lot of effort, since most of the protocols are not readily available.
Thus, researchers often opt for a tradeoff, where they compare their new protocols against available and well-explored ones. In the case of OppNets, these protocols are typically Epidemic, ProPhet, Spray & Wait, MaxProp and BubbleRap, as shown in Figure 4. This trend stays the same even in the recent studies from 2015 to 2018.
We also present these results extensively in Figure A1 in Appendix A, where we have marked all protocols that have been used for comparison by other protocols. For example, the SEPR protocol from 2003 (Line 3 in the figure) has been compared against Epidemic. It can be clearly seen that freely available implementations of protocols (marked blue in the figure) are the preferred options to compare against. Furthermore, the matrix in Figure A1 gets "thinner" towards the bottom right corner, while it should actually be filled below its diagonal.
This comparison strategy has one important disadvantage: it does not allow the research community to evaluate newer protocols against each other. For example, both Predict & Forward from 2018 and PathSampling from 2016 were compared against Epidemic and Spray & Wait. However, which of the two is better, or to be preferred in which situations? They are obviously close enough to each other in their application scenarios, since both selected the same set of comparative protocols. The observations from this section lead us to only one possible way out of this problem: new protocols need to be implemented in standard simulators and their code needs to be published. This is the only way to enable future studies to use the most recent advances in the area. It would also have the positive side effect of researchers reproducing and confirming each other's results, thus pushing the state of the art forward faster. Comparative studies against traditional protocols such as Epidemic can still be very valuable to better explain the new properties or to set the new protocol in context.

Take home message: New protocols need to be compared against recent advances and their code needs to be freely available.

Scalability
Scalability in OppNets is directly related to the network size, i.e., the number of nodes in the network. The network sizes considered for the studied protocols are shown in Figures 3 and 5. Most of the studies have been evaluated for a network size of fewer than 100 nodes. Very few evaluations have considered more than 200 nodes, and the number declines rapidly for 500 nodes, 1000 nodes and above. In one of the studies (Fresh [12]), the network size ranged from 1000 to 64,000 nodes. In studies where mobility traces were used (see also the discussion in Section 7), the network size was limited by the number of nodes in the traces. We discuss realistic network densities and numbers of nodes in Section 2 above. Table 2 presents an overview of the protocols discussed here and the densities they have been tested for (Figure 1). Not all protocols provide enough details to evaluate the network density; some do not even provide the network size. Many of the papers refer to some of the application scenarios in Section 2, but the parameter settings used are rather small and thus not very realistic.
However, it needs to be noted again that many simulators (refer to Section 4, Figure 3 and our previous work in [8]) do not cope well with large simulations. Nevertheless, simulations with approximately 500-1000 nodes can be considered realistic and are well supported by all simulators. There are also good mobility traces at this network scale (refer also to Section 7).
Take-home message: Large simulations with thousands of nodes are still hard to achieve nowadays. This is one of the most important challenges and goals for OppNet modeling. New studies should consider the targeted application scenarios (Figure 1) and target at least 500-1000 nodes. Some special application scenarios, e.g., group monitoring applications, might require a custom scale.

Mobility
Mobility is probably the most important simulation model for evaluating OppNets. It is the main driver of OppNets and determines how messages are diffused in the network. Mobility (or its absence) controls which nodes meet each other and for how long.
We explored the different mobility models for OppNets in our earlier survey [8]. Generally speaking, we identified three groups of mobility models:

• Random mobility models use analytical models to describe the mobility of devices/people in OppNets. They are simple and fast, but very unrealistic. Examples include Random Waypoint, Random Direction, etc.
• Real trace mobility models gather real GPS or other location data from real users and replay them in simulation. They are very realistic and easy to implement, but slow. Furthermore, gathering the traces is a very tedious task and there is no way to increase the number of nodes later. There exists a well-known database with such traces, called Crawdad (https://crawdad.org).
• Hybrid models combine both worlds by extracting statistical data or observations from real traces and then implementing a randomized model based on those. They are faster than real traces and more realistic than random models. However, they also quickly become very complex to understand and implement. It is also very hard to capture all behavioral observations in one model. Examples of such models include SWIM (Small Worlds in Motion) [82], HCMM [83] or TRAILS [84].
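To illustrate why random models are considered simple and fast, the following minimal sketch generates a Random Waypoint trace for a single node in Python. It is not taken from any of the surveyed simulators: the function name, the parameters and the fixed sampling step `dt` are assumptions made for illustration. A node repeatedly picks a uniformly random destination and speed, moves there in a straight line, and pauses.

```python
import math
import random


def random_waypoint(width, height, v_min, v_max, pause, duration,
                    dt=1.0, seed=None):
    """Generate (time, x, y) samples for one node moving by Random Waypoint.

    v_min must be positive, otherwise a node could stall on a leg.
    """
    if v_min <= 0 or v_max < v_min:
        raise ValueError("require 0 < v_min <= v_max")
    rng = random.Random(seed)
    x, y = rng.uniform(0, width), rng.uniform(0, height)
    t = 0.0
    trace = [(t, x, y)]
    while t < duration:
        # Pick the next waypoint and a constant speed for this leg.
        dest_x, dest_y = rng.uniform(0, width), rng.uniform(0, height)
        speed = rng.uniform(v_min, v_max)
        # Move towards the waypoint in steps of dt seconds.
        while t < duration:
            dist = math.hypot(dest_x - x, dest_y - y)
            step = speed * dt
            arrived = dist <= step
            if arrived:
                x, y = dest_x, dest_y
            else:
                x += (dest_x - x) / dist * step
                y += (dest_y - y) / dist * step
            t += dt
            trace.append((t, x, y))
            if arrived:
                break
        # Pause at the waypoint before choosing the next one.
        pause_end = min(t + pause, duration)
        while t < pause_end:
            t += dt
            trace.append((t, x, y))
    return trace
```

Because every node is generated independently, the number of nodes can be chosen freely, which is the scalability advantage of random models; the price is that the resulting movement bears little resemblance to human behavior.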
The properties we are looking at for mobility models are:
• Scalability: How many nodes can a model produce/simulate? Random and hybrid mobility models are not really restricted in their scalability: as many nodes as needed can be simulated. However, traces are limited to the maximum number of nodes they have been collected for.
• Realism: How realistic is the behavior of the moving nodes? Real traces are clearly real. Random models are the least realistic, while hybrid models tend to have more realistic properties.
• Generalization: How general can the results be considered? A single real trace is a snapshot and thus not representative. Analytical models (random and hybrid), when used for a large number of scenarios and parameters, can be considered representative studies with statistical significance.
The properties of the three families of mobility models are also summarized in Figure 6 (left). Figure 6 shows how the different mobility models vary in their relevance in terms of generalization, realism and scalability. For instance, a single trace is highly realistic, but neither highly scalable nor general, as it adheres to a specific scenario. Until recently, the models have been used exclusively, i.e., only one model was applied in a particular simulation. However, other approaches are also possible: running simulations with several mobility models either separately or together. The first idea is quite straightforward: a simulation is run first with mobility Model A, then with Model B, Model C, etc. All results together are used to derive the performance of the explored protocol.
The second idea is more complex and has not been applied yet. Here, individual traces are not run separately from each other, but on top of each other. The coordinate systems of the individual traces need to be converted to match. In this way, a much denser and more scalable simulation is possible, which makes the performance evaluation scalable and general at the same time. The effect of these modifications is also shown in Figure 6 (right). The only remaining challenge is the performance of the model itself, as trace-based models are known to be slow.
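The idea of running traces on top of each other can be sketched as follows. This is only an illustration under strong assumptions: the sample format `(time, node_id, x, y)`, the function names and the alignment by pure translation (shifting each trace so that its clock and bounding box start at zero) are hypothetical choices. Real traces may additionally require a projection from GPS to metric coordinates, scaling, or a deliberate alignment of time of day.

```python
from typing import List, Tuple

Sample = Tuple[float, str, float, float]  # (time, node_id, x, y)


def normalize(trace: List[Sample], prefix: str) -> List[Sample]:
    """Shift a trace so that its clock and bounding box start at zero.

    Node identifiers are prefixed so that nodes from different traces
    cannot collide after merging.
    """
    t0 = min(s[0] for s in trace)
    x0 = min(s[2] for s in trace)
    y0 = min(s[3] for s in trace)
    return [(t - t0, f"{prefix}:{node}", x - x0, y - y0)
            for t, node, x, y in trace]


def overlay(traces: List[List[Sample]]) -> List[Sample]:
    """Merge several traces into one time-ordered trace in a common frame."""
    merged: List[Sample] = []
    for index, trace in enumerate(traces):
        merged.extend(normalize(trace, prefix=f"trace{index}"))
    merged.sort(key=lambda sample: sample[0])
    return merged


if __name__ == "__main__":
    # Two tiny hypothetical traces with different origins and clocks.
    trace_a = [(100.0, "n1", 10.0, 20.0), (101.0, "n1", 11.0, 22.0)]
    trace_b = [(5.0, "n1", -3.0, 4.0), (6.0, "n2", -1.0, 4.0)]
    for sample in overlay([trace_a, trace_b]):
        print(sample)
```

The merged trace contains the nodes of all input traces in one area, which increases both the node count and the density compared with replaying each trace alone.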
The idea of using several mobility models at the same time is not exclusive to traces. It is also very useful with hybrid models, where, for example, some nodes move like vehicles and others like pedestrians. This has been applied, for example, in GSAF [50], kROp [69] and MLProph [71], where bus movement is used for vehicles and pedestrians use shortest-path map-based movement.
Figure 7 summarizes the mobility models used in the studies explored here. There is a slight trend towards using more than two traces or at least one trace and one analytical model. This is a very positive trend, as it makes the studies more representative.

Take home message. Using realistic and scalable mobility models is crucial for performance evaluation.
A good option is a sophisticated hybrid mobility model with a large variety of scenarios and parameter settings. Using a benchmark-like evaluation with several traces and hybrid mobility models is an even better choice.

Cache Size and Traffic
The second most important parameter of the performance evaluation of OppNets, after mobility, is the number of messages in the network. Usually, this parameter is driven by four other independent parameters, i.e., cache size, traffic generation, time-to-live of messages and simulation time. However, the explored studies do not explore those parameters in the same way, thus a direct analysis is not possible. Instead, we analyze the network-wide traffic in number of messages per hour against the cache size (in number of messages). The results are provided in Figure 8. Not all studies provide enough details, thus only a subset is depicted in the graph. In the same graph, we have marked three different areas: high, medium and low cache pressure.
In low cache pressure scenarios, the generated traffic will never exceed the cache size and thus no buffer drops will be observed (unless time-to-live of messages is used). In medium cache pressure scenarios, buffer drops will occur, but rarely, and will likely not affect the operation of the protocol significantly. In high cache pressure scenarios, however, buffer drops will dominate.
Ideally, any protocol should be evaluated in all three scenarios to cover all possibilities. The protocols that come closest to this goal are ORWAR and MaxProp. In Figure 8, we have also marked newer protocols (since 2013) in grey. We can observe that many of these newer studies have considered high to medium cache pressure scenarios. Older protocols often do not provide enough details about their traffic generation patterns, which is the reason why they are missing from this graph (few white boxes).
The most often used traffic generation interval is by far 25-35 s. This comes from the standard settings of the ONE simulator. Thus, it is highly important how simulators are configured by default, since many researchers use these parameters under the assumption that they are somehow more "correct". We consider this risky, as often such default settings are chosen either at random or are historically motivated (old settings from other scenarios). The traffic pattern and frequency depend highly on the scenario used, ranging from a few messages per day for human-generated messages to hundreds of messages per second for sensor-generated ones.
Take home message. Cache sizes and traffic have not been considered very seriously in existing studies, which mostly take default values. However, the parameter settings need to be well adjusted to the application scenario under consideration.
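To make the three areas more tangible, a rough classification could be sketched as below. This is not the procedure used to draw Figure 8: the function names, the idea of comparing the number of messages generated during one message lifetime against the cache size, and the factor of 2 separating medium from high pressure are all assumptions made for this example.

```python
def msgs_per_hour(min_interval_s: float, max_interval_s: float) -> float:
    """Network-wide messages per hour, assuming one message per interval
    and intervals drawn uniformly between the two bounds (assumption)."""
    mean_interval = (min_interval_s + max_interval_s) / 2
    return 3600 / mean_interval


def cache_pressure(rate_per_hour: float, cache_size_msgs: int,
                   lifetime_hours: float, high_factor: float = 2.0) -> str:
    """Classify a scenario as low, medium or high cache pressure.

    lifetime_hours: how long a message stays relevant (its time-to-live,
    or the simulation time if no time-to-live is used).
    high_factor: assumed boundary between medium and high pressure.
    """
    live_messages = rate_per_hour * lifetime_hours
    if live_messages <= cache_size_msgs:
        return "low"      # everything fits, no buffer drops expected
    if live_messages <= high_factor * cache_size_msgs:
        return "medium"   # some drops, but they stay rare
    return "high"         # drops dominate


rate = msgs_per_hour(25, 35)             # 120.0 messages per hour
low = cache_pressure(rate, 1000, 5)      # 600 live messages, cache of 1000
high = cache_pressure(rate, 100, 12)     # 1440 live messages, cache of 100
```

A study could call such a function for each planned configuration to check that all three areas are covered before running any simulation.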

Metrics
In this section, we briefly discuss the most commonly used metrics for all the protocols in the surveyed papers and the potential to identify new metrics specific to OppNets.
Delivery ratio is the ratio of the total number of packets delivered to their respective destinations to the total number of packets generated in the network. Delivery delay is the time difference between the generation of a packet and its reception at the destination. Delivery cost mainly attempts to capture the impact of replication on the performance of the OppNet. Hence, the delivery cost is usually measured in terms of overhead, such as the number of copies or the number of transmissions.
A specific analysis with respect to the metrics is not presented here, because all the protocols in Table 1 use all three of the above-mentioned metrics. The enhancements in the protocols always involve a trade-off in the metadata exchanged between nodes prior to the actual message transfer. The metadata comprises all the data stored in each node for choosing relays, such as rate of encounters, mobility patterns, profiles of nodes for predicting future contacts and daily schedules. As this necessarily involves an information exchange between nodes upon contact, it can be considered pre-processing overhead and should be accounted for in evaluations. Moreover, the pre-processing overhead depends on the operation of the protocols themselves. Hence, it is essential for authors to identify all such possible overhead when evaluating their respective OppNet protocols.
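The three metrics can be computed from per-message records, as in the following minimal sketch. The record layout and field names are hypothetical, and expressing delivery cost as transmissions per delivered message is only one of several possible definitions of overhead.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MessageRecord:
    created_at: float              # generation time in s
    delivered_at: Optional[float]  # first delivery time in s, None if undelivered
    transmissions: int             # transmissions of this message and its copies


def evaluate(records: List[MessageRecord]) -> Dict[str, float]:
    """Compute delivery ratio, mean delivery delay and delivery cost."""
    delivered = [r for r in records if r.delivered_at is not None]
    ratio = len(delivered) / len(records) if records else 0.0
    if delivered:
        mean_delay = sum(r.delivered_at - r.created_at
                         for r in delivered) / len(delivered)
        # Assumed cost definition: all transmissions per delivered message.
        cost = sum(r.transmissions for r in records) / len(delivered)
    else:
        mean_delay = float("nan")
        cost = float("inf")
    return {
        "delivery_ratio": ratio,
        "mean_delay_s": mean_delay,
        "transmissions_per_delivery": cost,
    }


records = [
    MessageRecord(created_at=0.0, delivered_at=10.0, transmissions=3),
    MessageRecord(created_at=5.0, delivered_at=None, transmissions=4),
    MessageRecord(created_at=10.0, delivered_at=40.0, transmissions=5),
]
metrics = evaluate(records)
```

Note that transmissions of undelivered messages are counted in the cost as well, since they consume resources regardless of the outcome.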

Holistic Guide to OppNets Evaluations
In each of the previous sections, we have identified the most important insights in take-home messages. Here, we summarize our findings from the trends observed in OppNet evaluations and identify a few more general properties and challenges in evaluating OppNet protocols.
First, and most importantly, performance evaluations have improved significantly over the last years, with larger and more realistic scenarios being evaluated and more sophisticated models being applied. Nevertheless, there is room for further improvement.
The best practices can be summarized as:
• Select a standard simulator. We recommend ONE or OMNeT++, as those are the most well documented and actively developed ones.
• Select OppNet protocols to compare against. They should be close in their general application scenario (destination-less or destination-oriented, etc.) and should be recent, e.g., from the last five years. Additionally, compare against optimal solutions. This combination ensures the correct positioning of the new protocol in the context of existing ones and shows how it advances the state of the art.
• Design a good application scenario with a realistic number of nodes, traffic, cache sizes and simulation time.
• Select a good mobility model, able to cater for the application scenario and its scale. Traces (see also the discussion below) or recent hybrid mobility models are a good option. If using traces, use at least 3-5 different traces. Ideally, use several hybrid models and several traces.
• Explore the relevant protocol-specific metrics in terms of overhead.
• Explore the parameter space of your scenario from minimum to maximum possible values. Report confidence intervals.
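The last practice, sweeping the parameter space and reporting confidence intervals, could be organized as in the sketch below. The `run` callback is hypothetical and stands for whatever launches one simulation and returns a single metric value. The helper names are likewise assumptions. The interval uses tabulated two-sided 95% Student t quantiles for up to 30 degrees of freedom and falls back to the normal quantile beyond that.

```python
import statistics
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# Two-sided 95% Student t quantiles for 1..30 degrees of freedom.
T95 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
       2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
       2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042]


def mean_ci95(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, half-width of the 95% confidence interval)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs for a confidence interval")
    sem = statistics.stdev(samples) / n ** 0.5
    df = n - 1
    crit = T95[df - 1] if df <= len(T95) else statistics.NormalDist().inv_cdf(0.975)
    return statistics.fmean(samples), crit * sem


def sweep(run: Callable[..., float], grid: Dict[str, List],
          seeds: Sequence[int]) -> Dict[tuple, Tuple[float, float]]:
    """Run every parameter combination once per seed and summarize it."""
    keys = sorted(grid)
    results = {}
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        samples = [run(seed=s, **params) for s in seeds]
        results[values] = mean_ci95(samples)
    return results


mean, half_width = mean_ci95([0.5, 0.6, 0.7])
```

The result keys are tuples of parameter values in alphabetical order of the parameter names, so a grid with `cache` and `ttl` yields keys of the form `(cache, ttl)`.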
Additionally, we would like to summarize our findings in the form of future work suggestions.
Suggestion 1: Open source code. The most important and significant improvement is the easiest and the most difficult to enforce at the same time: open source code. If all researchers published their code and made it available for other researchers to use and validate, the whole community would profit and development would speed up significantly. It sounds easy, but there are also counter arguments. Publishing the code means more work to tidy it up, check for errors, comment it and make sure it compiles correctly on machines other than your own. This is a step which many researchers skip to save time. Companies are particularly reluctant to share their code, as they believe it would reveal too many details about possibly patented or copyrighted products, although the probability of actually patenting or bringing the code to market is very low. It is the task of publishers and editors to enforce this development, and many communities have started requesting code and other supplementary materials as part of their publication process.
Suggestion 2: Documentation and default parameter values. Another important insight is the usage of default parameter values and available models in standard simulators. It is obvious from our discussions that researchers tend to use readily available models and pre-defined values instead of designing them from scratch with only quality in mind. This is due to two reasons. First, time restriction is again an issue. Implementing and testing a new model is a time-consuming task, and the goals and advantages of doing this are not always obvious. This problem can be easily resolved with Suggestion 1. Second, the complexity of standard simulators and their models can make it hard, especially for less experienced researchers, to set them up correctly. Here, only a combination of the following can help: rigorous documentation; well-designed default values; design of benchmarking suites; and a lively, helpful community, ready for open discussions.
Suggestion 3: Benchmarks. Following from Suggestion 2, the development of benchmarks is highly advisable for the further development of the community. Many other communities have well designed and adopted benchmarks, which significantly simplify the development and testing of new algorithms and protocols on the one hand, and help to better judge their quality and true novelty on the other.
Suggestion 4: Improved mobility models. One of the most challenging parts of OppNet simulation is the mobility model. Existing models suffer from various problems and no real solution is in sight. One possibility is the benchmarking process described above, e.g., combining many different traces into one benchmark. In Section 7, we have also described an idea of how to put different traces "on top of each other" to run simultaneously.
Suggestion 5: Faster simulations. Research is also desperately needed to speed up simulations. Currently, it can take weeks to run a single large scenario. This is due to the level of detail at which simulations are currently executed, typically the packet level. Some new approaches already exist, e.g., in FALCON [85], but they require a lot of manual adaptation. It is desirable to develop a highly scalable and fast simulation environment, where protocols are still implemented in the same way as in the real world.
These suggestions also form the basis for our own future work in the area of OppNets, even if not all of them will be addressed soon or at the same time. The community is welcome to support us.

Conclusions
In this paper, we have focused on the performance evaluation of opportunistic networking data dissemination/forwarding protocols. Unlike other surveys, we have discussed how the protocols were evaluated instead of what they actually do or whether they perform well. This study has led us to two main outcomes: a best-practice evaluation process and a list of suggestions for further improving the process for the whole community.

Appendix B
Here, we have consolidated the data regarding OppNet evaluations obtained from the papers used in this survey.

Figure 2. Used evaluation environments: for all surveyed publications (top); and for all recent studies since 2013 (bottom).

Figure 3. Simulation scale for different simulation environments, for all papers in our surveys which declare the number of nodes.

Figure 4. Comparative studies of the surveyed OppNet protocols with the most compared OppNet protocols and their trends.

Figure 5. Frequently used network size in the studies. Very large outliers are not shown for better visibility.

Figure 6. Properties of standard mobility models (left); and extension opportunities for trace-based models (right).

Figure 7. Mobility models used in the studies explored here. There is a slight trend towards using more than two traces or at least one trace and one mobility model.

Figure 8. Cache sizes against network-wide traffic. Not all studies provide sufficient details, thus only a subset is depicted.

Figure 1. OppNets applications and their node densities. Six different node density areas have been identified and marked with dotted lines.

Table 1. OppNet protocols involved in this study.

Table 2. Node densities for studied protocols (missing protocols do not provide enough information).
Comparative studies between the protocols surveyed in the paper. Protocols readily available in simulators are marked in blue.