Improving the Performance of Opportunistic Networks in Real-World Applications Using Machine Learning Techniques

In Opportunistic Networks (OppNets), portable devices such as smartphones, tablets, and wearables carried by individuals can communicate and store-carry-forward their messages. Message transmission is usually short range and supported by communication protocols such as Bluetooth, Bluetooth Low Energy, and Zigbee. These devices, along with a city's taxis and buses, act as network nodes. The mobility, buffer size, message interval, number of nodes, and number of message copies in such a network influence its performance. Increasing these factors can improve message delivery and, consequently, network performance; however, because network resources are limited, it also increases the cost and the network overhead. The network achieves its maximum performance when it is configured with the optimal factors. In this paper, we measured, predicted, and analyzed the impact of these factors on network performance using the Opportunistic Network Environment simulator and machine learning techniques. We calculated the optimal factors depending on the network features. We used three datasets, each with features and characteristics reflecting a different network structure. We collected the real-time GPS coordinates of 500 taxis in San Francisco, 320 taxis in Rome, and 196 public transportation buses in Münster, Germany, within 48 h. We also compared the network performance without selfish nodes and with 5%, 10%, 20%, and 50% selfish nodes. We suggest an optimized configuration under real-world conditions when resources are limited. In addition, we compared the performance of the Epidemic, Prophet, and PPHB++ routing algorithms fed with the optimized factors. The results show how to choose the best network settings according to the application's needs and how selfish nodes affect network performance.


Introduction
The Internet of Things (IoT) is an emerging paradigm concerned with connecting real-world objects and things [1]. It opens up opportunities for a large number of diverse devices, such as wearable devices, laptops, portable devices, and vehicles, to communicate and interact with one another. Applications include, but are not limited to, smart healthcare [2], smart cities [3], smart environmental monitoring systems [4], and smart business [5]. In such sophisticated scenarios, heterogeneous static and mobile devices (e.g., smartphones carried by individuals), equipped with different radios for data transmission, may interact. Communication might occur only during specific contact opportunities (i.e., depending on the communication protocol and coverage range) between heterogeneous and possibly disconnected static networks [1]. In a smart city scenario, owing to their high mobility and flexibility, mobile sinks (e.g., cars, taxis, and buses) might be used to collect data from static nodes (e.g., traffic sensors, environmental monitoring stations) or to disseminate control information.
Hence, such data might be relayed by any node and forwarded through other nodes (e.g., via smartphones), even in the absence of a predefined end-to-end path between nodes. As an outcome of this study, we identified the optimized parameters for estimating network performance in networks with limited resources, based on the simulation output.
The results showed that the network structure, influenced by node density and node mobility, affects both the optimized parameters and the network performance. We also predicted the impact of malicious nodes in the network and considered the necessary security measures. The contributions of this paper are as follows:
• Performing a comprehensive evaluation of the network's performance and of the impact of the most influential parameters, such as the number of message copies, the nodes' buffer size, the message interval, the number of nodes, and node movement, measured in terms of message delivery probability, dropped message probability, and network overhead;
• Applying ML to large OppNets to extend the simulation results and predict the effective, optimized parameters;
• Specifying and customizing the appropriate optimized parameters for real-world applications with limited resources;
• Constructing a dataset based on the movement of buses in Münster, Germany;
• Comparing the network performance under three different routing algorithms, in the presence and absence of selfish nodes, using three real-world node movement datasets.
The rest of the paper is organized as follows: In Section 2, related works are reviewed. Materials and methods are discussed in Section 3. Section 3.4 describes the optimized parameters. Then, Section 4.3 presents the simulation results, and Section 5 discusses the results. Lastly, Section 6 presents the paper's conclusion.

Related Work
Exploring the impact of influential factors such as message copies, buffer size, message interval, and node mobility on network performance is challenging. If each factor is considered a dimension, a simultaneous multidimensional evaluation of all factors is complex and has received little attention. Thus, most related work focuses on one- or two-dimensional evaluations. This section presents existing studies on these challenges, outlines their main characteristics, and summarizes the results each obtained. These papers evaluated the impact of the number of nodes, the number of messages, buffer size, and node mobility on network performance using the most prevalent routing algorithms. Regarding the number of message copies, some works have also proposed solutions to improve network performance in addition to examining its impact.
The effect of the number of message copies in the network is studied in [26][27][28]. The authors of [26] evaluated the effect of the number of message copies in two areas of different sizes in terms of delivery probability, average latency, overhead ratio, hop count, average buffer time, and number of contacts. According to their results, the smaller region showed better performance. However, the number of nodes in that paper (25 pedestrians and 25 cars) is insufficient for the wide area selected for the simulation. Additionally, no concrete node movement was chosen for the simulation.
In [27], message distribution in the network is examined, and the number of message copies is controlled comparatively. The authors aimed to increase the message delivery rate and reduce network overhead and delay. Their scheme estimated the probability that a message would successfully leave the intermediate node, together with the extent of the network, to determine the replication and forwarding strategy. The authors did not consider any specific scenario in their simulations.
To control the number of message copies in the network, the authors of [28] suggested a feedback mechanism: when the destination node receives a message, it sends a feedback message to all other network nodes so that they can delete the message from their buffers. This improved the message transmission success rate, overhead, and delay. Nodes in this study move randomly within a small simulation environment (400 × 400).
The authors of [6] examined how the number of nodes, the number of messages, and the message size affect network performance in OppNets, using the Epidemic, Prophet, MaxProp, and Time to Return routing algorithms, and compared the results of these algorithms. The simulation environment in this paper is also small (zone1 = 700 × 600, zone2 = 50 × 50).
The effect of buffer size on OppNet performance is evaluated in [22,23,29]. In [29], the effect of increasing the buffer size is studied for different routing algorithms; the authors changed the buffer size of vehicular and terminal nodes. As the buffer increased, node performance improved in the Epidemic and MaxProp algorithms, whereas in the Spray and Wait algorithm only the vehicle nodes improved. The number of nodes in this paper was limited to 25 stationary nodes and 6 mobile nodes.
In [22], the effect of buffer size on the Epidemic algorithm is evaluated, and an optimal buffer scheme is presented to improve the efficiency of this algorithm. In this paper, nodes move randomly in a square of 1000 m × 1000 m. The proposed algorithm improved the message delivery and end-to-end delay in the Epidemic algorithm.
The authors of [23] examined the effect of buffer size on packet delivery and showed how buffer size could affect the network performance. They also evaluated the inverse effect of buffer size on network overhead and packet delivery rate. As the buffer size decreases, the delivery rate and network overhead increase, and with a large buffer size, the proposed algorithms have better message delivery and less network overhead. In this paper, the authors did not take a specific movement situation into account.
The effect of the message generation interval and buffer size on message delivery probability, delay, and network overhead was studied in [30]. This paper compared the results for OBSBM (the authors' proposed algorithm), Epidemic, and Binary Spray and Wait routing algorithms. The results show that increasing the message interval increased message delivery, slightly decreased message delay, and slightly increased network overhead. Additionally, increasing the buffer size from 20 to 40 messages did not affect network performance.
Node density is evaluated in [31][32][33]. The authors of [31] evaluated the effect of node density and message TTL on network performance in vehicular delay-tolerant networks in terms of message delivery probability, network overhead, average latency, and average number of hops. The results showed that increasing node density enhanced network performance but may increase delay. They also showed that increasing the TTL did not improve network performance. The authors concluded that a single-copy protocol has more hops and higher latency than multiple-copy protocols.
The effect of node density on different routing algorithms was evaluated in [32]. The environments they evaluated included extremely sparse (3–5 nodes per km²), sparse (6–15 nodes per km²), average (16–25 nodes per km²), populated (26–400 nodes per km²), and dense (more than 400 nodes per km²) environments. The results showed that the Spray and Wait algorithm works better in a dense environment. This article also uses the Random Waypoint model for node movement.
The effect of node density on message delivery probability, latency, and network overhead is examined in [33]. According to the results, increasing the number of nodes increases message delivery probability and network overhead and decreases message latency. The authors employed a random movement model rather than a real-world dataset. Additionally, they assumed that every node in the network was trustworthy, i.e., that there were no malicious nodes.
In [34], an OppNet with fixed and moving nodes was examined. This paper showed that node mobility does not have much effect. Moreover, increasing the number of moving nodes can increase efficiency, an effect that was not observed for fixed nodes.
The impact of malicious nodes on the network was evaluated in [24,25,35]. The authors of [24] analyzed the impact of selfish nodes on network performance and proposed a routing mechanism to manage such nodes. They argued that increasing the number of selfish nodes can significantly decrease network performance.
In [25], the effect of selfish nodes on network performance is estimated based on data memory size. To mitigate this destructive effect, the authors took advantage of social knowledge to detect selfish nodes.
The authors of [35] analyzed the impact of malicious nodes on message delivery probability, dropped messages, and average latency. Their results indicate that increasing the number of malicious nodes decreases message delivery, increases the number of dropped messages, and increases message latency. However, the simulation environment in this study was limited to 1000 m × 1000 m, the authors did not use a real movement dataset, and they considered a large buffer size (100 M) for the nodes.
The nodes' mobility, buffer size, message interval, number of nodes, and number of message copies in OppNets influence the network's performance. The main disadvantages of the previous works are as follows:
• Most previous works lack a comprehensive evaluation method; they often consider only a limited number of the factors that influence network performance (ca. two);
• These works usually do not use actual datasets collected from real-world scenarios. Therefore, they are restricted to a limited number of nodes, and the simulation output does not reflect actual behavior;
• Lacking actual data from real scenarios forces the authors to use random node movement in the environment. In some cases, such as pedestrians carrying smartphones and wearable devices, this could be a valid assumption, but in many other cases, for example, an individual riding a bus or taxi, the assumption does not hold.
In this work, we conducted a comprehensive evaluation study by considering all influential factors in network performance. Additionally, we have used concrete datasets to make the results less error-prone.

Materials and Methods
To reflect the limited network resources of real-world applications, we used three datasets containing the GPS coordinates of taxis and public transportation buses in three different cities. The datasets differ in network structure, mobility, and density, meeting the requirements of our study.
The rest of this section describes the datasets and routing algorithms used in this paper, the simulation environment, and, finally, how the optimized parameters are determined.

Scenarios
Building a model requires data on individuals' daily activity and movement, but such data are rarely available at large scale. During daily mobility, people make different decisions in various situations and may choose routes based on rush hour and traffic. Simulators often miss this point. Therefore, for a solid overview, we used datasets with the GPS coordinates of public transportation (buses) and taxis during routine daily operation rather than the simulator's preset defaults. We used three datasets: two contain the mobility of taxis over a period of time, driven by passenger requests, in San Francisco and Rome, and one contains the mobility of buses following a daily schedule in Münster, Germany. We chose these cities because they differ in urban structure (Figure 1). San Francisco has a modern urban structure with regular, well-structured streets. Rome is a city with an old urban fabric that has expanded irregularly. Münster is a town with an irregular street structure. We thus obtained the mobility of taxis in San Francisco and Rome and of buses in Münster. Taxis face fewer restrictions than buses, and some taxis drive without following all traffic regulations, whereas buses in Münster have to follow specific routes, instructions, and driving regulations: they depart at particular times along predetermined routes and stop at specific points at predefined times. Given these structural differences, we expect the mobility and density of the network nodes to vary with the structure (city) and the nature of the nodes (bus/taxi). We describe the features and characteristics of each dataset in the following subsections.

San Francisco Taxis
The first scenario is taxi mobility in San Francisco, USA [36]. This dataset contains the GPS coordinates of 500 taxis over 30 days in the San Francisco Bay Area. The data were collected by the Exploratorium, the museum of science, art, and human perception, through the Cabspotting project. Each taxi was equipped with a GPS receiver that sent its location to a server at intervals of less than 10 s (i.e., the taxi location status update). In the simulation studies, we used the GPS coordinates and timestamps of 100 to 500 taxis over two days.

Rome Taxis
The second scenario is taxi mobility in Rome, Italy [37]. This dataset contains the GPS coordinates of 320 taxis over 30 days in Rome and was collected in February 2014. In the simulation studies, we used the GPS coordinates and timestamps of 50 to 198 taxis for two days (1 and 2 February 2014). We used the data of only 198 taxis because only 198 taxis were active during these two days.

Münster Buses
To consider a public transportation scenario, we created a novel dataset containing the routes of buses following the public transportation schedule of Münster. We collected and converted data from the live data website of the Münster public utilities, http://api.busradar.conterra.de/demo/. The data were collected on 4 and 5 July 2021. A total of 149 buses were operating, but not all were active simultaneously; for example, only a limited number of buses were active at night.
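All three datasets consist of timestamped GPS fixes, so a contact opportunity between two nodes can be approximated by checking whether their positions at the same instant lie within the radio range (23 m in the simulations described in the Simulation Environment subsection). The sketch below is a standalone approximation, not how ONE detects contacts internally; it assumes both positions are sampled at the same timestamp and uses the haversine great-circle distance with a mean Earth radius. The function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters
TRANSMISSION_RANGE_M = 23.0   # radio range used in the simulations


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def in_contact(p: tuple, q: tuple, range_m: float = TRANSMISSION_RANGE_M) -> bool:
    """True if two nodes, sampled at the same timestamp, are within radio range."""
    return haversine_m(*p, *q) <= range_m


# Two taxis about 11 m apart (0.0001 degrees of latitude) are in contact.
print(in_contact((37.7749, -122.4194), (37.7750, -122.4194)))
```

At city scale, a flat equirectangular approximation would give practically the same result; haversine is used here only because it stays accurate without any local projection.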

Routing Algorithms
The most prevalent routing algorithms in OppNets are flooding-based, prediction-based, and history-based algorithms. Therefore, to measure and compare network performance under various conditions, we used three routing algorithms, Epidemic, Prophet, and PPHB++, which are flooding-based, prediction-based, and history-based, respectively. In addition, PPHB++ allows the number of message copies in the network to be changed flexibly. Performance is measured in terms of Message Delivery Probability (MDP), Dropped Message Probability (DMP), and Network Overhead (NetO). These algorithms are briefly explained below.
Epidemic algorithm: the most straightforward algorithm, in which messages are broadcast to all available neighbors [18]. This process is repeated until the message reaches its destination or expires at the end of its Time To Live (TTL). The network suffers high overhead under this algorithm, and routing is not optimized in any way.
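One Epidemic contact can be sketched as follows. The sketch assumes each node's buffer is a set of message IDs and that, on contact, each node receives every message the other has and it lacks, which is the summary-vector exchange of the original Epidemic proposal. Buffer limits and TTL are ignored here, and `Node` and `epidemic_contact` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    buffer: set[str] = field(default_factory=set)  # IDs of messages carried


def epidemic_contact(a: Node, b: Node) -> int:
    """Exchange all messages one node has and the other lacks; return copies made."""
    missing_at_b = a.buffer - b.buffer
    missing_at_a = b.buffer - a.buffer
    # Both sides keep their own copies: Epidemic replicates and never hands off.
    b.buffer |= missing_at_b
    a.buffer |= missing_at_a
    return len(missing_at_b) + len(missing_at_a)


a = Node("a", {"m1", "m2"})
b = Node("b", {"m2", "m3"})
print(epidemic_contact(a, b))  # 2 new copies: m1 to b, m3 to a
```

Every contact between nodes with different buffers creates new copies, which is why the overhead of this algorithm grows quickly with the number of nodes.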
Prophet algorithm: the best-known prediction-based algorithm [38]. The contact history of the nodes is used to calculate the delivery probability, and nodes with a higher delivery probability carry the messages. When nodes are within each other's communication range, they update their predictability lists, so nodes that are often within each other's range receive a higher delivery probability.
PPHB++ algorithm: based on Privacy-Preserving History-Based routing in opportunistic networks (PPHB+) [39]. We upgraded this algorithm to PPHB++ by restricting the Number of Message Copies (NumMC). In the Prophet and Epidemic algorithms, when a node sends a message to a neighbor, it sends a copy to the neighbor and a copy remains in its own buffer. This is not the case in PPHB++: nodes do not keep a copy for themselves.
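The sketch below relays one message along a simple chain of nodes to show how the two forwarding semantics differ. It assumes a linear chain and string message IDs, and `forward` and `copies_after_chain` are hypothetical helpers. It does not model how PPHB++ enforces the NumMC limit, which is not detailed here.

```python
def forward(sender: list, receiver: list, msg: str, keep_copy: bool) -> None:
    """Pass msg from sender to receiver; optionally keep the sender's copy."""
    receiver.append(msg)
    if not keep_copy:
        sender.remove(msg)  # PPHB++-style hand-off: the sender stops carrying msg


def copies_after_chain(n_nodes: int, keep_copy: bool) -> int:
    """Relay one message node-to-node along a chain and count remaining copies."""
    buffers = [[] for _ in range(n_nodes)]
    buffers[0].append("m")
    for i in range(n_nodes - 1):
        forward(buffers[i], buffers[i + 1], "m", keep_copy)
    return sum(buf.count("m") for buf in buffers)


print(copies_after_chain(4, keep_copy=True))   # 4: every relay keeps a copy (Epidemic, Prophet)
print(copies_after_chain(4, keep_copy=False))  # 1: only the current carrier holds it (PPHB++)
```

Both variants perform the same three relays. The difference lies in how many buffered copies remain to be relayed again at later contacts, which is what drives up the overhead of copy-keeping algorithms.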
In this algorithm, each node produces a polynomial whose root is considered the node's nickname. When a node frequently detects a neighbor in its communication range, it multiplies its polynomial by the neighbor's polynomial, and then updates and shares its new polynomial. When deciding whether to carry a message, a node checks whether the nickname of the message's receiver is a root of its polynomial. If so, the node is likely to meet the receiver, which makes it a suitable candidate for carrying the message closer to its destination.
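The root-membership test described above can be sketched with plain integer polynomials. This sketch assumes that a node's initial polynomial is (x − nickname), that polynomials are coefficient lists with the constant term first, and that arithmetic is over the integers. The actual PPHB+ scheme may use a different construction (for example, finite-field arithmetic for privacy), and all function names are hypothetical.

```python
def node_polynomial(nickname: int) -> list[int]:
    """Polynomial x - nickname, whose only root is the node's nickname."""
    return [-nickname, 1]


def poly_mul(p: list[int], q: list[int]) -> list[int]:
    """Multiply two polynomials given as coefficient lists (constant term first)."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result


def poly_eval(p: list[int], x: int) -> int:
    """Evaluate a polynomial at x using Horner's rule."""
    acc = 0
    for coeff in reversed(p):
        acc = acc * x + coeff
    return acc


def is_candidate(poly: list[int], receiver_nickname: int) -> bool:
    """A node may carry a message if the receiver's nickname is a root of its polynomial."""
    return poly_eval(poly, receiver_nickname) == 0


# Node 7 repeatedly meets node 13 and absorbs its polynomial: (x - 7)(x - 13).
p = poly_mul(node_polynomial(7), node_polynomial(13))
print(p)                    # [91, -20, 1]
print(is_candidate(p, 13))  # True: node 7 is a good carrier for messages to node 13
print(is_candidate(p, 5))   # False
```

Because the polynomial exposes only its coefficients, a node can answer "is this receiver among my frequent contacts?" without listing those contacts explicitly. This is the intuition behind the privacy-preserving design.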

Simulation Environment
We used the Opportunistic Network Environment (ONE) simulator [40] to simulate and evaluate the scenarios, and MATLAB to calculate the outputs and plot the graphs and charts.
We imported the datasets into ONE. Each scenario lasted 48 h, with an update interval of 100 ms. Node mobility follows the GPS coordinates recorded in the datasets. Buffers use First-In, First-Out (FIFO) queuing. The following parameters were configured and kept the same in all scenarios and simulations:
• Transmission range: the maximum distance over which a node can forward a message to a neighbor is 23 m, based on the experiments performed in [41].
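The FIFO buffer and the two reasons for dropping a message (buffer saturation and TTL expiry, counted by DMP) can be sketched as follows. This sketch assumes the capacity is counted in messages rather than bytes and that the oldest message is evicted when the buffer is full. ONE's exact drop policy is not specified here, and `Message` and `FifoBuffer` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: str
    created_at: float  # creation time in seconds
    ttl: float         # time to live in seconds


class FifoBuffer:
    """Bounded FIFO buffer that counts dropped messages (hypothetical helper)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # assumption: counted in messages, not bytes
        self.queue: deque = deque()
        self.dropped = 0

    def expire(self, now: float) -> None:
        """Drop every message whose TTL has run out."""
        alive = [m for m in self.queue if now - m.created_at < m.ttl]
        self.dropped += len(self.queue) - len(alive)
        self.queue = deque(alive)

    def add(self, msg: Message, now: float) -> None:
        """Insert a message, evicting the oldest one if the buffer is full."""
        self.expire(now)
        if len(self.queue) >= self.capacity:
            self.queue.popleft()  # first in, first dropped
            self.dropped += 1
        self.queue.append(msg)


buf = FifoBuffer(capacity=2)
for i, t in enumerate([0.0, 1.0, 2.0], start=1):
    buf.add(Message(f"m{i}", created_at=t, ttl=100.0), now=t)
print([m.msg_id for m in buf.queue], buf.dropped)  # ['m2', 'm3'] 1
```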
We evaluated the performance of the networks in the different scenarios and under the different algorithms in terms of MDP, DMP, and NetO, which we define as in the ONE simulator:
• MDP: the probability of delivering messages to their destination, where
$$\mathrm{MDP} = \frac{\text{Delivered messages}}{\text{Created messages}} \qquad (1)$$
• DMP: the probability that messages are deleted from the nodes' buffers because of buffer saturation or TTL expiry, where
$$\mathrm{DMP} = \frac{\text{Dropped messages}}{\text{Started messages} + \text{Created messages}} \qquad (2)$$
Started messages are the number of message copies produced in the network. In the usual routing algorithms in ONE, a node forwarding a message to a neighbor produces a copy of the message, forwards it to the neighbor, and retains a copy for itself. Therefore, a considerable number of messages are started in the network.
• NetO: the number of relayed messages minus the number of delivered messages, divided by the number of delivered messages, where
$$\mathrm{NetO} = \frac{\text{Relayed messages} - \text{Delivered messages}}{\text{Delivered messages}} \qquad (3)$$
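The three metrics can be computed from a run's message counters as follows. DMP follows the ratio given above and NetO follows its verbal definition (relayed minus delivered, over delivered). For MDP, this sketch assumes delivered/created, which is how ONE's message statistics report computes delivery probability. The counter values in the example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MessageCounts:
    created: int    # messages generated by source nodes
    started: int    # message copies produced while forwarding
    relayed: int    # successful transfers between nodes
    delivered: int  # messages that reached their destination
    dropped: int    # messages removed (buffer full or TTL expired)


def mdp(c: MessageCounts) -> float:
    """Message delivery probability (assumed: delivered / created)."""
    return c.delivered / c.created


def dmp(c: MessageCounts) -> float:
    """Dropped message probability: dropped / (started + created)."""
    return c.dropped / (c.started + c.created)


def neto(c: MessageCounts) -> float:
    """Network overhead: (relayed - delivered) / delivered."""
    return (c.relayed - c.delivered) / c.delivered


run = MessageCounts(created=100, started=400, relayed=380, delivered=80, dropped=50)
print(mdp(run), dmp(run), neto(run))  # 0.8 0.1 3.75
```

NetO is unbounded above, unlike MDP. This is why the overhead values later reported for Epidemic and Prophet can reach the hundreds.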

Optimized Parameters
To investigate network performance, we performed a two-phase study: (i) We used the San Francisco dataset, because of its nodes' mobility, and the PPHB++ algorithm, because of its higher flexibility. We ran simulations to investigate how different parameter configurations affect the network's performance and calculated the optimized parameters for this particular dataset and algorithm. (ii) In the second phase, we used the optimized parameters obtained in the first phase to calculate the network performance for the other datasets and algorithms discussed earlier. We identified five parameters that most influence network performance: the Number of Message Copies (NumMC), Number of Nodes (NumN), Buffer Size (BuffS), Message Interval (MI), and Node Mobility (NM). The first four parameters are configurable, whereas the fifth depends on the structure of the dataset, the collected data, and the network topology. We configured these five parameters to observe their impact on network performance (Table 1). Due to simulation restrictions, we considered four NumMC values (1, 5, 10, and 50), three NumN values (100, 250, and 500), three BuffS values (5, 10, and 15), and three MI intervals (see Table 1). The remaining states of these parameters were restricted to 50, 500, 15, and 25 to 35 for NumMC, NumN, BuffS, and MI, respectively. We used Machine Learning (ML) techniques, namely decision tree, multiple linear, polynomial, random forest, and support vector regression (Reg), to predict the optimal outputs (Table 2), with a Radial Basis Function (RBF) kernel for support vector regression. Using ML, we predicted the optimized parameters for extended versions of the configurations and the network. Furthermore, to mimic real-world applications and their restrictions, we assumed that the nodes in the network have limited resources, with a buffer size of 5 Mb.
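The first-phase configuration space is a Cartesian product of the parameter values listed above. In this sketch, the three MI intervals are assumed to be 5–15 s, 15–25 s, and 25–35 s, inferred from the ranges mentioned elsewhere in this section, because Table 1 is not reproduced here. The code only shows the size of the full grid; it is not a statement of which combinations were actually simulated.

```python
from itertools import product

NUM_MC = [1, 5, 10, 50]
NUM_N = [100, 250, 500]
BUFF_S = [5, 10, 15]
MI_INTERVALS = [(5, 15), (15, 25), (25, 35)]  # assumption: seconds, inferred from the text


def configurations():
    """Yield every (NumMC, NumN, BuffS, MI) combination of the first phase."""
    for num_mc, num_n, buff_s, mi in product(NUM_MC, NUM_N, BUFF_S, MI_INTERVALS):
        yield {"NumMC": num_mc, "NumN": num_n, "BuffS": buff_s, "MI": mi}


configs = list(configurations())
print(len(configs))  # 108 = 4 x 3 x 3 x 3
```

Each configuration, paired with its simulated MDP, DMP, and NetO, would form one training row for the regression models.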
We evaluated the regression models using the regression score based on the coefficient of determination (R²). We only considered models that scored above 0.9 for at least three performance terms. For each regression algorithm, we used two different algorithms to predict the optimized parameters.
Algorithm 1: For each regression model, (i) the first row is taken as the initial result (R_1 = 1); (ii) each subsequent row i is then checked: (iii) if MDP_i is greater than the MDP of R_1, DMP_i is less than the DMP of R_1, and NetO_i is less than the NetO of R_1, (iv) R_1 is set to i (R_1 = i).
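The R² score and the qualification rule above can be written out as follows. `r2_score` computes the standard coefficient of determination, 1 − SS_res/SS_tot, which is the same quantity as scikit-learn's `sklearn.metrics.r2_score`. `qualifies` encodes "above 0.9 for at least three terms". Both are minimal hypothetical helpers, not the authors' code, and `r2_score` assumes the true values are not all equal, since SS_tot would otherwise be zero.

```python
def r2_score(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def qualifies(scores: dict[str, float], threshold: float = 0.9, min_terms: int = 3) -> bool:
    """Keep a regression model only if enough performance terms score above threshold."""
    return sum(score > threshold for score in scores.values()) >= min_terms


print(round(r2_score([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]), 4))  # 0.9486
print(qualifies({"MDP": 0.95, "DMP": 0.93, "NetO": 0.91}))    # True
print(qualifies({"MDP": 0.95, "DMP": 0.85, "NetO": 0.91}))    # False
```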

Algorithm 1 Regression model
Require: R, i
Ensure: Alg1 result
Initialization:
    Mechanism initialization
    Read the value R from row i
R_1 ← 1
for i = 2 to the end of the array do
    if MDP(i) > MDP(R_1) and DMP(i) < DMP(R_1) and NetO(i) < NetO(R_1) then
        R_1 ← i
    end if
end for
Alg1 ← NumMC(R_1), NumN(R_1), BuffS(R_1), MI(R_1)
return Alg1

In Algorithm 2, the inverse of MDP is combined with DMP and NetO, the terms that influence network performance, into a single value X; the configuration with the minimum X delivers the maximum performance. In the equation for X, we set a = 100 and b = 1/10. Depending on the application and the aim of the study, the user can change these coefficients.
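Algorithm 1 can be sketched in Python as follows, assuming each row of a regression model's output is a dict holding the four configurable parameters and the three predicted performance terms. The row values in the usage example are hypothetical and are not taken from the paper's tables.

```python
def algorithm1(rows: list[dict]) -> tuple:
    """Return (NumMC, NumN, BuffS, MI) of the row selected by Algorithm 1."""
    best = 0  # R_1 starts at the first row (0-based index here)
    for i in range(1, len(rows)):
        cand, cur = rows[i], rows[best]
        # Move R_1 only if row i is better on all three terms at once.
        if (cand["MDP"] > cur["MDP"] and cand["DMP"] < cur["DMP"]
                and cand["NetO"] < cur["NetO"]):
            best = i
    chosen = rows[best]
    return chosen["NumMC"], chosen["NumN"], chosen["BuffS"], chosen["MI"]


rows = [  # hypothetical predictions, for illustration only
    {"NumMC": 50, "NumN": 500, "BuffS": 15, "MI": (25, 35), "MDP": 70, "DMP": 2.0, "NetO": 20},
    {"NumMC": 10, "NumN": 250, "BuffS": 10, "MI": (15, 25), "MDP": 72, "DMP": 1.5, "NetO": 15},
    {"NumMC": 1, "NumN": 100, "BuffS": 10, "MI": (15, 25), "MDP": 75, "DMP": 1.0, "NetO": 25},
]
print(algorithm1(rows))  # (10, 250, 10, (15, 25)): row 3 loses on NetO
```

The rule is a sequential comparison, not a full Pareto search, so the selected row can depend on the row order. Algorithm 2's scalar X avoids this ordering effect, but its equation is not reproduced here, so it is not sketched.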

Algorithm 2 Regression model
Input: X, i
Output: Alg2 result
Initialization:
    Mechanism initialization
    Read X from row i of the performance terms
…

Additionally, we considered the network with and without malicious nodes (Section 4.3.2). To compare the networks, we varied NumN as follows: San Francisco, 100 to 500 nodes; Rome, 50 to 200 nodes; Münster, 50 to 150 nodes.
We configured all networks based on the optimized simulation results. Table 3 shows the effect of the aforementioned parameters on network performance. We elaborate on the impact of each parameter (NumMC, BuffS, MI, NumN, and NM) on network performance (MDP, DMP, NetO) as follows. Changing NumMC has a limited, negligible effect on MDP. The maximum MDP (75.59%) occurs when there is only one message copy in the network, and the minimum MDP (72.03%) occurs with ten message copies. Although increasing NumMC from one to ten decreases MDP, further increases do not necessarily follow this pattern: removing the NumMC restriction yields an MDP of 74.58%. The MDP difference between the minimum and maximum NumMC settings (one copy versus unlimited copies) is only about 1%.

Influencing Parameters and the Network Performance
However, increasing NumMC reduces DMP. The lowest DMP (1.13%) occurs with 50 message copies in the network, and the highest (2.23%) with 10 copies. Although the DMP values are small, increasing NumMC from 1 to 50 can reduce DMP by up to 50%, which is significant. As the number of delivered messages increases, the number of messages dropped from buffers decreases.
We expected increasing NumMC to affect NetO adversely. Indeed, the results show a significant increase in NetO with greater NumMC: the minimum NetO (9.905%) occurs with one copy, and the maximum (23.7%) with 50 copies. Increasing the number of copies beyond 50 (i.e., no limit) did not change the results much.
In summary, although NumMC has only a minor effect on MDP and DMP, it significantly increases NetO. Furthermore, beyond 50 copies (no limit), the results do not change for this routing algorithm.

BuffS and Network Performance
Increasing BuffS from 5 to 15 improves MDP by 1%, reduces DMP by 16%, and worsens NetO by 0.64%. Therefore, BuffS does not significantly affect MDP, DMP, or NetO. A larger buffer can reduce the number of dropped messages, because messages are deleted either when the buffer is full or when their TTL expires. Given the network size, increasing BuffS beyond a certain point cannot significantly change the network's performance (saturation).

MI and Network Performance
Increasing MI reduces the number of messages in the network. When fewer messages are generated (greater MI), they are more likely to reach their destination. Changing MI from 5–15 s to 25–35 s increases MDP by 89% and reduces DMP by 40%: with more messages in the network, buffers fill up and messages are dropped before they can be forwarded to other nodes. Increasing MI also increases NetO by 5%.
Therefore, we can conclude that MI mainly affects MDP and DMP, while there is not much change in NetO.

NumN and Network Performance
As the number of nodes increases from 100 to 500, MDP increases by 4%, DMP decreases by 46%, and NetO increases by 67% as more nodes connect.
As expected, increasing the number of nodes substantially reduces DMP, has a negligible effect on MDP, and worsens NetO.

NM and Network Performance
The last row of Table 3 displays the effect of NM on the network performance. In San Francisco, taxis have a wide range of mobility and travel in different directions. In Rome, taxis also have a wide range of mobility, but our data show a limited number of movements during the selected time.
In Münster, the range of buses is limited. They can only travel on restricted routes and stop at bus stops at specific times. Therefore, fewer nodes have the opportunity to meet each other and exchange data.
As Table 3 shows, MDP is higher in San Francisco because more nodes are connected. In contrast, DMP is high in Rome and Münster because packets' TTLs expire and the nodes' buffers saturate.
In terms of MDP and DMP, San Francisco performs best, owing to its many active nodes, and Münster performs worst. San Francisco outperforms Münster by a factor of about 9 in MDP and by 96% in DMP. NetO in Münster is 12% lower than in San Francisco because fewer messages are exchanged in the network.
As a result, we can conclude that node movement can significantly impact MDP and DMP.
Comparing node mobility across the three datasets in Table 3 reveals noticeable changes in MDP and DMP under different conditions.
According to Equations (1) and (2), MDP and DMP are inversely, though not linearly, related. Consistent with this, the Münster dataset has the highest DMP and the San Francisco dataset the lowest. Unlike DMP and MDP, NetO is influenced by several dominant parameters: NumMC, NumN, and NM have the most significant impact on NetO, whereas BuffS and MI can be excluded from the factors affecting it. We conclude that the nodes and their related features, such as the number of message copies, the number of nodes, and the degree of node mobility, have the most significant impact on network performance. Similarly, in the real world, node types (taxis, buses, or pedestrians) and their characteristics (speed, route, and mobility) have the greatest influence on network performance.
Table 4 shows the regression scores based on the coefficient of determination (R²). With one as the maximum score, we omitted multiple linear regression from the list of qualifying models. Comparing Tables 4 and 5 indicates that decision tree regression performs best. Comparing Algorithms 1 and 2 for the decision tree, which has the higher regression score, yields the optimized parameters NumMC = 1, NumN = 100, BuffS = 10, and MI = 15 to 25 s. Table 6 summarizes the optimal results according to Algorithms 1 and 2 for the different regression algorithms.

Extending the Simulation Results Based on ML and Regression Models
As in the previous subsection, we compared the different regression algorithms, but this time for the network with limited resources. Comparing Tables 4 and 6 indicates that Decision Tree regression is the best performer. Comparing Algorithms 1 and 2 for Decision Tree regression, which has the higher regression score, yields the optimized parameters: NumMC = 1, NumN = 100, BuffS = 5, and MI = 55 to 65 s.
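The selection step picks the configuration that a fitted model predicts will perform best, optionally under a buffer cap. It can be sketched as an exhaustive grid search over candidate factor values. This is an illustration under stated assumptions, not the paper's Algorithms 1 and 2, which are not reproduced in this section. The function name `best_config` is hypothetical. So are the scoring rule (reward MDP, penalize DMP and NetO) and its weights. `predict` stands in for any trained regression model, such as a decision tree.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

# (NumMC, NumN, BuffS, MI): the four tunable factors named in the text.
Config = Tuple[int, int, int, int]
# Assumed interface: a fitted regression model maps a configuration to (MDP, DMP, NetO).
Predictor = Callable[[Config], Tuple[float, float, float]]


def best_config(
    predict: Predictor,
    num_mc: Iterable[int],
    num_n: Iterable[int],
    buff_s: Iterable[int],
    mi: Iterable[int],
    max_buffer: Optional[int] = None,
    w_dmp: float = 1.0,
    w_neto: float = 0.01,
) -> Tuple[Optional[Config], float]:
    """Score every candidate configuration with the surrogate model.

    The score and its weights are hypothetical; the paper's Algorithms 1
    and 2 may use a different selection criterion.
    """
    best: Optional[Config] = None
    best_score = float("-inf")
    for cfg in product(num_mc, num_n, buff_s, mi):
        if max_buffer is not None and cfg[2] > max_buffer:
            continue  # resource-limited case, e.g. BuffS <= 5
        mdp, dmp, neto = predict(cfg)
        score = mdp - w_dmp * dmp - w_neto * neto
        if score > best_score:  # strict '>' keeps the first of tied candidates
            best, best_score = cfg, score
    return best, best_score
```

Passing `max_buffer=5` restricts the search to the resource-limited setting discussed above.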
We also simulated and calculated Decision Tree regression for the Epidemic and Prophet algorithms based on these parameters; the results are illustrated in Figure 2. For the PPHB++, Prophet, and Epidemic algorithms under Decision Tree regression, the average MDP is 80.2244%, 25.1400%, and 19.6533%, respectively; the average DMP is 1.52%, 99.9200%, and 100.1644%, respectively; and the average NetO is 12.8047, 712.6886, and 953.9167, respectively. As the number of nodes increases, the number of messages in the network grows, which raises the network overhead of the Prophet and Epidemic algorithms. PPHB++ has the lowest overhead because of its fundamental difference from the other two algorithms: it avoids copying a message when forwarding it to a neighbor.
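The overhead argument above can be illustrated with a toy contact trace. The sketch contrasts Epidemic replication with a naive single-copy scheme in which the holder hands its only copy to whichever node it meets. The single-copy rule is a simplification for illustration and is not PPHB++'s actual forwarding logic. The sketch also assumes that NetO is computed like the overhead ratio in the ONE simulator's message statistics report, (relayed - delivered) / delivered. The function names are hypothetical.

```python
from typing import List, Tuple

Contact = Tuple[int, int]  # an undirected encounter between two node IDs


def epidemic(contacts: List[Contact], src: int, dst: int) -> Tuple[int, bool, int]:
    """Replicate the message to every encountered node that lacks it."""
    holders = {src}
    relayed, delivered = 0, False
    for a, b in contacts:
        for x, y in ((a, b), (b, a)):
            if x == dst:
                continue  # the destination consumes the message and does not relay it
            if x in holders and y not in holders:
                holders.add(y)  # a new copy is created; the sender keeps its own
                relayed += 1
                delivered = delivered or y == dst
    return relayed, delivered, len(holders)


def single_copy(contacts: List[Contact], src: int, dst: int) -> Tuple[int, bool, int]:
    """Naive single-copy forwarding: hand the only copy to the node met."""
    holder, relayed = src, 0
    for a, b in contacts:
        if holder == dst:
            break
        if holder in (a, b):
            holder = b if holder == a else a  # move the message, do not copy it
            relayed += 1
    return relayed, holder == dst, 1


def overhead_ratio(relayed: int, delivered: int) -> float:
    """Assumed NetO definition: (relayed - delivered) / delivered."""
    return float("inf") if delivered == 0 else (relayed - delivered) / delivered
```

Consider the trace `[(0, 1), (1, 2), (0, 3), (2, 4), (3, 4)]` with node 0 as the source and node 4 as the destination. Epidemic makes four relays and leaves five nodes holding the message, giving an overhead ratio of 3.0. The single-copy scheme needs three relays (overhead ratio 2.0) and never holds more than one copy.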

Network Performance
We evaluated the network performance on the three real-world datasets using the optimized parameters obtained in the previous section. We also compared the network performance of three routing algorithms, PPHB++, Epidemic, and Prophet, during the simulations.

Network Performance without the Malicious Nodes
Figures 3-5 demonstrate the network's performance without malicious nodes; we assumed that all nodes were trusted and forwarded messages without any sabotage. Figure 3 presents the MDP for all three datasets. The average MDP values of PPHB++, Prophet, and Epidemic are 81%, 31.17%, and 23.70%, respectively, in San Francisco; 9.9900%, 8.6275%, and 6.0975%, respectively, in Rome; and 7.19%, 8%, and 6.35%, respectively, in Münster.
In San Francisco and Rome, the output patterns are relatively similar: the highest delivery is achieved by the PPHB++ algorithm, followed by Prophet and Epidemic. Compared with Prophet and Epidemic, PPHB++ improved MDP by 159.9337% and 241.9468%, respectively, in San Francisco, and by 15.7925% and 63.8376%, respectively, in Rome. The higher MDP of PPHB++ in San Francisco and Rome is due to the larger number and greater flexibility of taxis, which increases the chance that two taxis meet and exchange a message. Furthermore, in the Prophet algorithm, messages are sent to neighbors with a higher MDP, whereas in the Epidemic algorithm, messages are broadcast blindly, which reduces MDP.
In Münster, the best MDP is delivered by the Prophet algorithm, followed by the PPHB++ and Epidemic algorithms. In terms of MDP, Prophet outperforms PPHB++ by 11.2193% and Epidemic by 25.9318%. Figure 3 also supports the results in Table 3. In San Francisco, the MDP is between 23% and 81% owing to the large number of active taxis with an extensive range of movement. In Rome, it is lower, between 6% and 10%, because there are fewer active taxis and their mobility is lower. In Münster, it is between 7% and 8% because the node type changes to bus. Moreover, there are fewer buses than there are taxis in the other two scenarios, and the buses' range of movement in the city is limited.
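Prophet's forwarding rule, mentioned above, relies on per-destination delivery predictabilities. The following minimal sketch follows the update equations of the original PRoPHET proposal: direct encounter, aging, and transitivity. It forwards only when the neighbor's predictability for the destination is higher than the node's own. The constants are the commonly used defaults from that proposal. They are assumed here because the paper's Prophet settings are not given in this section. The class and method names are hypothetical.

```python
from typing import Dict

# Assumed constants: common PRoPHET defaults, not the paper's reported settings.
P_INIT, BETA, GAMMA = 0.75, 0.25, 0.98


class ProphetNode:
    def __init__(self, name: str) -> None:
        self.name = name
        self.preds: Dict[str, float] = {}  # delivery predictability per destination

    def age(self, k: int) -> None:
        """Decay all predictabilities after k time units without contact."""
        factor = GAMMA ** k
        for dest in self.preds:
            self.preds[dest] *= factor

    def encounter(self, other: "ProphetNode") -> None:
        """Update this node's table after meeting `other` (direct + transitive)."""
        old = self.preds.get(other.name, 0.0)
        self.preds[other.name] = old + (1.0 - old) * P_INIT
        p_ab = self.preds[other.name]
        for dest, p_bc in other.preds.items():
            if dest == self.name:
                continue  # a node keeps no predictability for itself
            old_c = self.preds.get(dest, 0.0)
            self.preds[dest] = old_c + (1.0 - old_c) * p_ab * p_bc * BETA

    def should_forward(self, other: "ProphetNode", dest: str) -> bool:
        """Forward a copy only if the neighbor is a better carrier for `dest`."""
        return other.preds.get(dest, 0.0) > self.preds.get(dest, 0.0)
```

Suppose node b has recently met d, so P(b, d) = 0.75. When a then meets b, transitivity gives a only P(a, d) = 0.75 × 0.75 × 0.25 = 0.140625, so a hands a copy of a message for d to b.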
A limited number of buses driving on particular routes under local public transportation regulations restricts mobility and flexibility. In contrast, the taxi fleets in the other two cities are larger, with a wider radius of movement and greater flexibility. As a result, messages have a better chance of reaching the destination directly or through intermediate nodes in San Francisco and Rome than in Münster.
Figure 4 presents the DMP for the San Francisco, Rome, and Münster datasets. The average DMP values of PPHB++, Prophet, and Epidemic are 1.15%, 99.10%, and 99.27%, respectively, in San Francisco; 32.4975%, 94.5500%, and 94.7000%, respectively, in Rome; and 50.8733%, 95.0767%, and 95.4100%, respectively, in Münster.
The output patterns in all three cities are similar: Epidemic and Prophet give the highest DMP and PPHB++ the lowest. Epidemic and Prophet behave almost identically, so their outputs overlap in Figure 4. Compared with Prophet and Epidemic, PPHB++ reduces DMP by 98.8396% and 98.8416%, respectively, in San Francisco; by 65.6293% and 65.6837%, respectively, in Rome; and by 46.4923% and 46.6792%, respectively, in Münster.
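The percentages above express one algorithm's average relative to another's. This section does not state the exact formulas. The two below are inferred because they reproduce the reported figures. MDP (higher is better) uses a relative increase, and DMP (lower is better) uses a relative reduction.

```python
def relative_increase(new: float, baseline: float) -> float:
    """Percentage gain of a higher-is-better metric such as MDP."""
    return (new / baseline - 1.0) * 100.0


def relative_reduction(new: float, baseline: float) -> float:
    """Percentage drop of a lower-is-better metric such as DMP."""
    return (1.0 - new / baseline) * 100.0


if __name__ == "__main__":
    # Average Rome values reported in the text (PPHB++ vs. Prophet).
    print(f"MDP gain: {relative_increase(9.99, 8.6275):.4f}%")
    print(f"DMP reduction: {relative_reduction(32.4975, 94.55):.4f}%")
```

Running it prints 15.7925% and 65.6293%. These match the reported Rome comparisons of PPHB++ with Prophet.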
Under a high node density, the PPHB++ algorithm delivers better output, whereas under a low node density, Prophet works better than the other two algorithms. The reason lies in how each algorithm forwards a message from the source to a neighbor or the destination. Because PPHB++ keeps only one copy of each message, it requires a denser network to deliver messages. In contrast, Prophet is less restrictive: it forwards the message to one or several neighbors whenever a neighbor has a higher probability of delivering the message than the node itself. Therefore, MDP improves at the expense of NetO. Figure 5 presents the NetO for the three datasets: Epidemic has the highest NetO in all three, PPHB++ the lowest, and Prophet lies in between. The low DMP and NetO of the PPHB++ algorithm are due to its minimal NumMC. Epidemic broadcasts messages to all neighboring nodes and generates a copy of each message at every iteration, which significantly increases NetO by saturating the network and consuming its resources. Although Prophet limits broadcasting to nodes that satisfy the delivery-probability requirement, the number of message copies is still noticeable.

Network Performance with the Malicious Nodes
In San Francisco and Rome, the results are similar for all cases with malicious nodes: the largest MDP is given by the PPHB++ algorithm, followed by the Prophet and Epidemic algorithms. In Münster, when 5% of nodes are malicious, the MDP ranking from top to bottom is Prophet, PPHB++, and Epidemic. When 10% of nodes are malicious, the order changes to PPHB++, Prophet, and Epidemic, and it stays the same when 20% of nodes are malicious. Finally, when 50% of the nodes are malicious, Prophet performs best, followed by Epidemic and PPHB++.
In Münster, PPHB++ works more reliably than the other two algorithms when 10% or 20% of nodes are malicious, whereas with 5% or 50% malicious nodes, the Prophet algorithm achieves a higher MDP. Figure 7 shows the DMP for the San Francisco, Rome, and Münster datasets, and the results are consistent with Figure 4: in all scenarios, PPHB++ has the lowest DMP, followed by Epidemic and Prophet. Figure 8 displays the NetO for all three datasets; the lowest NetO belongs to PPHB++, followed by Prophet and Epidemic. The reasons for these results are similar to those given in the previous section.

Restrictions of the Study
We investigated the effect of NumMC, BuffS, MI, NumN, and NM as the parameters influencing OppNet performance. We also evaluated how the presence of malicious nodes affects network performance. Despite an extensive simulation study using three datasets, our work is limited in the number of datasets (restricted by node density and mobility), the number of nodes (maximum 500), and the amount of collected data (two days).

Parameters Influencing Network Performance
Our study shows that reducing NumMC has little effect on receiving or deleting messages, but it significantly reduces NetO. Compared with the other algorithms, this reduction in the number of message copies notably reduces DMP and NetO and increases MDP. Increasing the nodes' BuffS has little effect on MDP, DMP, and NetO.
Increasing the message generation interval, which produces fewer messages, can significantly reduce DMP and increase MDP, but it has little effect on NetO.
Increasing the mobility of the nodes causes a notable increase in MDP and a significant reduction in DMP.
Increasing NumN leads to more interactions between nodes and improves MDP and DMP, but it can also raise NetO.
Therefore, as a real-world application of our study, consider a VANET on a highway. This is a network with a low number of nodes and low mobility, where network overhead is not a concern and forwarding emergency and traffic messages has priority. In such a network, we suggest increasing the message generation interval so that only prioritized messages are produced. We also suggest increasing the number of message copies and the buffer size to achieve the best network performance.
In another real-world scenario, the network has a large number of nodes, low mobility, and heavy message generation. Here, we can decrease the number of message copies and prioritize messages so that the message generation interval can be increased. Consequently, we can decrease the network overhead and improve message delivery in the network. An example of such a network is a city event such as a carnival.

Machine Learning Techniques and Real-World Applications
By examining these parameters in various simulations and with ML regression models (decision tree, multiple linear, polynomial, random forest, and support vector regression), we found the optimized parameters for different networks. Results on three real-world networks with different features, represented by the three datasets, show that the optimal results are obtained when NumMC is 1, NumN is 100, BuffS is 10, and MI is 15 to 25 s. The buffer is one of the limited resources of each node and is a severe restriction in real-world applications, so we also calculated the best result when BuffS is 5. The optimized parameters under this condition are BuffS = 5, NumMC = 1, NumN = 100, and MI = 55 to 65 s. We used these parameters to configure the network. Some settings involve big data with a large number of inputs (e.g., if the number of datasets increases significantly), where training and testing speed, computational resources, classification, and prediction are crucial. In such settings, non-iterative neural-like structures such as the successive geometric transformations model (SGTM) can be used to improve accuracy and efficiency while using less memory. These methods usually provide a lower error for the regression task. The Ito decomposition (Kolmogorov-Gabor polynomial) can be combined with SGTM to extend its inputs [42].
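The input extension mentioned above can be sketched as a polynomial feature expansion. Each input vector is mapped to the terms of a truncated Kolmogorov-Gabor polynomial: constant, linear, and pairwise product terms. A model such as SGTM could then take these terms as its inputs. This sketch covers only the expansion step. It is not an implementation of SGTM, and it does not reproduce how [42] combines the two. The function name is hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod
from typing import List, Sequence


def kolmogorov_gabor_features(x: Sequence[float], degree: int = 2) -> List[float]:
    """Expand inputs into the terms of a truncated Kolmogorov-Gabor polynomial.

    Returns [1, x_i, x_i*x_j (i <= j), ...] up to `degree`, so a linear
    model fitted on these features corresponds to the polynomial itself.
    """
    features: List[float] = [1.0]  # constant term a_0
    for d in range(1, degree + 1):
        # Each index tuple (i <= j <= ...) yields one monomial of order d.
        for idx in combinations_with_replacement(range(len(x)), d):
            features.append(prod(x[i] for i in idx))
    return features
```

For the four factors (NumMC, NumN, BuffS, MI), a second-order expansion yields 15 features: 1 constant, 4 linear, and 10 product terms.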
We evaluated the network performance in terms of MDP, DMP, and NetO under varying NumN and NM on three real-world datasets with the PPHB++, Prophet, and Epidemic routing algorithms. The results show that PPHB++ achieves a better MDP in San Francisco and Rome because taxis move freely there, whereas Prophet delivers a better MDP in the Münster dataset. PPHB++ is the best performer in terms of DMP and NetO, followed by the Prophet and Epidemic algorithms.
We also estimated the impact of malicious nodes on network performance by studying the three algorithms in the presence of different proportions of malicious nodes. When node movement is unrestricted, PPHB++ performs best in MDP, DMP, and NetO in the presence of malicious nodes. When node paths are constrained, the Prophet algorithm performs better only in MDP.
In the San Francisco scenarios, which have more nodes in the network, the algorithms behaved slightly differently in the presence of malicious nodes.
According to the results, each network parameter can be set efficiently to achieve the best performance for the network's conditions and needs, taking into account what percentage of network nodes may be malicious. Our comprehensive study shows that the parameters influencing network performance should be configured according to the application and its restrictions. Depending on how important each performance metric is in an application, the influencing parameters can be weighted accordingly. We used ML techniques to consider network behavior under various features, influencing parameters, and datasets representing real-world applications. This allowed us to extend the study significantly by predicting different network scenarios. The impact of OIP and OIPL on DMP is negligible for the Prophet and Epidemic algorithms in all datasets. Under PPHB++, by contrast, the effect is inverse (negative) in San Francisco and Rome, and DMP increases by 20% on average in the Münster dataset.

Comparison of ONE Default Setting with Optimized Parameters
The results indicate an improvement in NetO for all datasets and algorithms; however, the improvement is greater for the PPHB++ algorithm (for example, 84% in the Rome dataset). Under PPHB++, NetO decreased by an average of 18% for OIP, but it also experienced an average of 15% for OIPL in San Francisco. It shows an average decrease of 60% in the San Francisco dataset. For the Prophet algorithm, NetO decreased by an average of 25% for OIP but increased by an average of 21% for OIPL in Rome. For the Epidemic algorithm, NetO decreased by an average of 4% for OIP but increased by an average of 36% for OIPL in Rome. NetO decreased by an average of 19% for the Prophet and Epidemic algorithms in the Münster dataset. The analysis shows that the presented solution significantly influences MDP and NetO in all three datasets and algorithms (with some exceptions) and remains neutral for DMP.

Conclusions
In real-world applications, wearable devices, taxis, and buses might serve as the nodes of opportunistic networks that save-carry-forward a message from the source to the destination. This is particularly useful when infrastructure is lacking or when an emergency alert must be sent from a person injured in an accident to a hospital or rescue team. Albeit with different weights, NumMC, BuffS, MI, NumN, and NM influence the network's performance. To explore the restrictions of real-world applications and the impact of resources on network performance, we deployed three datasets that differ in features and structures to represent the characteristics of such networks. Extensive simulations and analyses are required to show the impact of each factor on the network's performance. We found that the nodes and their features have the most significant influence on network performance, as represented by MDP, DMP, and NetO. This means that NumN, NM, and NumMC have greater weights than BuffS and MI. In the real world, this maps to the type of node (taxi or bus), its speed (NM), and the route it follows. All these factors are affected by the route and regulation features determined by the dataset represented by the algorithm used in the simulation.
In a vast network, this requires specific configuration and testing. Thus, we used machine learning regression techniques to predict the optimized parameters for networks both with and without resource restrictions.
We showed that obtaining the optimized parameters for each scenario, rather than using a general configuration, improves network performance under different routing algorithms (i.e., PPHB++, Prophet, and Epidemic). Applying the optimized parameters to the network improved MDP by an average of 17% in San Francisco and 32% in Rome and Münster (Prophet algorithm). Likewise, MDP was enhanced by an average of 15% in San Francisco and 34% in Rome and Münster (Epidemic algorithm). In all datasets, the impact on DMP is negligible for the Prophet and Epidemic algorithms. Under PPHB++, however, the impact is inverse (negative) in San Francisco and Rome, and DMP increases by 20% in the Münster dataset. NetO improved across all datasets and algorithms, with the PPHB++ algorithm gaining the most.
A dense network may improve the probability of delivering a message but also increases NetO. We concluded that public transportation could be used in OppNets depending on the purpose of the application. However, care must be taken to choose the appropriate means (wearables, taxis, or buses) depending on the purpose and on the structure of the environment (city). Our study showed that buses, with their lower NetO, are appropriate in OppNets if the message is time-tolerant.
The results indicated that taxis are suitable in cities with a modern urban texture, where they involve a higher network overhead (e.g., several taxis in the same area and direction), greater speed (NM), and fewer driving restrictions.
The datasets' features restrict our work in terms of data acquisition (two days) and node characterization (limited by node density, mobility, and a maximum of 500 nodes). For future work, we plan to include pedestrians in our datasets. To this end, we plan to implement an OppNet on several wearable devices deployed in real situations with true-to-life scenarios. In addition, we would like to compare these results with the simulation outcomes and improve the ML algorithms in terms of performance, speed, and training time.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A
The detailed results illustrated in Figures 9-11 are provided in Tables A1-A3, respectively. The MDP and DMP values in these tables are normalized between 0 and 1.