1. Introduction
An Opportunistic Network (OppNet) is a kind of ad-hoc network [1] that forwards information in an intermittently connected environment by exploiting the contact opportunities created as nodes move relative to one another. The OppNet originated from the mobile ad-hoc network (MANET) [2] and the delay tolerant network (DTN) [3]. Communication in an OppNet differs from that in a traditional wireless network. The most important difference is that the nodes in an OppNet are not uniformly deployed and do not need a completely connected transmission path between the sender and the receiver [4].
Because there is no complete end-to-end transmission path in the OppNet, the “storage-carrying-forwarding” routing mechanism is adopted for information transmission [5]. This mechanism does not require a node to maintain a routing table to the other nodes in the network. Instead, information is cached on a mobile node with storage capacity, and suitable next-hop nodes are found through the encounter opportunities created by node movement.
Figure 1 is a schematic diagram of information transmission in an OppNet. Assume that the source node S needs to transmit information to the destination node D. Since node S and node D are not in the same connected domain at time t1, there is no complete communication link between them, so node S sends the message to its neighbor node C. Because node C does not have an appropriate next hop, it carries the message and waits for a suitable forwarding opportunity. As the nodes move, node C and node B enter the same connected domain at time t2, and node C forwards the message to node B. Node B then meets the destination node D at time t3 and forwards the message to it, which completes the transmission.
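The S, C, B, D example can be replayed with a minimal sketch. It assumes a single-copy hand-over on every contact and a hand-written contact list; the names `Node` and `run_contacts` are hypothetical, and no next-hop selection is modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    buffer: set = field(default_factory=set)  # messages currently carried


def run_contacts(nodes, contacts, message, destination):
    """Replay time-ordered contacts (time, carrier, peer).

    Whenever the carrier holds the message, it hands it to the peer
    (store, carry, forward). Returns the hops actually taken.
    """
    path = []
    for time, carrier, peer in contacts:
        if message in nodes[carrier].buffer:
            nodes[carrier].buffer.discard(message)  # single copy: hand over
            nodes[peer].buffer.add(message)
            path.append((time, carrier, peer))
            if peer == destination:
                break
    return path


nodes = {name: Node(name) for name in "SCBD"}
nodes["S"].buffer.add("m1")
# Assumed contact schedule matching the description of Figure 1.
contacts = [("t1", "S", "C"), ("t2", "C", "B"), ("t3", "B", "D")]
path = run_contacts(nodes, contacts, "m1", "D")
print(path)
```

Between t1 and t2 the message simply sits in the buffer of node C, which is the "carrying" part of the mechanism.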
In the OppNet, there is no reliable communication link between source and destination nodes because the network connection is intermittent, connections are short-lived, and the network topology is constantly changing [6,7,8]. Because of constraints such as application characteristics, environment, and cost, the OppNet meets the needs of certain specific applications well [9]. Typical applications of the OppNet include field data collection [10], message communication in remote districts [11], vehicular networks [12], and ad hoc networks in various environments [13].
Because node mobility makes the network topology change dynamically, traditional routing algorithms are difficult to apply to the OppNet. Therefore, how to select suitable neighbor nodes for information transmission has become one of the research hotspots in the field. At present, research on OppNet routing algorithms falls into two types: proactive routing protocols [14] and reactive routing protocols [15]. However, because of node mobility and the intermittency of wireless connections, the problems of a low delivery ratio and unreasonable selection of next-hop nodes remain.
In practical application scenarios, the nodes in the network are usually carried by people with certain social relationships, and their mobility patterns and interactions are social [16]. In an OppNet with people as the carriers of the network terminals, human behavior characteristics and social attributes differ from those of other mobile network models [17] and play a decisive role in the design of routing algorithms. Based on the small-world characteristics and clustering of human mobility, routing protocols based on node influence and community characteristics have been proposed for message forwarding [18]. The regularity and social nature of human mobility make it possible to predict future encounters between nodes and thereby help select the best next-hop nodes.
By analyzing the problems of some traditional OppNet routing algorithms and combining the small-world effect with the relationships between nodes, this paper considers how to choose the right neighbor node as the next hop from the perspective of the social attributes of nodes. Cosine similarity has been shown in practice to measure the similarity between two texts accurately [19]. This paper proposes the cosSim routing algorithm, which measures the strength of the social relations between nodes by calculating the cosine similarity between their data packets and uses it to select the next-hop nodes.
The paper is organized as follows: Section 2 describes the related work. Section 3 describes the proposed routing algorithm in detail. Experiments and results are presented in Section 4, and conclusions are drawn in Section 5.
2. Related Work
Because node movement makes the network topology change dynamically, traditional wireless routing protocols are difficult to apply to the OppNet. Therefore, how to select the next hop reasonably so that information is forwarded effectively has become an important research direction for the OppNet [20].
The Epidemic algorithm [21] is based on the flooding strategy. Its main idea is that when two nodes meet, each gives the other the messages that it is missing. The algorithm can raise the delivery ratio of the network and reduce the transmission delay effectively, but it generates a large number of message copies in the OppNet, which easily leads to excessive routing overhead.
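The exchange step of the Epidemic strategy can be illustrated with a short sketch. It assumes each buffer is a set of message identifiers and ignores buffer limits and time-to-live; the function name `epidemic_exchange` is hypothetical.

```python
def epidemic_exchange(buffer_a, buffer_b):
    """On an encounter, each node copies the messages it is missing.

    Both buffers are updated in place. Returns the number of new copies
    created, which is what drives the routing overhead of flooding.
    """
    missing_in_a = buffer_b - buffer_a
    missing_in_b = buffer_a - buffer_b
    buffer_a |= missing_in_a
    buffer_b |= missing_in_b
    return len(missing_in_a) + len(missing_in_b)


a = {"m1", "m2"}
b = {"m2", "m3", "m4"}
copies = epidemic_exchange(a, b)
print(copies, sorted(a), sorted(b))
```

After a single encounter both nodes hold every message, at the cost of one new copy per message that was missing on either side.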
Based on the traditional Epidemic algorithm, Zhang F et al. [22] proposed an Epidemic routing algorithm with adaptive capabilities. By observing the buffer status of the surrounding nodes, the algorithm adjusts in real time the number of message copies that a node injects into the network, which improves the performance of the Epidemic algorithm and effectively reduces the routing overhead.
A Spray and Wait algorithm is proposed in [23]. The algorithm is divided into two stages: the Spray stage and the Wait stage. In the Spray stage, the sender forwards a certain number of copies of the message it carries into the network, and it enters the Wait stage if the message has not reached the receiver. During the Wait stage, all nodes carrying the message deliver it to the receiver through the Direct Delivery strategy. The main advantage of the Spray and Wait algorithm is that its routing overhead is significantly smaller than that of the Epidemic algorithm; it also scales better and adapts well to OppNets of various sizes.
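The two stages can be sketched as follows. This is a simplified source-spray variant under stated assumptions: only the source hands out copies, encounters are given as an ordered list of node pairs, and the name `spray_and_wait` and its parameters are hypothetical.

```python
def spray_and_wait(encounters, source, destination, copies):
    """Spray: the source gives one copy to each new relay it meets until
    `copies` copies exist. Wait: every carrier delivers only on meeting
    the destination directly (Direct Delivery).

    Returns (delivered, carriers).
    """
    carriers = {source}
    remaining = copies - 1  # the source keeps one copy for itself
    for a, b in encounters:
        for holder, peer in ((a, b), (b, a)):
            if holder not in carriers:
                continue
            if peer == destination:
                return True, carriers
            if holder == source and remaining > 0 and peer not in carriers:
                carriers.add(peer)
                remaining -= 1
    return False, carriers


encounters = [("S", "A"), ("S", "B"), ("S", "C"), ("B", "D")]
delivered, carriers = spray_and_wait(encounters, "S", "D", copies=3)
print(delivered, sorted(carriers))
```

With three copies, node C receives nothing because the budget is already spent, which is how the algorithm bounds overhead compared with flooding.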
Reference [24] proposed a routing algorithm based on historical information (HBPR), which divides the network into regional units and numbers them according to their geographical locations. Each node uses GPS to record its own location and counts the areas it visits most frequently. The source node transmits the message to a relay node that is geographically closer to the destination node, so that the distance between the message carrier and the destination shrinks continuously until the carrier meets the destination node and delivers the message.
The SRBet routing algorithm is proposed in [25]. First, the algorithm uses the temporal evolution graph model to capture the dynamic topology of the OppNet accurately. Based on this model, it introduces a social relationship metric that assesses the quality of human social relationships from contact history records. Using this metric, the algorithm defines a social-relationship-based betweenness centrality metric to identify influential nodes, so that messages are forwarded by nodes with stronger social relationships and a higher likelihood of contacting other nodes.
Reference [26] proposed the social-aware networking (SANE) protocol, the first forwarding strategy that combines the advantages of both social-aware and stateless approaches in OppNet (also called pocket switched network, PSN) routing. Although cosine distance is used both in [26] and in this article, there are obvious differences. The SANE protocol assigns an interest space to each node in the network, represents it as an m-dimensional vector, and uses the cosine distance to calculate the similarity between the interest spaces of nodes. In this paper, the data packets of nodes are expressed as vectors, and the cosine distance is used to calculate the similarity between the data packets of nodes.
Fabbri F et al. [27] proposed a new sociable routing scheme that selects a subset of optimal forwarders among all the nodes and relies on them for efficient delivery. Its key point is to assign to each node in the network a time-varying scalar parameter that captures its social behavior in terms of the frequency and types of encounters. Simulation results show that, compared with other known protocols, sociable routing achieves a good compromise between delay performance and the amount of generated traffic.
3. An Efficient Forwarding Strategy Based on Cosine Similarity between Nodes
To address the problems in forwarding information in the OppNet, this paper analyzes the relationships among nodes in the network and proposes an efficient forwarding strategy, cosSim, which is based on the cosine similarity of the data packets carried by nodes. First, the algorithm calculates the cosine similarity of the data packets of two nodes to define the relationship between them. Then, it selects the next hop for information forwarding according to the degree of similarity between the nodes.
3.1. Building the Opportunistic Network Topology
Assume that the selected sub-network topology is as shown in
Figure 2. There are 12 nodes in total. The set of nodes is
. All nodes are relay nodes, all of which have the characteristics of mobility and the ability to carry and forward information. In current period, it is assumed that node
needs to send information as the source node, and the speed of the message transmission between nodes is far greater than the speed of node movement. When the message is transmitted in sub-network, the sub-network topology will not change essentially.
In the OppNet, the neighbor nodes of each node can transmit information, and each neighbor node may become the next hop. Compared with the dynamically changing network topology, the social relationships between nodes in an OppNet with people as the mobile carriers are relatively stable and do not change as the network topology changes. The sociality of a node is embodied in the data packets it carries. Therefore, it is necessary to select, among the numerous neighbor nodes, those with relatively stable social relations for information transmission.
This paper uses cosine similarity to calculate the similarity of the data packets carried by nodes. The similarity between the data packets of two nodes is taken as the similarity between the nodes themselves and thus defines the strength of the social relationship between them.
3.2. Node Definition and Similarity Calculation
Definition 1. The data packet set $D_i$ represents the j data packets carried by any node i. The data packets carried by node i are expressed as a vector $C_i = (c_{i1}, c_{i2}, \ldots, c_{ij})$, where $c_{ij}$ is the weight of the j-th data packet in the set $D_i$, that is, the frequency with which the data packet appears in node i; its initial value is 1.
Definition 2. The merge operation of the data packet sets of any two nodes a and b is denoted as $D_{ab} = D_a \cup D_b$, and the respective data packet vectors are recalculated on the basis of the set $D_{ab}$.
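Assume that node a has a total of n data packets. Its set is
$$D_a = \{d_{a1}, d_{a2}, \ldots, d_{an}\}$$
The data packets of node a can be expressed as the vector $C_a$:
$$C_a = (c_{a1}, c_{a2}, \ldots, c_{an})$$
Assume that node b has a total of m data packets. Its set is
$$D_b = \{d_{b1}, d_{b2}, \ldots, d_{bm}\}$$
The data packets of node b can be expressed as the vector $C_b$:
$$C_b = (c_{b1}, c_{b2}, \ldots, c_{bm})$$
The merge operation of the data packet sets of nodes a and b is recorded as
$$D_{ab} = D_a \cup D_b = \{d_1, d_2, \ldots, d_p\}$$
where p is the number of distinct data packets in the merged set.
The data packet vector of node a that corresponds to the merged data packet set is $C'_a$:
$$C'_a = (c'_{a1}, c'_{a2}, \ldots, c'_{ap})$$
$c'_{ak}$ is calculated as follows:
$$c'_{ak} = \begin{cases} \text{the weight of } d_k \text{ in } C_a, & d_k \in D_a \\ 0, & d_k \notin D_a \end{cases}$$
The data packet vector of node b that corresponds to the merged data packet set is $C'_b$:
$$C'_b = (c'_{b1}, c'_{b2}, \ldots, c'_{bp})$$
$c'_{bk}$ is calculated as follows:
$$c'_{bk} = \begin{cases} \text{the weight of } d_k \text{ in } C_b, & d_k \in D_b \\ 0, & d_k \notin D_b \end{cases}$$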
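Definition 3. The node similarity $\mathrm{SIM}_{ab}$ indicates the degree of similarity between nodes a and b.
The cosine similarity between vector $X$ and vector $Y$ is as follows:
$$\cos\theta = \frac{\sum_{i=1}^{p} x_i y_i}{\sqrt{\sum_{i=1}^{p} x_i^2}\,\sqrt{\sum_{i=1}^{p} y_i^2}}$$
Here, $x_i$ and $y_i$ represent the i-th elements of vectors $X$ and $Y$, respectively, and p is their common length.
By calculating the cosine of the angle between two vectors, the similarity between the two vectors can be determined. Therefore, the similarity of two nodes a and b can be calculated from their merged data packet vectors $C'_a$ and $C'_b$, and its calculation formula is
$$\mathrm{SIM}_{ab} = \cos(C'_a, C'_b) = \frac{\sum_{k=1}^{p} c'_{ak}\, c'_{bk}}{\sqrt{\sum_{k=1}^{p} {c'_{ak}}^2}\,\sqrt{\sum_{k=1}^{p} {c'_{bk}}^2}}$$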
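As an illustration of Definitions 1 to 3, the following sketch computes the similarity of two nodes from the packets they carry. It assumes that packets are identified by strings, that a packet's weight is its frequency in the node's buffer, and that a packet absent from a node gets weight 0 in the merged vector; the function names are hypothetical.

```python
import math
from collections import Counter


def merged_vectors(packets_a, packets_b):
    """Return both nodes' weight vectors over the merged packet set."""
    freq_a, freq_b = Counter(packets_a), Counter(packets_b)
    merged = sorted(set(freq_a) | set(freq_b))  # fixed order for both vectors
    # Counter returns 0 for packets a node does not carry.
    return [freq_a[p] for p in merged], [freq_b[p] for p in merged]


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm = math.sqrt(sum(xi * xi for xi in x)) * math.sqrt(sum(yi * yi for yi in y))
    return dot / norm if norm else 0.0  # an empty buffer gives similarity 0


def node_similarity(packets_a, packets_b):
    return cosine_similarity(*merged_vectors(packets_a, packets_b))


sim = node_similarity(["p1", "p2", "p3"], ["p2", "p3", "p4", "p5"])
print(round(sim, 4))
```

Here the merged set has five packets, the vectors are (1, 1, 1, 0, 0) and (0, 1, 1, 1, 1), and the similarity is 2 / (√3 · √4) ≈ 0.577.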
Definition 4. The access control number K limits the number of neighbor nodes that the current node accesses. When the current node has a large number of neighbor nodes, attempting to access all of them may take a long time, which is not conducive to the transmission of information in the network.
Therefore, when the number of neighbors of a node is larger than the access control number K, the neighbor nodes are sorted by their transmission distance from the current node, and the K nearest neighbor nodes are accessed preferentially. The access control number K thus reduces the computation time of the cosSim algorithm.
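The effect of the access control number K can be shown in a few lines. The sketch assumes the transmission distance to each neighbor is already known and stored in a dictionary; the name `nearest_neighbors` and the sample distances are hypothetical.

```python
def nearest_neighbors(distances, k):
    """Return at most k neighbor names, nearest first.

    distances: dict mapping neighbor name -> transmission distance
    from the current node.
    """
    ranked = sorted(distances, key=distances.get)
    return ranked[:k]


distances = {"n2": 30.0, "n3": 12.5, "n4": 48.0, "n5": 7.0}
print(nearest_neighbors(distances, k=3))
```

When a node has no more than K neighbors, the slice simply returns all of them, so no separate case is needed.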
Definition 5. The lower threshold of node similarity ($r$) is the screening criterion for candidate next-hop nodes. When the similarity between the current node and a neighbor node is greater than the lower threshold $r$, the neighbor node is taken as a candidate node for the next hop.
If the similarity between the current node and every one of its neighbor nodes is less than the lower threshold $r$, the neighbor node with the greatest similarity is selected for information transmission.
Definition 6. The upper threshold of node similarity ($R$). In an OppNet with people as the mobile carriers, if the similarity between the current node and one of its neighbor nodes is greater than the upper threshold $R$, the social properties of the two nodes are very similar, and their movement trajectories and the nodes they can reach are likely to differ little. Therefore, it is not necessary to use this neighbor node as a candidate node for the next hop.
If the similarity between the current node and every one of its neighbor nodes is greater than the upper threshold $R$, the neighbor node with the smallest similarity is selected for information transmission.
If the similarity between the current node and some neighbor nodes is less than the lower threshold $r$ and the similarity with the rest is greater than the upper threshold $R$, two next hops are selected: among the neighbor nodes whose similarity is less than $r$, the one with the greatest similarity, and among the neighbor nodes whose similarity is greater than $R$, the one with the smallest similarity.
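The selection rules of Definitions 5 and 6 can be summarized in one function. This is a sketch under assumptions: similarities are given as a dictionary, values exactly equal to a threshold are treated as lying inside the band (the text does not specify this case), and the name `select_next_hops` is hypothetical.

```python
def select_next_hops(similarities, lower, upper):
    """Pick next hops from {neighbor: similarity to the current node}.

    1. Neighbors with lower <= similarity <= upper are all selected.
    2. If there are none, fall back to the most similar neighbor below
       `lower` and/or the least similar neighbor above `upper`.
    """
    in_band = [n for n, s in similarities.items() if lower <= s <= upper]
    if in_band:
        return in_band
    below = {n: s for n, s in similarities.items() if s < lower}
    above = {n: s for n, s in similarities.items() if s > upper}
    chosen = []
    if below:
        chosen.append(max(below, key=below.get))
    if above:
        chosen.append(min(above, key=above.get))
    return chosen


print(select_next_hops({"a": 0.10, "b": 0.50, "c": 0.90}, 0.29, 0.81))
print(select_next_hops({"a": 0.10, "b": 0.20, "c": 0.90, "d": 0.95}, 0.29, 0.81))
```

The fallback keeps the node closest to the band on each side, so a message is never left without a next hop while the current node still has neighbors.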
3.3. The Traversal Process of Node
Each node in the sub-network topology shown in Figure 2 maintains a buffer. The buffer stores the data packets that need to be forwarded by the node, and each data packet has a globally unique identifier. Assume that the data packets of each node in the sub-network topology are as shown in Table 1.
The node traversal process based on
Figure 2 is as follows. First, initializes a directed tree T, which takes the node
currently sending information as the root node. According to the transmission distance from node
, the first K neighbor nodes are inserted into the tree in turn as the children of node
, as shown in
Figure 3.
The nodes , , , , are sequentially accessed to calculate their similarity to the current node . The similarity calculation process of nodes and is as follows:
Merge the data packet sets of nodes
and
:
Recalculate the data packet vectors of nodes
and
:
Calculate the similarity between nodes
and
:
According to the above calculation process of the nodes and , the similarities between the nodes and , , , are calculated. The results are as follows: , , , .
Assume that both and are less than the lower threshold , is greater than the upper threshold ; and are located between the lower threshold and the upper threshold.
Therefore, the nodes , , and are deleted from the tree T, leaving nodes and as the next hop for node . The neighbor nodes of nodes and are prioritized to insert the top K nodes into the tree T according to the transmission distance, as the children of the corresponding nodes.
At this point, the structure of the tree T is shown in
Figure 4.
The neighbor nodes , and of the node , and the neighbor nodes , and of the node are sequentially accessed to calculate the degree of similarity between the corresponding nodes. The results are as follows: , , , and , , .
Assume that , and are all smaller than the lower threshold .
In this case, the node with the greatest similarity to the node is retained, and the remaining nodes and are deleted from the tree T.
Assume that and are both greater than the upper threshold , and is smaller than the lower threshold .
Then node with the highest degree of similarity is retained in all nodes below the lower threshold. The node with the smallest similarity is retained in all nodes larger than the upper threshold, and the remaining node is deleted from the tree T.
At this point, the sub-network topology has been traversed and the final directed tree T is obtained, as shown in
Figure 5, which is the transmission path graph of the sub-network.
3.4. Algorithm Design
According to the traversal process of nodes in the sub-network topological structure in the previous section, the routing algorithm based on cosine similarity of data packets between nodes is deduced. The execution process of the algorithm is as follows:
Step 1: Initialize a directed tree T. Each node maintains a set of data packets $D$ and a data packet vector $C$. The node S currently transmitting information is taken as the root node of the tree T.
Step 2: Create a set of neighbor nodes for the current node S, denoted as $U_S$.
If the current node has a parent node P in the tree T, create a set of neighbor nodes for the node P, denoted as $U_P$, and perform the following operation: $U_S = U_S - (U_S \cap U_P)$.
Step 3: Determine whether the number of nodes in the set $U_S$ is greater than K. If the number of nodes in the set is greater than the access control number K, the nodes in the set are sorted from near to far according to their transmission distance from the current node, and the top K nodes are inserted into the tree T in turn as children of the current node S; otherwise, all nodes in the set are inserted.
Step 4: Calculate the similarity between the current node S and each child node in turn.
First, merge the sets of data packets of node S and child node j, denoted as $D_{Sj} = D_S \cup D_j$. According to Equation (6), Equation (7), and the set of data packets after merging, the data packet vectors $C'_S$ and $C'_j$ of node S and node j are recalculated.
Then, based on the similarity calculation in Equation (11), the similarity between nodes S and j is calculated, which is denoted as $\mathrm{SIM}_{Sj}$.
Step 5: Evaluate the similarity between the current node and each neighbor node.
All nodes whose similarity lies between the lower threshold $r$ and the upper threshold $R$ are selected as next hops, and the remaining nodes are deleted from the tree T.
If the similarity with all the neighbor nodes is less than the lower threshold, that is, $\mathrm{SIM}_{Sj} < r$ for every j, the neighbor node with the greatest similarity is selected as the next hop, and the remaining child nodes are deleted from the tree T.
If $\mathrm{SIM}_{Sj} > R$ for every j, the neighbor node with the smallest similarity is selected as the next hop, and the remaining child nodes are deleted from the tree T.
If $\mathrm{SIM}_{Sj} < r$ for some neighbor nodes and $\mathrm{SIM}_{Sj} > R$ for all the others, then among the neighbor nodes whose similarity is less than the lower threshold $r$, the one with the greatest similarity is selected as a next hop, and among the neighbor nodes whose similarity is greater than the upper threshold $R$, the one with the smallest similarity is selected as a next hop. The remaining child nodes are deleted from the tree T.
Step 6: Each child of the current node in the tree T is regarded in turn as the current node, and Steps 2 to 6 are repeated until the sub-network topology has been traversed.
Step 7: The above process finally yields a directed tree T. One or more transmission paths can be read from the structure of the tree T, and the information of the node that is currently sending is forwarded along these paths through a replication forwarding strategy.
The cosSim routing algorithm is shown in Algorithm 1.
Algorithm 1. Opportunistic Network Routing Algorithm Based on Cosine Similarity of Data Packets between Nodes.
Input: A graph G(V, E), a source S, Dz: data packet set of node z, Cz: data packet vector of node z
Output: One or more paths
Init: InitTree(T), CurrentNode(i, s); /* set the node s as the current node i */
Set: T.setRootNode(i); Ui; /* the set of neighbor nodes of node i */
If (Node p = T.getParentNode(i)) then
    Up; Ui = Ui - (Ui ∩ Up);
SortbyDistance(Ui); Ui.Delete(K); /* sort the neighbor nodes in Ui by transmission distance and delete the neighbor nodes after the K-th */
While (!Empty(Ui)) do
    For (Neighbor j : Ui) do
        Dij = Di ∪ Dj;
        C'i; C'j; /* merge the data packets of node i and node j, and recalculate the data packet weight vectors of the nodes */
        SIMij = cos(C'i, C'j); /* calculate the cosine similarity between node i and node j */
    End for;
    If (all(SIMij) < r or all(SIMij) > R) then
        selectNode(max(SIMij) or min(SIMij)); T.setChildNode(i, j);
    Else If (onePart(SIMij) < r and theRest(SIMij) > R) then
        selectNode(max(onePart(SIMij)) and min(theRest(SIMij))); T.setChildNode(i, j);
    Else SelectNode(SIMij); T.setChildNode(i, j); /* select the nodes whose similarity is between r and R */
    End if
    For (ChildNode k : T.getChildNode(i)) do
        setCurrentNode(i, k); continue;
    End for;
    Node p = T.getParentNode(i); Up; Ui = Ui - Up; Sort(Ui); Ui.Delete(K);
End while;
getPath(T); /* get the transmission paths based on the tree T */
Return T
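For readers who prefer executable code, the traversal can be approximated by the sketch below. It is not the authors' implementation: it assumes a static snapshot of the topology given as a distance-weighted adjacency dictionary, visits nodes breadth-first, never re-inserts a node that is already in the tree, and treats similarities equal to a threshold as inside the band. All names and the sample data are hypothetical.

```python
import math
from collections import Counter, deque


def similarity(packets_a, packets_b):
    """Cosine similarity of packet-frequency vectors over the merged set."""
    fa, fb = Counter(packets_a), Counter(packets_b)
    dot = sum(fa[k] * fb[k] for k in set(fa) | set(fb))
    norm = math.sqrt(sum(v * v for v in fa.values())) * math.sqrt(sum(v * v for v in fb.values()))
    return dot / norm if norm else 0.0


def pick(sims, lower, upper):
    """Threshold rule: in-band neighbors, else the nearest to the band."""
    band = [n for n, s in sims.items() if lower <= s <= upper]
    if band:
        return band
    below = {n: s for n, s in sims.items() if s < lower}
    above = {n: s for n, s in sims.items() if s > upper}
    chosen = []
    if below:
        chosen.append(max(below, key=below.get))
    if above:
        chosen.append(min(above, key=above.get))
    return chosen


def build_tree(graph, packets, source, k, lower, upper):
    """graph: node -> {neighbor: distance}. Returns node -> list of children."""
    tree, parent = {source: []}, {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        candidates = dict(graph[current])
        if parent[current] is not None:
            # Drop neighbors shared with the parent (Step 2).
            for shared in set(candidates) & set(graph[parent[current]]):
                del candidates[shared]
        # Assumption: skip nodes already placed, so that T stays a tree.
        candidates = {n: d for n, d in candidates.items() if n not in tree}
        nearest = sorted(candidates, key=candidates.get)[:k]  # Step 3
        sims = {n: similarity(packets[current], packets[n]) for n in nearest}
        for child in pick(sims, lower, upper):  # Steps 4 and 5
            tree[current].append(child)
            tree[child], parent[child] = [], current
            queue.append(child)  # Step 6
    return tree


graph = {
    "S": {"A": 1, "B": 2, "C": 3},
    "A": {"S": 1, "D": 1},
    "B": {"S": 2, "D": 2},
    "C": {"S": 3},
    "D": {"A": 1, "B": 2},
}
packets = {
    "S": ["p1", "p2", "p3", "p4"],
    "A": ["p1", "p2", "p5", "p6"],
    "B": ["p1", "p2", "p3", "p4"],
    "C": ["p9"],
    "D": ["p5", "p6", "p7", "p8"],
}
tree = build_tree(graph, packets, "S", k=3, lower=0.29, upper=0.81)
print(tree)
```

In this toy topology, node A (similarity 0.5 to S) is the only in-band neighbor of the source, so B (identical buffer, similarity 1.0) and C (similarity 0) are pruned at the first level.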
4. Experiments and Results
To evaluate the cosSim routing algorithm proposed in the previous section, this paper uses the OppNet simulation platform ONE to compare it with two traditional routing algorithms, the Spray and Wait (S&W) algorithm and the Epidemic algorithm. The three algorithms are compared on three indicators: delivery ratio, delivery delay, and routing overhead.
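The text does not spell out how the three indicators are computed. The sketch below uses definitions that are common in ONE-based studies and should be read as an assumption: delivery ratio = delivered / created, routing overhead = (relayed - delivered) / delivered, and delivery delay = mean time from creation to delivery. The function name and sample numbers are hypothetical.

```python
def summarize(created, relayed, delays):
    """Compute (delivery_ratio, routing_overhead, average_delay).

    created: number of messages generated
    relayed: number of successful node-to-node transfers
    delays:  one delivery delay per delivered message
    """
    delivered = len(delays)
    delivery_ratio = delivered / created if created else 0.0
    overhead = (relayed - delivered) / delivered if delivered else float("nan")
    average_delay = sum(delays) / delivered if delivered else float("nan")
    return delivery_ratio, overhead, average_delay


ratio, overhead, delay = summarize(created=10, relayed=40, delays=[100.0, 200.0, 300.0, 400.0])
print(ratio, overhead, delay)
```

Under these definitions a flooding scheme raises `relayed` much faster than `delivered`, which is why its overhead grows quickly as nodes are added.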
The simulation scenario settings are shown in Table 2.
The algorithm parameters are shown in Table 3.
Through multiple simulation experiments, the upper and lower thresholds were selected using the delivery ratio and routing overhead as reference values. The overall performance of the cosSim algorithm is relatively good when the lower threshold $r$ = 0.29 and the upper threshold $R$ = 0.81. As shown in Figure 6a, when the lower threshold $r$ is equal to 0.29, the delivery ratio remains at a relatively high level, while the routing overhead of the network declines significantly once the lower threshold exceeds 0.29. As shown in Figure 6b, when the upper threshold $R$ is greater than 0.81, the delivery ratio grows very slowly while the routing overhead keeps increasing, so the upper threshold is set to 0.81. When the access control number K is 6 or 7, the number of comparisons between nodes does not increase significantly, and the execution time of the algorithm is effectively reduced. The cosSim, S&W, and Epidemic algorithms are analyzed and compared below in different situations.
Figure 7 shows the delivery ratio of the three routing algorithms with different node caches. According to Figure 7, the size of the node cache has the most significant effect on the delivery ratio of the cosSim algorithm. When the node cache is small, the delivery ratio of all three routing algorithms is extremely low. As the node cache increases, the delivery ratio of the three algorithms grows to different degrees. The S&W and Epidemic algorithms show a slow growth trend. When the node cache is 50 MB, the delivery ratios of the S&W algorithm and the Epidemic algorithm stay at about 60% and 50%, respectively. The delivery ratio of the cosSim algorithm grows relatively fast. When the node cache is 25 MB, the delivery ratio of the cosSim algorithm has already reached the highest level attained by the S&W algorithm and the Epidemic algorithm.
Figure 8 shows the effect of the number of nodes on the delivery ratio of the three algorithms; the overall trend is similar to that for the node cache. The difference is that the delivery ratio of the Epidemic algorithm is always higher than that of the S&W algorithm as the node cache grows, whereas the delivery ratio of the S&W algorithm is always higher than that of the Epidemic algorithm as the number of nodes grows. This shows that the Epidemic algorithm depends more on the cache size of the nodes, while the S&W algorithm depends more on the number of nodes in the network. When the number of nodes in the network reaches 600, the delivery ratios of the Epidemic algorithm and the S&W algorithm are 45% and 55%, respectively, and the delivery ratio of the cosSim algorithm is as high as 80%.
Figure 9 shows the delivery delay of the three algorithms for different numbers of nodes. The delivery delay of the cosSim algorithm is the least affected by changes in the number of nodes, while the S&W algorithm and the Epidemic algorithm show the same growth trend. When the number of nodes is 600, the delivery delay of the Epidemic algorithm and the S&W algorithm is as high as 6500 and 5500, respectively. The delivery delay of the cosSim algorithm grows slowly. Once the number of nodes exceeds 400, it stays at around 3000, which is about 2500 lower than that of the S&W algorithm and less than half that of the Epidemic algorithm.
Figure 10 shows the effect of the number of nodes on the routing overhead of the three algorithms. As the number of nodes increases, the routing overhead of the three algorithms grows at different rates. The routing overhead of the Epidemic algorithm grows fastest, at about twice the rate of the S&W algorithm and about three times that of the cosSim algorithm. When the number of nodes in the network reaches 600, the routing overhead of the S&W algorithm is around 2500, that of the Epidemic algorithm is around 4500, and that of the cosSim algorithm is the lowest, at around 1500. Because the Epidemic algorithm is based on the flooding strategy, it generates a large number of message copies in the network, which is likely to cause excessive routing overhead.
The above simulation results show that, compared with the traditional S&W and Epidemic routing algorithms, the cosSim algorithm is more suitable for data transmission in the OppNet: it effectively improves the delivery ratio and reduces the delivery delay and the routing overhead.