NPLP: An Improved Routing-Forwarding Strategy Utilizing Node Proﬁle and Location Prediction for Opportunistic Networks

: Opportunistic networks are considered as the promising network structures to implement traditional and typical infrastructure-based communication by enabling smart mobile devices in the networks to contact with each other within a ﬁxed communication area. Because of the intermittent and unstable connections between sources and destinations, message routing and forwarding in opportunistic networks have become challenging and troublesome problems recently. In this paper, to improve the data dissemination environment, we propose an improved routing-forwarding strategy utilizing node proﬁle and location prediction for opportunistic networks, which mainly includes three continuous phases: the collecting and updating of routing state information, community detection and optimization and node location prediction. Each mobile node in the networks is able to establish a network routing matrix after the entire process of information collecting and updating. Due to the concentrated population in urban areas and relatively few people in remote areas, the distribution of location prediction roughly presents a type of symmetry in opportunistic networks. Afterwards, the community optimization and location prediction mechanisms could be regarded as an signiﬁcant foundation for data dissemination in the networks. Ultimately, experimental results demonstrate that the proposed algorithm could slightly enhance the delivery ratio and substantially degrade the network overhead and end-to-end delay as compared with the other four routing strategies.


Introduction
In recent years we have witnessed the tremendous achievement of the mobile communication, driven by the rapid proliferation of smart wireless devices [1], such as Wi-Fi technologies, laptops, smart phones, wearable smartwatches, tablet computers, which also promotes the emergence of a promising network model that is known as opportunistic networks (ONs) [2,3]. As the enablers of 5G networks and big data environment, opportunistic networks [4] have been generally regarded as a new type of networking topology to improve traditional communication infrastructure by enabling smart mobile nodes to contact with each other when they are in the same communication area instead of the cellular network [5]. In ONs, without a fixed communication infrastructure, mobile nodes try to seize the opportunity of encountering others to implement the wireless and opportunistic communication process due to the unstable and discontinuous connectivity between them [6]. Nowadays, opportunistic networks have a significant application in multitudinous areas such as vehicular networks [7], space communication [8], disaster relief [9], wildlife surveillance [10], Internet of Things (IOT) [11], communication in underdeveloped areas [12] and sensor networks [13].
includes three continuous phases: the process of information collecting and updating, community detection and optimization and the state prediction of mobile nodes. During the phase of information collecting and updating, each node in the networks shares its state queue with the peers it encounters and constantly updates its own information queue and more importantly, constructing an identical and uniform network routing state matrix. Besides, we develop a community optimization algorithm based on node activeness and node attributes, which reasonably narrows the coverage of the node community and enables the mobile nodes in the same community to be more closely connected. Based on the improved Markov chain [30], the location state prediction strategy is adopted to ensure that data packets are carried and routed by the relay nodes with a higher probability of moving to where the destinations are more likely to arrive in the future. Eventually, data packets are transmitted to those nodes with higher underlying capacity of encountering the destinations in the same community at the next time stamp. The main contributions of this article are summarized as follows: • This paper investigates the issue of collecting and updating network routing state information via the cooperation between multiple nodes. Considering both the social attributes and periodic mobility trajectory of nodes in a certain phase, the well-organized structure of network routing state matrix is established.

•
Through the definitions of node activeness and node profile, the inner structure of the community could be reconstructed more orderly and simply, in which mobile nodes possess relatively high social similarity among them, indicating that those nodes communicate with each other more frequently and closely. Consequently, plenty of opportunities for data dissemination may be generated in the refactored community.

•
With the support of the improved Markov chain, we propose the state prediction method to predict the probability of each node in the networks moving to the next location state at the next time stamp and, more importantly, evaluating the mobility relationship between neighbor nodes and destinations. This also could be considered as a significant basis for message routing and forwarding in opportunistic networks.
The remainder of this paper is organized as follows. The state of the art associated with routing algorithms in opportunistic networks will be introduced in Section 2. We propose and develop the structure of NPLP strategy for message routing and forwarding in Section 3. Experimental process and performance analysis will be described in Section 4. Ultimately, this study is concluded in Section 5.

Related Works
In the light of the characteristics of routing algorithms in ONs [2], these strategies could be roughly divided into two categories: community-aware and community-ignorant. On the one hand, by obtaining and comparing the influence, centrality, similarity, social attributes within a region, mobile trajectory or other evaluation metrics of nodes [4], the main purpose of community-aware routing approaches is to discover, concentrate and optimize community structure and connectivity among nodes in the same community. On the other hand, the community-ignorant routing approach, a type of message routing-forwarding scheme that does not embody the definition of community and tries to employ the spacial and temporal information associated with mobile awareness of nodes in the networks to make an efficient decision on data transmission [27]. Next, we are going to introduce the state of the art related to two kinds of routing methods in ONs.

The Existing Community-Aware Routing Approaches in Opportunistic Networks
For constructing an orderly and stable data connectivity between source and destination, Liu et al. [8] propose predict and forward strategy, a novel routing-delivery scheme using node profile for opportunistic networks, in which mobile nodes could be characterized by assessing their social attributes such as occupations, interests, working places or places of residence. The complete data connectivity is obtained by the optimal stopping theory. Additionally, Liu et al. [17] recommend other routing-forwarding algorithm that is known as FCNS (A fuzzy routing-forwarding algorithm exploiting comprehensive node similarity), where fuzzy inference logic is utilized to evaluate the transmission preference of the node and message carriers select a suitable relay node from its neighbors via the feedback mechanism [31]. Li et al. [20] present a social-aware node mobile trajectory prediction algorithm for opportunistic networks, in which a 1-order Markov model [30] is adopted to update the mobile location of the nodes in the networks and the accuracy of the state prediction is enhanced by the state transition matrix from Markov model. Mingjun et al. propose a community-aware routing protocol based on home-aware community model in opportunistic networks [21]. This algorithm aims at establishing a type of network that only consists of community homes, in which the minimum expected end-to-end delay among nodes is calculated by the Dijkstra algorithm, thereby realizing better opportunistic routing performance in the networks. Yoneki et al. [22] introduce a number of visualizations to describe attributes of node mobility trajectory that contains community discovery, detection, optimization and concentrating. This study is committed to enhance the connections among mobile nodes in opportunistic networks. Sharma et al. [23] present GD-CAR (A Genetic Algorithm Based Dynamic Context Aware Routing Protocol) routing algorithm, an advanced context based touting protocol that utilizes the genetic search strategy to predict the mobile trajectory of nodes by collecting and updating the context information associated with mobile nodes in the networks. Finally, Xiaoming et al. [25] propose a novel two-phase routing strategy based on the degree of social activity of mobile nodes, which establishes a mathematics method for designing a multi-copy spreading strategy and a single-copy forwarding strategy based on social activeness of mobile nodes and the history contacting times between nodes.

The Existing Community-Ignorant Routing Approaches in Opportunistic Networks
Halikul et al. [3] recommend a hybrid routing algorithm that is called as EpSoc, which attempts to assess the degree centrality of nodes and to adopt the information blocking strategy to reduce message duplicates in opportunistic networks. Furthermore, Saha et al. [5] propose a novel annealing-based routing algorithm (SeeR), which rigorously considers the number of hops from source to destination. Because mobile nodes do not track data information of other nodes in the networks, the deadlock problem of local optimum could be effectively alleviated in SeeR. Additionally, Zhou et al. [9] present an efficient message routing-forwarding scheme (TCCB) for ONs, which mainly focuses on employing the temporal relationships among nodes to predict the future social contacting model in the effective time of data dissemination.
Arastouie et al. [12] recommend a risk evaluation scheme, which tries to choose the optimal data connectivity from source to destination by assessing the short or long-term influence of each message routing decision in opportunistic networks. This strategy reasonably adopts cache management and network routing capacity to improve the data transmission environment in the networks. Wu et al. [24] investigate the issue of cache management and node cooperation in opportunistic social networks and introduce a node recognition approach (ICMT). The main goal of this algorithm is to adjust the priority of message routing-forwarding among nodes and synchronously implement an efficient data transmission process through node cooperation. Eventually, Nunes et [28] present a social-aware routing algorithm that do not pay attention to the importance of the spacial and temporal data information of human mobility. This scheme aims at reducing the average device-to-device delay and network overhead in opportunistic social networks via spacial and social attributes of nodes. Testing in a real-world data set [32], authors find that the end-to-end cost of message delivering from this algorithm outperforms those from several traditional and typical routing strategies in opportunistic networks.

System Model Design
In opportunistic networks, it is difficult for message carriers or the source node to accurately acquire the location of the destination due to the random movement of the mobile devices. However, node profile may consist of multiple strong social attributes, which can be utilized to optimize the community division of nodes. Additionally, because of the periodicity and repeatability of node movement, many routing algorithms start to focus on the trajectory prediction of nodes. However, there is no effective method to combine these two strategies in opportunistic networks. Therefore, we present an improved node trajectory prediction method based on location prediction and community optimization for opportunistic networks.

Collecting and Updating Effective Information about the Routing State of the Networks
In our proposed algorithm, there is an information updating phase where two nodes exchange their social attributes and mobile trajectory information at each encounter. After this special phase, message carriers or the source node are able to make an accurate decision on messages routing or forwarding based on the obtained information, which indicates that the NPLP algorithm is one that periodically updates information and forwards messages. The length of information updating phase T is usually set based on the cycle of user activity, in which message carriers or the source node can obtain the important social attributes and movement trajectories of other nodes in the networks.
To quantify the entire process of information collecting and updating in this period, we define L node i and S node i as the mobile trajectory queue and social attribute queue of the node i in the networks, respectively, and these two information queues could be formalized as where t i 1 , t i 2 , t i 3 ,t i 4 · · · · · · t i m represent m different time stamps of the node i and l i 1 , l i 2 , l i 3 · · · · · · l i m denote the locations of the node i at m different time stamps. Consequently, (t i 1 , l i 1 ), (t i 2 , l i 2 ) · · · (t i m , l i m ) record the mobile trajectory of the node i during the length of information updating phase T. Additionally, A i 1 , A i 2 , A i 3 , A i 4 · · · · · · A i n are n different significant social attributes of the node i, such as interest, working place, residence, occupation, and so forth, which may play a special role in community optimization and data transmission. In order to consider the mobile trajectory queue and social attribute queue comprehensively, we define the information queue of the node i as List node i = {L node i , S node i , List i encounter } List i encounter = {List node a , List node b , List node c · · · · · · List node z } (2) in which List node i represents the information queue of the node i and List i encounter is the the set of information queues of the nodes encountered by the node i and the nodes a, b, c· · · · · · z are the set of nodes encountered by the node i.
As mentioned above, each node in the networks employs the mobile trajectory queue and social attribute queue L node and S node to build a information queue List node and then each node will share its own information queue with the nodes it encounters. After the entire process of information collecting and updating, each node stores a non-redundant and uniform network routing state matrix RS, which could be strictly defined as RS = {List node a , List node b , List node c · · · · · · List node z } To be specific (as shown in Figure 1), at the time t 1 , the nodes i, j and y are in the same communication domain and they share their own information queue List with each other. Moreover, each node i, j or y in the networks constantly collects and updates List node by adding the obtained information queue to its own encountering queue List encounter . Furthermore, the nodes i, j and y are able to construct an identical and uniform routing state matrix as RS = {List node i , List node j , List node z }, which reflects the mobile and social attribute information of these nodes during the time t 1 and they will respectively store the routing state matrix RS in their own cache spaces. After that, at time t 2 , as the nodes in the networks move randomly, the nodes i, x and z are in the same communication domain, they continue to collect other's information queue and update its own information queue and finally the network routing state matrix RS in their cache spaces could be denoted as RS = {List node i , List node j , node y , node x , List node z }, which effectively represents the current routing sate of the networks. After the whole process of information queue collecting and updating, each node in the networks is able to establish a network state routing matrix RS and then processed to the phase of message routing and forwarding. When the phase of message routing and forwarding is over, each node in the networks deletes the state routing matrix RS from its own cache spaces and then enters the next phase of information collecting and updating.

Markov Chain State Prediction Model Based on the Movement Trajectory of Nodes
In the real application scenarios of opportunistic networks, the daily behaviors of mobile nodes are highly repetitive, which means that mobile nodes may visit several fixed places repeatedly in a relatively fixed time and carry out some repetitive activities every day, such as driving from residence to workplace. According to the mobile trajectory of nodes and fixed behavior patterns, the proposed algorithm is able to model the scene based on the mobile trajectory and employs the improved Markov chain to predict the next mobile state of nodes in the networks, thereby constructing a reliable data transmission link from source to destination.
Firstly, referring to Equation (1), we assume that there are m frequently visited locations in the current application scenario of opportunistic networks and l i m is the m-th state of the node i in the Markov prediction process, which could be denote as S(m). Then, the transition state matrix P for each node in the networks could be calculated utilizing the network routing state matrix RS collected in the phase of information updating. Specifically, we assume that h ij is the number of times that the observation node i moves from the position x to the position y, then the probability that the node i starts from the position x to reach the position y could be computed by in which h is the sum of the times that the node i accesses each position starting from the position x. Therefore, if there are m locations in the application scenarios of opportunistic networks, then an m * m transition probability matrix TP l m for the node i could be determined, which can be represented as In addition, according to the Markov prediction model of absolute distribution, P t 0 m represents the probability that the node i is in the current state S(m) at the initial time t 0 and the probability that the node i is in the other state S(m + 1) at the next time stamp t 1 could be denoted as P t 1 m+1 . In this way the initial distribution of Markov prediction chain P(l) can be denoted as Moreover, through comparing the transition probability matrix TP l m with the distribution of Markov prediction chain at the previous time, the prediction distribution of Markov chain at the next time P(l + 1) in the networks can be computed by where P t 2 1 , P t 2 2 , P t 2 3 , · · · , P t 2 m represent the probabilities that the node i will be in the potential position states S(1), S(2), S(3), · · · , S(m) at the next time t 2 . To express the probability that the node i reaches the next potential position state more concisely, we reasonably define P(l + 1) as in which argmax{P t+1 m } indicates that the maximum value of the elements in the Markov chain P(l + 1) is the most likely position state of the node i in the next time. For example, if the initial position state of the node i is S(0) and the initial distribution of Markov prediction chain for the node i is P(l), then the prediction distribution of Markov chain P(l + 1) at the next time t + 1 can be determined. If P t+1 m is maximum value of the state sequence P(l + 1), then the node i is most likely to reach the position state S(m) at the next time t 2 .
The main purpose of predicting the next state position of nodes in opportunistic networks is to understand the general rules of node movement and simultaneously evaluate the mobile relationship between neighbor nodes and destination. Consequently, message carriers or the source node are able to make an more efficient and accurate decision on message routing and forwarding based on the mobile characteristics of their neighbor nodes.
As shown in Figure 2, the application scenarios of opportunistic networks consists of 5 different position states S(0), S(1), S(2), S(3) and S(4). Additionally, the node i is in the position state S(0) at the current moment t 1 and S(1), S(2), S(3), S(4) are the potential next position sates of the node i at the next time t 2 . Moreover, P(l) represents the prediction distribution of Markov chain at the current moment, hence the prediction distribution of Markov chain for the node i at the next time t + 2 could be computed by P(l + 1) = (P l+1 1 , P l+1 2 , P l+1 3 , P l+1 4 ). Consequently, the final result of Markov's prediction for the node i is the maximum value of these four probabilities P l+1 1 , P l+1 2 , P l+1 3 , P l+1 4 , which indicate that the node i is more likely to arrive at the position state P l+1 m with the maximum probability in the next sequence t 2 .

Community Detection and Optimization Strategy Based on Node Activeness and Node Profile
After the whole process of information collecting and updating, each mobile node in opportunistic networks has obtained a unified network routing state matrix RS, which contains the social attribute information S node of nodes in the networks. Actually, the set of these social attributes of the node could be called 'node profile' in opportunistic networks, which is also regarded as a strong basis for community detection and division in the networks. In many scenarios, the definition of a community is not measured by geography but rather by multiple social characteristics of mobile nodes such as personal interests, affiliation, occupation and so on. Additionally, the number of mobile nodes in a dynamic community are always changing. Over a period of time, there will always be some members of a community who are more stable in the community (or more involved in the community), while others will leave the community more quickly (or less involved in the community). In general, nodes belonging to the same community will be more closely connected and communicate with each other more frequently, which provides a good opportunity for messages routing and forwarding.
The community in dynamic opportunistic networks is a group with multiple common social characteristics that changes over time. Thereout, if c 1 represents the state of the community in time slice t 1 , then the state of a community could be denoted as a binary group < t 1 , c 1 >. Then the state sequence C of a community from the time slice t 1 to the time slice t v can be represented as in which < t 1 , t 2 , · · · , t v > is a disjoint time slice sequence for a period of time. Besides, the network states at different time slices may show different importance on the data transmission process and node activity cannot be a simple sum of the individual situation on each time slice, meaning that we need to consider the weight of each specific time slice in the networks. Therefore, the node activeness N A(U i , C) of the node i usually could be computed by where W t represents the regulatory factor of the time slice t and T (U i ,C) and T (C) denote the time slice set containing the node i in community C and the total time slice sets in community C, respectively. Furthermore, the node i strictly belongs to the community C in the networks, which can be formalized as U i ∈ C. The regulatory factor of the time slice W t could be determined based on specific characteristics of the community or specific research objectives.
One of the most important behaviors of mobile nodes in a community of opportunistic networks is to interact and connect with others. For example, in online social networking sites, an old member who posts frequently online and participates in community activities is often more important in the community than a new member who rarely posts, which can effectively reflect the characteristics of mobile nodes in the community. As a result, mobile nodes in the community are divided into g grades according to their activeness in the community and we assume that the degree of node activeness g is higher than the level g + 1. According to different levels of node activity, mobile nodes in the community could be divided into g classes. Combined with Equation (1), the social attributes of the node i in the community C can be defined as where T is the length of the phase of information collecting and updating, which is also known as the set of time slices in community C and A U i 1 , A U i 2 , · · · , A U i n represent different attribute characteristics of the node i in the time period T. Consequently, (U i , C) could be categorized into different classes based on node activeness.
In opportunistic networks, node profile reflects different social attributes of mobile node, which also has become an important foundation for community detection, division and optimization in the networks. Therefore, comprehensively combining node activeness with node profile, we rigorously define the social similarity Sim C (U i , U j ) between the nodes i and j in the community C as in which A U i A U j and A U i A U j represent the number of the elements in the intersection and union sets between the social attributes of the nodes i and j and w 0 , w 1 , w 2 , w 3 , · · · , w n denote the parameter value of different weights, which could be determined based on the structure of a particular community in the networks. Because there are several overlapping communities in opportunistic networks and one node may exists in multiple communities, the calculation of node similarity Sim C (U i , U j ) should be standardized. In general, no matter what community the node in, it should be the most similar to itself. Therefore, we compare the similarity between the two nodes with the average of all the nodes in the community with their own similarity, which can be mathematically reduced to In overlapping communities, if the node i and the node j simultaneously exist in multiple communities, then the social similarity between them in the whole networks is the average of their social similarity in each community, which can be expressed as where the nodes i and j synchronously belong to c 1 , c 2 , c 3 , · · · , c v . Moreover, {c 1 , c 2 , c 3 , · · · , c v } ∈ ONs, in which ONs represents the whole network environment.
In the proposed algorithm, the node activeness the node activeness N A(U i , C) could be computed by Equation (10), while the social similarity of mobile nodes in the networks can be obtained from Equations (11)- (13). Then, comprehensively combining node activeness with node profile, we define the joint score JS C v of community optimization as where U 1 , U 2 , · · · , U e represent e different mobile nodes in the networks and U is the total number of mobile nodes in the community C. Additionally, N A and Sim denote the node activeness of nodes in the community C and social similarity of mobile nodes in the networks, respectively. Through comparing the node's joint score JS C v in each community of the networks, the community with the highest score will be recommended to this node eventually.

Message Routing and Forwarding Method Based on Markov Chain and Community Optimization
In the application scenarios of opportunistic networks, a mobile node usually maintains strong social relations with a few individuals. In the proposed strategy, we have detected, divided and optimized the community to which the nodes belong. The connections between nodes within the community are relatively tight, while that between different communities are relatively sparse. According to the social characteristics of mobile nodes, the probability of mobile nodes moving to a community is relatively large and the retention time is relatively long. In contrast, the probability of a node moving to a non-community location is lower and the duration time is shorter.
Therefore, according to the gregarious characteristics of social nodes in opportunistic networks, it is necessary to predict the position of a certain node at the next moment. If a number of nodes that contain multiple relatively close social relationships with the destination assemble at a certain location at the current moment, the probability of the destination moving to this location at the next moment is relatively large. For this reason, the proposed scheme creatively employs node activeness and node profile to detect community and adopts the improved Markov chain to predict the next location sate of mobile nodes, thereby making a more reasonable decision on message routing and forwarding in the data transmission process.
After the phase of information collecting and updating, mobile nodes have obtained effective information about the routing state of the entire network, which mainly consists of the information about social attributes and mobile trajectory of these nodes in the networks. Based on the network routing state matrix RS, we assume that the contacting records CR (node i ,node j ) of the nodes i and j within the length of information collecting and updating T are expressed as where cn represents the number of communications between the nodes i and j and t b a is the time when the a contact started and t f a is the time when the a contact ended. Then, the contact probability between the nodes i and j can be expressed as According to the above Equation (17), the social relationship matrix between mobile nodes in the networks could be obtained. Next, we need to calculate the probability of the node arriving at the next location state in the next moment. Given that the node i is in community C 1 at the current moment, we define the set of all nodes in the location state S(0) in this community as NS p = {node 1 , node 2 , · · · node i }, in which NS ∈ C 1 . Then, with the precondition of the node i at the location S(0) in the current time, the probability of it reaching the location S(1) in next moment could be defined as where NP s 1 (NS p ) represents the probability of the node NS p remaining at its current position S(0), which can be calculated by Markov model in the Section 3.2. Furthermore, NP s 1 (node i , NS p ) is the probability that the nodes node i and NS p encountering at the location state S 1 and it could be calculated by (19) in which N s k (node i , NS p ) is the number of encounters between the nodes node i and NS p at the location sate S k . If we assume that the relationship weight value W(node i , NS p ) = RWV p , then the probability that the node i reaches the location state s k at the next moment could be computed by (20) in which λ k (k = 1, 2, 3, · · · , m) is the weight of each conditional probability, which could be obtained by normalizing the weight of relationship between the node i and others nodes in the community C 1 .
Consequently, according to the location state distribution of mobile nodes in the community C 1 , the probability of the node i arriving at different location states could be determined. Combined with the results of location state prediction from Markov model, the probability distribution of the node i reaching all location states could be obtained by utilizing the above Equation (20). After the phase of information collecting and updating, through comparing the probability of neighbor nodes arriving at each different location state at the next moment and determining the change of community optimization, message carriers or the source node are able to make the most reasonable decision on message routing and forwarding in the data dissemination process.
As shown in Figure 3, the community detection strategy based on node activeness and node profile constantly optimizes the community in opportunistic networks, which reflected in that the number of nodes in the community is decreasing and they are becoming more and more connected. In the green rectangle, the purplish blue dot represents the source node and the red dots are relay nodes, while the green one is the destination and the black one denotes irrelevant node in the networks. Moreover, each black dotted circle represents a community in the networks. At the time t 1 , the source node node source , relay nodes node 1 and node 2 are divided into the same community based on community detection and optimization strategy. On the basis of the results of Markov location prediction, the relay nodes node 1 and node 2 possess the highest probability of moving to the same location state as the destination node node destination , hence the source node node source sends messages to relay nodes node 1 and node 2 at the time t 1 . As nodes move and communities change, the relay nodes node 1 and node 2 are more likely to delivery messages to the destination node destination at the time t 2 .

Complexity Analysis of the Proposed Algorithm
In conclusion, this paper proposed an improved routing-forwarding algorithm utilizing node profile and location prediction for opportunistic networks, which is aiming to construct an efficient and reliable data dissemination connectivity between sources and destinations with the support of community optimization and location prediction strategies. To demonstrate the complexity and performance of the NPLP scheme, we construct a pseudo-code table (see Algorithm 1) to describe the various steps of the proposed algorithm in detail. To be specific, in the phase of information collecting and updating, each mobile node transmits its information queue to the other one it encounters and update the network state matrix via the acquired information queue, hence the time complexity of this stage is O(log 2 n). Additionally, according to community detection and optimization strategy, message carriers or the source node analyze which relay nodes are more likely to be divided into the same social community with the destination, therefore the time complexity of this process is O(n). Moreover, in the process of predicting the location of the node's next state, message carriers and the source node calculate the transition probability of their neighbors and analyze the probability of encountering with the destination, so the time complexity of this process is O(n). Above all, the time complexity of the NPLP algorithm is O(log 2 n + n + n) = O(n) through rigorous mathematical analysis.

Algorithm 1 An Improved NPLP Routing-Forwarding Algorithm
Input: L node i , S node i , community C, source node a , relay node node i and destination node j Output: JS C v and NP S k (node i ) 1: Begin 2: Collecting and updating effective information about nodes in phase T; 3: Constructing network routing matrix RS = {List node a , List node i , · · · , List node j }; 4: for (i = 1; i ≤ n; i + +) do 5: if (node i ∈ community C) then 6: Calculating the joint score JS C v ; 7: Determining the probability of node i moving to the next location NP S k (node i );

8:
Node a sends messages to node i and then to node j ; 9: else 10: Computing the node activeness N A(U i , C);

11:
Obtaining the social similarity Sim C (U i , u j );

12:
Calculating the joint score JS C v ; 13: Community detection and optimization in the networks based on JS C v ; 14: Determining the probability of node i moving to the next location NP S k (node i );

15:
Node a sends messages to node i and then to node j ; 16: end if 17: end for 18: Output NP S k (node i ) and JS C v ; 19: End

Simulation and Analysis
In our experiment, the simulator Opportunistic Network Environment (ONE) is used to evaluate the performance of the NPLP algorithm in terms of delivery ratio, network overhead and end-to-end delay. In addition, we employ the mathematical tool MATLAB to implement the process of Markov chain construction and simultaneously realize the community optimization strategy. The setting of experimental parameters in Opportunistic Network Environment (ONE) could be determined based on the obtained results from MATLAB. Moreover, to comprehensively evaluate the improvement and enhancement from the NPLP algorithm, it will be compared with other four routing methods in opportunistic networks-Spray and wait [32], Epidemic [3], ICMT (Information cache management and data transmission algorithm) [24], and FCNS (a fuzzy routing-forwarding algorithm exploiting comprehensive node similarity) [17], where Spray and wait and Epidemic are two traditional and typical algorithms and ICMT and FCNS are the latest routing strategies in opportunistic networks.

Setting of Experimental Parameters
In our experiment, we construct the Table 1 to describe the four real data sets Infocom 5, Infocom 6, Cambridge and Intel utilized in this experiment, which could be downloaded from CRAWDAD. Besides, to enable the NPLP algorithm to present the relatively excellent experimental performance in Opportunistic Network Environment (ONE), we carefully construct the Table 2 that exhibits the general settings of parameters in this simulation. Specifically, due to the applicability and extensibility in the research field of community division and routing algorithm, the four real data sets Infocom 5, Infocom 6, Cambridge and Intel could be adopted in our simulation environment. Additionally, the communication area is set to 300 × 3000 m 2 and the total simulation time is 10-20 h for each experimental process. Furthermore, the movement speed and the total energy for a node in the networks are initialed to 1-15 m/s and 200 J, respectively. Message carriers or the source node may consume 0.5 J energy after each successful data transmission between them. The time to live for a message in the networks is set to 1, 1, 20 and 10 h in the data sets Infocom 5, Infocom 6, Cambridge and Intel, respectively. Moreover, the number of mobile nodes N in the communication area is set as 50 (initial value), 100, 150 and 200, while the cache space of a node in the networks is set as 10 (initial value), 15,20,25,30,35 and 40 MB. Next, according to the obtained data set Infocom 5, Infocom 6, Cambridge and Intel, we randomly define 6 different social attribute values for each node class in the Opportunistic Network Environment (ONE), which provides a dependable basis for community detection and optimization in the data transmission process. After several adjustments manual of weight parameter in Equation (12), the NPLP algorithm is able to present the best performance in the Opportunistic Network Environment (ONE) when w 1 = 0.25, w 2 = 0.05, w 3 = 0.1, w 4 = 0.1, w 5 = 0.15 and w 6 = 0.35. To sum up, the process of community division and weight adjustment could be basically realized in MATLAB and the testing and training results from the MATLAB will be imported into the Opportunistic Network Environment (ONE). Ultimately, Markov state prediction model could be implemented in the simulator, which calculates the probability of the node moving to the next location state and implements the data transmission process.

Experimental Evaluation Metrics
In this experiment, the simulation results are derived from the average of the testing results of each algorithm in the four real data sets Infocom 5, Infocom 6, Cambridge and Intel, and we will comprehensively assess the performance of the NPLP algorithm in terms of the following evaluation metrics: (1) Average delivery ratio (DR): this parameter denotes the probability of message carriers or the source node sending messages to their neighbors successfully. It could be expressed as DR node = DR rec /DR sen , in which DR Rec represents the number of messages acquired from the neighbors in the communication area and D sen represents the number of messages forwarded by message carriers or the source node.
(2) Average end-to-end delay (AD): this metric denotes the average delay of the data transmission process from sources to destinations. It could be computed by AD node = AD sum /m, where AD sum represents the total delay in the networks and m denotes the number of successful data transmission from the source node to destination.
(3) Average network overhead (NO): this parameter reflects the average overhead of a successful message routing or forwarding between a pair of mobile nodes in the communication area, which is also defined as the average value of the time delays, energy consumption and storage overhead in a successful data transmission process.

Analysis of Experimental Results
In the simulation, the experimental results mainly focus on the influence of the cache spaces of a node and the number of nodes on the performance of the five routing algorithms in opportunistic networks. Consequently, we constantly change these two variables to observe the average fluctuation of delivery ratio, end-to-end delay and network overhead from the five algorithms tested in the four real data sets.
Firstly, we compare the delivery ratio of each routing algorithm with various number of nodes and various cache spaces of a node. As demonstrated in Figure 4, the NPLP and Epidemic methods show a delivery ratio infinitely superior to those from the other three algorithms FCNS, ICMT and Spray and wait. Furthermore, the delivery ratio from ICMT is always in the middle level among these methods, while those from FCNS and Spray and wait are always been at a relatively low horizontal. Specifically, when the number of mobile nodes N = 200 and the cache spaces of a node C = 40 Mb (Figure 4d), the delivery ratio from NPLP almost reaches 0.88; when N = 100 and C = 10 Mb (Figure 4b), the delivery ratio from NPLP is at the minimum of 0.64. Overall, the average delivery ratio from NPLP is about 0.75, which increases by nearly 50% as compared with that from Spray and wait.  As far as the delivery ratio is concerned, NPLP and Epidemic are the two best algorithms among these five proposals. This is because the traditional Spray and wait routing algorithm employs flooding mode to broadcast message duplicates widely in the networks, hence its delivery ratio will be relatively high. Additionally, through the reasonable detection and division of node community and the location prediction of node mobility, the NPLP algorithm is able to effectively enhance the probability of relay nodes encountering the destination in the communication area. Moreover, ICMT and FCNS employ cache management and fuzzy inference mechanism to effectively evaluate the social relationship between relay nodes and destinations, thereby improving the efficiency of message routing and forwarding. However, relay nodes may take a long time to wait for the destination in the second phase of Spray and wait, which definitely causes a relatively low delivery ratio in the networks. Figure 5 demonstrates the fluctuation of the average end-to-end delay from each routing algorithm with various number of nodes and various cache spaces of the node. In general, with the increasing of the cache spaces and number of nodes, the average end-to-end delay of each algorithm goes up continuously and slowly. It is worth noting that the average end-to-end delay from Epidemic is always the highest among those from the five algorithms and those from ICMT and Spray and wait are at the middle level all the time, while the average end-to-end delay from NPLP and FCNS are both inferior to those from the other three algorithms. Specifically, when N = 50 and C = 10 Mb (Figure 5a), the average end-to-end delay from NPLP presents the minimum value of approximately 65, while it reaches the maximum value of 450 when N = 200 and C = 40 Mb (Figure 5d). Besides, when N = 100 and C = 20 or 25 Mb (Figure 5b), the average end-to-end delay from NPLP surpasses that from FCNS twice. Furthermore, the total average end-to-end delay from Epidemic is as high as about 320, which is almost 2.5 times that from NPLP (about 120). Additionally, the total average end-to-end delay from Spray and wait, ICMT and FCNS is approximately 162, 150 and 135, respectively.
In the light of the assessment on each algorithm's characteristics, Epidemic and Spray and wait are used to broadcast a large number of message duplicates to the networks and the buffer memory space and processing capacity of the node are very limited, hence the flooding mode adopted in the two algorithms often leads to a relatively high average end-to-end delay in the network environment. Besides, with the application of the mechanisms of node cooperation and cache management, the ICMT algorithm is able to maximally and rationally utilize the buffer memory space of each node and, more importantly, tremendously improving the efficiency of the data dissemination and simultaneously decreasing the average end-to-end delay. FCNS is a traditional fuzzy inference algorithm that makes a reasonable data transmission decision by comparing the social and mobile similarities between nodes, enhancing the success rate of messages routing and forwarding. Furthermore, NPLP could predict the probability of relay node reaching the community to which the destination node belongs at the next moment, evaluates the mobility relationship between relay nodes and destinations and decreases the hop number of messages routing in the networks, thereby reducing the average end-to-end delay significantly.
Ultimately, a comprehensive comparison of the network overhead between the five algorithms with various number of mobile nodes and various cache spaces of the node is demonstrated in Figure 6. In general, when the number of nodes ranges from 50 to 200 and the cache space of a node increases from 10 to 40Mb, NPLP always exhibits the lowest network overhead among those from these algorithms, with the lowest and average values of 50 and 150 respectively. While the network overhead from Spray and wait is the largest among those from these algorithms all the time, indicating that the highest and average values reach nearly 680 and 450 respectively. Besides, the network overhead from epidemic, ICMT and FCNS are in the middle grade among those from the five algorithms. As we can see from Figure 6d, the fluctuating range of the network overhead from ICMT is the smallest (from 100 to 400), meaning that this strategy is relatively stable and reliable. The Spray and wait method shows the largest fluctuation range (from 250 to 680) among these five algorithms. This is because ICMT is committed to store data information and delivery messages through cache cooperation between multiple nodes, which significantly saves the cache space of nodes and reduces their energy consumption. A mass of message duplicates in Epidemic and Spray and wait may take up the cache spaces of the node unduly, therefore the nodes carrying excessive data are more likely to consume more energy in the process of random movement, which may lead to a relatively high network overhead when the number of nodes increases from 50 to 200. Moreover, NPLP is able to effectively control the number of message duplicates in the networks through community optimization and the prediction of node mobility. It also effectively decreases the number of message hops from sources to destinations in the networks, thereby reducing the energy consumption of nodes and simultaneously saving their memory space.

Conclusions
In this study, with the aim of establishing an efficient and reliable data transmission connectivity between sources and destinations, we proposed an improved routing-forwarding algorithm utilizing node profile and location prediction for opportunistic networks, which mainly consists of three phases: the phase of information collecting and updating, community detection and optimization and the prediction of the node's next state. Each mobile node of the networks will obtain a network routing matrix after the process of information collecting and updating. After that, community optimization and location prediction strategy is regarded as an important basis for message routing and forwarding in the data dissemination process. Overall, the NPLP method creatively employs the scheme of information exchange between mobile nodes to obtain the global routing state of the network and innovatively combines community division with location prediction to make a reasonably decision on data dissemination. Ultimately, experiments show that as compared with the other four algorithms, the proposed algorithm is able to improve the delivery ratio slightly and reduce the network overhead and end-to-end delay substantially.
In future work, we are committed to considering the issues of privacy protection and information security in the process of information exchange and data dissemination, and strive to develop a secure routing protocol or defense framework for the proposed algorithm.
Author Contributions: B.C. and L.C. conceived the idea of the paper. B.C. and L.C. designed and performed the experiments; B.C. and L.C. analyzed the data; B.C. contributed reagents/materials/analysis tools; B.C. wrote and revised the paper.
Funding: This research received no external funding.

Conflicts of Interest:
The authors declare that they have no competing interests.