A Transmission Prediction Neighbor Mechanism Based on a Mixed Probability Model in an Opportunistic Complex Social Network

Abstract: The amount of data has skyrocketed in fifth-generation (5G) networks, and selecting an appropriate node to transmit information is important when analyzing complex data in 5G communication. More sophisticated decision-making methods can make data transmission more convenient, and opportunistic complex social networks, which users adopt for information sharing and data transmission, play an increasingly important role. However, node encounters in a mobile opportunistic network are random. Recent probabilistic routing methods may not consider the social and cooperative nature of nodes, and therefore cannot be applied well to the large-scale data transmission problem of social networks. Thus, we quantify the social and cooperative relationships symmetrically between the mobile devices themselves and the nodes, and then propose a routing algorithm based on an improved probability model to predict the probability of encounters between nodes (PEBN). Because our algorithm comprehensively considers the social and cooperative relationships between nodes, it can give a prediction for the target node even when no encounter information is available. The neighbor nodes with higher probability are filtered by the prediction result. In the experiment, we set the nodes' selfishness randomly. The simulation results show that, compared with other state-of-the-art transmission models, our algorithm significantly improves the message delivery rate, hop count, and overhead.


Introduction
With the mobile cellular network evolving to the fifth generation, huge amounts of data are generated every day. Opportunistic complex social networks have become a common data delivery platform [1][2][3], and the popularity of mobile devices has enabled a variety of new services in social networks [4]. Many social tools, such as Google Plus [5], Facebook [6], and Twitter [7], have a large number of users and generate data at every moment. Traditional social network approaches to transmitting and receiving big data face challenges due to the diversification of online data. Many wireless devices used to deliver information now face data overload, which hinders information interworking and information sharing. To cope with data transmission in 5G wireless networks, we need to design a model and a more convenient transmission mode that forwards data flexibly and is better suited to complex network environments.

1. Through the transmission of decision information assisted by base stations or edge devices, we propose a method that pre-processes the device information collected in an opportunistic complex social network and calculates social relationship values and cooperation probability values, so that the new probability matrix can be trained with this additional information.

2. We propose a hybrid probability matrix decomposition model to predict the probability of encountering a node. We combine node encounter information, social relationships, and cooperative relationships into a hybrid model that predicts the encounter cooperation (EC) values between nodes, and filter the key encounter nodes by their EC values.

3. We design a simpler way to transfer information that shares the heavy transmission workload of the central equipment. A mobile device only needs to request from the central node the table of encounter probabilities between the other nodes and the destination node. Then, according to the table, the data is passed to neighbor nodes whose probability of encountering the destination node is higher than the device's own. The simulation results show that the PEBN algorithm performs well in message transmission between nodes and makes full use of the characteristics of mobile nodes in the network.

Related Work
In 5G networks, base station construction is relatively dense. The base station needs to collect the soaring volume of data generated by devices, and then analyze, process, and transmit it. In [20], a practical system architecture is proposed to process and extract useful information and, through a data caching mechanism, to meet users' data requirements. This approach increases the complexity of data processing. In such cases, most of the work of data transmission and data sharing can be done by the mobile devices, while a node's transmission decisions, calculations, and other tasks can be assigned to the base station or edge equipment to reduce the node's energy consumption. Many researchers have proposed ways for mobile devices to deliver messages in opportunistic mobile complex social networks. In such a network, there is no guarantee that a fully connected path between the source and the destination exists at any time, so traditional routing protocols cannot deliver messages well between nodes. Some researchers have proposed probabilistic routing protocols for such networks [19]. The nodes predict the reachability probability between nodes by storing their encounter information, and select the nodes with a higher encounter probability as relay nodes. Such a method increases the energy consumption of nodes under the heavy data transmission load of 5G networks.
In mobile social networks, some researchers have proposed using community structure to make data forwarding decisions. They assume that people in the same community have closer relationships and more opportunities to connect with each other. The probability of the source node reaching the destination node is derived by tracking the encounters of the nodes to study node activity, using the Poisson model of social network contacts. On this basis, the contact-aware opportunity-based forwarding (CAOF) scheme [21] calculates global and local probabilities for two different transmission phases: an inter-community phase and an intra-community phase. In the inter-community phase, neighbor nodes with higher global activity and a high probability of reaching the target node from the source node are selected as relays. In the intra-community phase, forwarding decisions are determined by local metrics. In this scheme, social characteristics and cooperation relationships may not be integrated into the design, so it may not perform well when the nodes are selfish.
In addition, there is an interesting study on mobile social networks. Since a mobile social network consists of nodes with different social attributes, data is transmitted or shared over opportunistic connections. Some researchers have proposed a routing algorithm based on social identity awareness (SIaOR) [22]. They argue that many socially aware routing algorithms ignore an important social attribute, social identity. Therefore, they propose an opportunistic routing algorithm that considers not only the multiple social identities of mobile nodes, but also their social impact. However, it is difficult to measure the ability of nodes to forward data if only the social identity impact with respect to the target node is considered. Moreover, nodes with strong social relationships may not be suitable nodes for cooperation.
The Mobile Crowd Sensor Network (MCSN) is a new sensing method in the Internet of Things. There are two existing transmission mechanisms: one transmits data through a cellular network, and the other is an opportunistic transmission method based on short-range wireless communication technology. Some researchers have argued that the cost of cellular network transmission is high and is not conducive to increasing user participation. Therefore, they focus on the application of wireless communication technology in MCSN by constructing a new opportunistic propagation model, and propose an opportunistic data transmission mechanism based on the Socialization Susceptible Infected Susceptible Epidemic Model (SSIS) [23]. The mechanism uses the SSIS model to obtain a social relationship table by analyzing the social relationships of mobile nodes. By combining the social relationship table with the spray and wait mechanism, the source node that performs the sensing task selectively propagates the data to other nodes until it reaches a destination node that can send the data to the platform. Using the spray and wait method makes the transmission mode more complicated and increases node energy consumption.
Through further research on mobile social networks, some researchers have proposed a new mobile opportunistic network routing protocol based on machine learning, MLProph [24]. The model uses decision trees and neural networks trained on various factors, such as the predicted value inherited from the PROPHET routing scheme, node popularity, and the node's power consumption, speed, and location, to calculate the probability of successful delivery of information. The algorithm is trained on historical information to obtain an equation for the probability that a node can deliver the message to the destination node. This is an interesting way to make probabilistic predictions through machine learning. However, this method may only consider the nodes themselves, without considering the characteristics between the nodes.
According to the discussion of these methods, none of them provides a sophisticated decision-making scheme that considers both the cooperative and social relationships of the nodes. Additionally, the key neighbors are calculated and decided by the nodes themselves, which may increase energy consumption when the amount of transmitted data is large. We need more sophisticated decision-making methods to solve data transmission problems more conveniently as the amount of data skyrockets in 5G networks. A good and effective decision-making mode determines whether information transmission between mobile devices can achieve the desired effect. Many researchers have proposed probabilistic routing methods that calculate the probability of a node's successful delivery to the destination by training a probabilistic equation. However, these methods do not consider whether the selected relay node is willing to cooperate in forwarding messages. Furthermore, nodes move both randomly and socially, and it is worth considering whether such social relations affect the activity patterns of nodes. Therefore, we designed a model based on probabilistic prediction by quantifying social relationship and cooperative relationship values. A large amount of data must be analyzed and processed before decisions can be made, and user satisfaction can be improved through active caching at edge devices to meet the needs of users in the 5G network [20]. In this case, transmission calculations and decisions can be made by small base stations or edge devices to reduce the workload of mobile nodes. Consequently, we collect the characteristic information of nodes in the region through the base station and quantify it to form the encounter matrix, cooperative matrix, and social relation matrix, respectively.
Then, we use the improved probability matrix decomposition method to integrate the matrix information, update the encounter probability matrix, predict the encounter and cooperation probability between two nodes that have no encounter history, and update the existing values. Finally, the neighbor nodes are filtered by the size of their EC values in the updated probability matrix. Experiments show that the proposed model reduces the network overhead caused by non-cooperative nodes, optimizes the path of messages to the destination nodes, and enables messages to be transmitted along cooperative nodes that have a high probability of encountering the destination nodes.

Node Data Collection and Transmission
The base station (BS) collects information on all nodes in the area and trains the encounter matrix every period T. As shown in Figure 1, the base station collects the information of all nodes in the region over a period of time and, from these features, trains the final probability matrix according to the designed model. If the social and cooperative relationships cannot be obtained in the initial stage of applying the model, the final probability matrix is the historical encounter probability matrix. When a node has a transmission task, it requests from the base station the probability table of the known nodes and the destination node. Specifically, when a new node enters the area, its characteristic information is sent to the BS. If the node carries a probability forecast table, the BS obtains the table and updates its own matrix based on it. If the node does not carry a probability forecast table, the BS sends it the matrix trained in the area. After each period T, the BS trains the matrix M according to Algorithm 1 (refer to Section 3.5), using the node information collected over the past T periods, and transmits the matrix information. During the period, the matrix M is updated by Algorithm 2 (refer to Section 3.5) according to the probability forecast tables obtained from nodes and the communication between BSs before the period T ends. The main idea of the update is to add records for nodes whose information is missing. When mobile devices meet, they exchange information and update the records they carry in the same way. A message carried by a mobile device is sent to the encountered devices whose probability of encountering the destination device is greater than that of the carrying device itself. If there is no record of the destination node, the message is sent to those encountered nodes that have a higher probability of encountering mobile devices in another area.

Encounter Relationship Value Calculation
First, we define the encounter probability. To facilitate the description of related problems, in this paper, $m_{ij}$ is the probability of an encounter between node i and node j over a period of time, that is, the probability that node i and node j meet within the perceived range. The definition is as follows:

$$m_{ij} = \frac{w_{ij}}{w_{i,adj}} \qquad (1)$$

where i denotes the source node and j denotes the target node, $w_{ij}$ is the number of historical encounters between node i and node j within a certain period of time, and $w_{i,adj}$ is the number of times node i has met other nodes within the same period.
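As a minimal sketch of how a base station might compute these values, assume the encounter probability is the ratio of $w_{ij}$ to $w_{i,adj}$ and that contacts are available as a log of (i, j) pairs. The function name and the log format are illustrative assumptions, not part of the original model.

```python
from collections import Counter


def encounter_probabilities(contact_log):
    """Estimate m_ij from a list of (i, j) contact events.

    Assumption: m_ij = w_ij / w_i,adj, where w_ij counts meetings between
    i and j and w_i,adj counts all meetings of node i with any other node.
    """
    pair_counts = Counter()   # w_ij
    node_totals = Counter()   # w_i,adj
    for i, j in contact_log:
        # A contact is mutual, so it is recorded for both directions.
        pair_counts[(i, j)] += 1
        pair_counts[(j, i)] += 1
        node_totals[i] += 1
        node_totals[j] += 1
    return {(i, j): count / node_totals[i]
            for (i, j), count in pair_counts.items()}


log = [("a", "b"), ("a", "b"), ("a", "c"), ("b", "c")]
m = encounter_probabilities(log)
print(m[("a", "b")], m[("c", "a")])
```

Under this assumed definition the values are not symmetric: node a spends one third of its contacts on c, whereas node c spends half of its contacts on a.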

Social Relationship Value Calculation
The number of encounters between nodes simply reflects the node's encounters over a period of time. We cannot just rely on the number of node history encounters to predict an encounter in the future. Here, we consider the quantitative factors of mobile node social relations.
(1). Node Similarity
We believe that the closer the social relationship between two nodes is, the more likely they are to meet in the future. We first consider the number of shared neighbor nodes in the nodes' social relationships. The similarity between nodes is defined as the number of common neighbors between two nodes:

$$sim_{ij} = |C_i \cap C_j| \qquad (2)$$

where $sim_{ij}$ represents the similarity between node i and node j, $|\cdot|$ represents the number of nodes in a set, $C_i$ is the set of neighbor nodes connected to node i at the current time, and $C_j$ is the set of neighbor nodes connected to node j at the current time. When two nodes share a similar set of neighbor nodes, they are more likely to pass data through a common neighbor. Therefore, we consider that the relationship between the two nodes is closer and the probability of data transmission is higher.

(2). Devices' Mobility
Beyond similarity, node mobility change and node connection transformation are important factors that must be considered in the overall social relationship value. In a mobile opportunistic network, since the nodes are constantly moving, the connections between a node and its neighbors change over time. In a continuous time interval T, the degree of change in the movement of a node relative to another node is defined as the degree of motion transformation in Equation (3), where $Move_{ij}$ is the moving degree of the node at the current time. The definition compares two neighbor sets for each node: the set of neighbor nodes of node i before the time interval T and the set of neighbor nodes of node i at the current time, and likewise for node j. From Equation (3), we can deduce that the higher the frequency of the movement of node j relative to node i, the larger the transformation of its neighbor node set relative to i, and the greater the degree of motion transformation.

(3). Connection Transformation
The degree to which the shared neighbor nodes of one node's connections change with respect to another node, that is, the dynamics of the neighbor node connections between them, is referred to as the connection transformation. In successive time intervals T, this degree of change is defined in Equation (4), where $Conn_{ij}$ is the degree of change in the set of shared neighbor nodes connected to the two nodes. The greater the change in the set of shared neighbor nodes, the higher the value of $Conn_{ij}$. By observing and analyzing the common neighbor nodes of the two nodes, we can see the motion changes of the two nodes in the current network. When the number of common neighbors of node i and node j increases, the connection probability between node i and node j increases, and the possibility of data exchange improves.
Through the analysis of node similarity and node relative change, we can quantify the social relationship value, $s_{ij}$, between two nodes as a weighted combination of these three terms, in which the smaller the degree of transformation, the greater the value of social relations.
α, β, γ are the coefficients of node similarity, moving transform, and connection transform, respectively. They are the weighting factors with which the different factors influence the value of social relationships, and α + β + γ = 1.
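The weighted combination can be illustrated with a short sketch. It rests on several assumptions that the text does not fix: the shared-neighbor count is scaled by the size of the union so that it lies in [0, 1], the motion and connection transformation degrees are already scaled to [0, 1], and they enter as (1 - value) so that a smaller transformation gives a larger social value. The function names and default weights are hypothetical.

```python
def neighbor_similarity(c_i, c_j):
    """Shared-neighbor similarity, scaled to [0, 1].

    Assumption: the count of common neighbors is divided by the size of the
    union so that it can be mixed with the other terms.
    """
    union = c_i | c_j
    return len(c_i & c_j) / len(union) if union else 0.0


def social_value(sim, move, conn, alpha=0.5, beta=0.25, gamma=0.25):
    """Weighted social relationship value; alpha + beta + gamma must be 1.

    Assumption: move and conn lie in [0, 1] and enter as (1 - value), so a
    smaller degree of transformation yields a larger social value.
    """
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return alpha * sim + beta * (1.0 - move) + gamma * (1.0 - conn)


sim = neighbor_similarity({"b", "c", "d"}, {"c", "d", "e"})
print(sim, social_value(sim, move=0.2, conn=0.4))
```

With these assumed forms, two nodes that keep the same neighbors and barely move relative to each other approach the maximum value of 1.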

Decomposition Method
We represent the encounter probabilities and social relationship values between the nodes described above as matrices. Let $M = \{m_{ij}\}$ denote the $n \times n$ encounter probability matrix. For a pair of nodes, $n_i$ and $n_j$, $m_{ij} \in [0, 1]$ denotes the historical encounter probability of node i to node j. Let $S = \{s_{ij}\}$ denote the $n \times n$ social relationship matrix. Two nodes with strong social relationships affect each other's probability of encountering the same node. We also believe that nodes are more willing to be close to nodes with which they have strong social relations. Because the devices in an opportunistic network are mostly carried by people, it is meaningful to analyze the probability of node encounters through the social relations of nodes. Suppose node $n_a$ knows nothing about node $n_c$, while node $n_b$, which has a high relationship value with $n_a$, meets $n_c$. Because $n_a$ stays close to $n_b$, $n_a$ and $n_c$ then also meet. From the above analysis, we can summarize this social process as:

$$\hat{m}_{ij} = \frac{\sum_{k \in \kappa(i)} m_{kj}\, s_{ik}}{|\kappa(i)|} \qquad (6)$$

where $\hat{m}_{ij}$ is the prediction of the probability that user $u_i$ meets user $u_j$, $m_{ij}$ is the probability that user $u_i$ meets user $u_j$, $\kappa(i)$ is the set of neighbors related to user $u_i$, and $|\kappa(i)|$ is the number of related users of $u_i$ in the set $\kappa(i)$. $|\kappa(i)|$ can be merged into $s_{ik}$, since it is the normalization term of the relation evaluation. Then, with $s_{ik} = 0$ for nodes outside $\kappa(i)$, the prediction of the probabilities that user $u_i$ meets all users is as follows:

$$(\hat{m}_{i1}, \ldots, \hat{m}_{in}) = (s_{i1}, \ldots, s_{in})\, M \qquad (7)$$

Consequently, for all users, we can infer:

$$\hat{M} = S M \qquad (8)$$

where SM can be interpreted as the probability forecast based purely on user relevance. The idea of probability matrix decomposition is to derive a high-quality l-dimensional feature representation, N, of the nodes (or users) based on an analysis of the encounter probability values between nodes.
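The relation-weighted forecast described above, in which each related node k contributes its own encounter probability scaled by its relationship value, can be sketched as follows. Treating $\kappa(i)$ as the nodes with a positive relationship value, and excluding node i itself, are assumptions made for this illustration; the function name is hypothetical.

```python
def social_forecast(M, S):
    """Predict encounter probabilities from related nodes only.

    Assumption: kappa(i) is the set of nodes k != i with S[i][k] > 0, and the
    forecast for (i, j) averages S[i][k] * M[k][j] over that set.
    """
    n = len(M)
    M_hat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        kappa = [k for k in range(n) if k != i and S[i][k] > 0]
        if not kappa:
            continue  # no related nodes: nothing to infer for node i
        for j in range(n):
            M_hat[i][j] = sum(S[i][k] * M[k][j] for k in kappa) / len(kappa)
    return M_hat


# Nodes: 0 = n_a, 1 = n_b, 2 = n_c. n_a has never met n_c, but n_b has.
M = [[0.0, 0.6, 0.0],
     [0.6, 0.0, 0.8],
     [0.0, 0.8, 0.0]]
S = [[0.0, 0.9, 0.0],
     [0.9, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
M_hat = social_forecast(M, S)
print(round(M_hat[0][2], 2))
```

In this toy case the forecast for the pair (n_a, n_c) is about 0.72 even though the two nodes have no encounter history, which is the effect the social process is meant to capture.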
Let $N \in \mathbb{R}^{l \times n}$ and $X \in \mathbb{R}^{l \times n}$ be the latent node and social factor feature matrices, whose column vectors, $N_i$ and $X_j$, represent the latent feature vectors of the node and the social factor, respectively. We define the conditional distribution of the observed encounter probability as:

$$p(M \mid N, X, \sigma_M^2) = \prod_{i=1}^{n} \prod_{j=1}^{n} \left[ \mathcal{N}\left(m_{ij} \mid g(N_i^T X_j), \sigma_M^2\right) \right]^{I_{ij}^M} \qquad (9)$$

where $\mathcal{N}(x \mid u, \sigma^2)$ is the probability density function of the Gaussian distribution with a mean of u and a variance of $\sigma^2$, and $I_{ij}^M$ is an indicator function that is equal to 1 if node i meets node j, and 0 otherwise. The function $g(x) = 1/(1 + \exp(-x))$ is the logistic function, which maps the value of $N_i^T X_j$ into the interval [0, 1]. We use zero-mean spherical Gaussian priors [24,25] on the node and social factor feature vectors:

$$p(N \mid \sigma_N^2) = \prod_{i=1}^{n} \mathcal{N}(N_i \mid 0, \sigma_N^2 I), \qquad p(X \mid \sigma_X^2) = \prod_{j=1}^{n} \mathcal{N}(X_j \mid 0, \sigma_X^2 I) \qquad (10)$$

Through a simple Bayesian inference [26] with Formulas (9) and (10), we obtain the posterior distribution of the node and social factor features, Equation (11). Assuming that S is independent of the low-dimensional matrices, N and X, Equation (11) can be simplified to Equation (12).

Cooperative Probability Calculation
(1). Node Preference
$Pref_{ij}$ is the degree of preference between the two nodes, i and j. Node preference is different from node encounter frequency: it refers to the share of time that two nodes spend establishing a connection and transmitting messages within a period of time. In the calculation formula, $t^k_{i,j}$ represents the total time taken for message delivery when node i and node j establish a connection for the kth time, and $t^k_{i,other}$ represents the total time taken for message delivery when node i and other nodes establish a connection for the kth time.

(2). Node Relationship Stability
$Rel_{ij}$ is the correlation between the average message delivery intervals of the two nodes, i and j. Node relationship stability refers to the average interval between messages sent between nodes. Through it, we can predict the possibility that nodes will establish connections over a long period in the future. The shorter the connection interval in the most recent period, the more likely it is that a connection will be established in the future. Within a period of time, the greater the number of encounters, the longer the interval, which indicates that the node has cooperated stably for a long time. In the defining equation, k is the number of message passes in time T, and $T^k_{i,j}$ and $T^1_{i,j}$ are the times of the kth and the first transmission of the message, respectively, in time T.

(3). Node Activeness
We use the number of times a node forwards messages during a period of time as a measure of node activeness. To some extent, the higher the node activity, the higher the degree of cooperation. In its definition, $re_i$ is the number of times node i forwards messages within a certain period of time, T.

(4). Data Demand
We consider that nodes are more willing to cooperate to deliver data for which they share a common need. At the time of message delivery, the receiving node gives a rating as feedback on its demand for the message. Regardless of whether a node meets and connects with other nodes within a certain period of time, it prefers to receive and forward data that is useful to itself. Therefore, if the feedback score of a node is higher over a period of time, the probability that the neighbor node is willing to cooperate in forwarding the message is larger. Adding this parameter can help predict the likelihood of future cooperation between nodes.
In the definition, $q^{\tau}_{ij}$ represents the data demand feedback score given by node j when node i passes a message to node j, and Q represents the highest score.
Through the analysis of node preference, relationship stability, activity, and data demand, we can quantify the cooperation probability value between two nodes as a weighted combination of these terms, where a, b, c, and d are the coefficients of node preference, relationship stability, activity, and data demand, respectively. They are the weighting factors with which the different factors influence the value of cooperation, and a + b + c + d = 1.
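A short sketch can make the weighted cooperation value concrete. The exact formulas for the four terms are not reproduced here, so the forms below are assumptions: preference is taken as the share of node i's total delivery time spent with node j, data demand as the average feedback score divided by the highest score Q, and relationship stability and activeness are passed in as values already scaled to [0, 1]. All function names and the equal default weights are hypothetical.

```python
def preference(times_with_j, times_with_others):
    """Share of node i's delivery time that was spent connected to node j.

    Assumption: the ratio is taken over all connection time of node i
    (with j plus with other nodes), which keeps the value in [0, 1].
    """
    total = sum(times_with_j) + sum(times_with_others)
    return sum(times_with_j) / total if total else 0.0


def data_demand(scores, max_score):
    """Average feedback score given by node j, normalized by the top score Q."""
    return sum(scores) / (len(scores) * max_score) if scores else 0.0


def cooperation_value(pref, rel, act, demand, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted cooperation value; the weights a, b, c, d must sum to 1."""
    a, b, c, d = weights
    if abs(a + b + c + d - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return a * pref + b * rel + c * act + d * demand


pref = preference([30.0, 30.0], [20.0, 20.0])
demand = data_demand([4, 5, 3], max_score=5)
print(cooperation_value(pref, rel=0.5, act=0.5, demand=demand))
```

Keeping every term in [0, 1] means the combined value can be read directly as a probability-like score, which is what the constraint a + b + c + d = 1 suggests.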

Residual Energy Ratio
We consider the residual energy of the mobile node at the current moment. The greater the remaining energy of the mobile node, the more willing the node is to cooperate in forwarding data.
In the residual energy ratio, $E_r$ represents the energy remaining in the device and $E_d$ is the energy required for the packet to be transmitted.

Cooperation and Energy Decomposition
We now consider the cooperation probability matrix. Suppose we have n nodes. The probability that a node forwards messages differs from node to node; in other words, the selfishness of a node differs toward different nodes. We use the above method to calculate the cooperation values, which form a cooperative forwarding probability matrix, R. Since the cooperation relationship between nodes is related to the residual energy ratio of the nodes, we believe that the greater the residual energy ratio of a node, the more stable its cooperative relationships. It is therefore meaningful to analyze the probability of node cooperation through the residual energy ratio of nodes. From the above analysis, we can define this as:

$$R' = E \bullet R \qquad (19)$$

where E is a vector of the residual energy ratios of the n nodes, and the $\bullet$ symbol means that the i-th element of E is multiplied by each element of the i-th column of R. The new matrix is still denoted by R, that is, $R = R'$. We let $r_{ik}$ represent the probability that node $n_k$ forwards a message sent by node $n_i$. We set $N \in \mathbb{R}^{l \times n}$ and $Y \in \mathbb{R}^{l \times n}$ as the latent node and cooperation factor feature matrices, whose column vectors, $N_i$ and $Y_k$, represent the latent feature vectors of the node and the cooperation factor, respectively. We define the conditional distribution of the observed cooperative probabilities as:

$$p(R \mid N, Y, \sigma_R^2) = \prod_{i=1}^{n} \prod_{k=1}^{n} \left[ \mathcal{N}\left(r_{ik} \mid g(N_i^T Y_k), \sigma_R^2\right) \right]^{I_{ik}^R} \qquad (20)$$

where $I_{ik}^R$ is an indicator function that is equal to 1 if a forwarding probability between node $n_i$ and node $n_k$ has been observed, and 0 otherwise. We also place zero-mean spherical Gaussian priors [24,25] on the node and cooperation factor feature vectors:

$$p(N \mid \sigma_N^2) = \prod_{i=1}^{n} \mathcal{N}(N_i \mid 0, \sigma_N^2 I), \qquad p(Y \mid \sigma_Y^2) = \prod_{k=1}^{n} \mathcal{N}(Y_k \mid 0, \sigma_Y^2 I) \qquad (21)$$

Therefore, through a simple Bayesian inference [26] with Formulas (20) and (21), we obtain the posterior distribution of the node and cooperation factor features, Equation (22).
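The column-wise energy weighting can be sketched in a few lines. The sketch assumes that entry R[i][k] is the probability that node k forwards a message from node i, so that column k is scaled by the residual energy ratio of the forwarding node k; the function name and the sample values are hypothetical.

```python
def scale_by_energy(R, E):
    """Multiply column k of the cooperation matrix R by the energy ratio E[k].

    Assumption: R[i][k] is the probability that node k forwards a message
    from node i, so column k is weighted by the energy ratio of node k.
    """
    if any(len(row) != len(E) for row in R):
        raise ValueError("E needs one entry per column of R")
    return [[r * E[k] for k, r in enumerate(row)] for row in R]


R = [[0.5, 0.8],
     [0.4, 0.2]]
E = [1.0, 0.5]  # hypothetical residual energy ratios of nodes 0 and 1
print(scale_by_energy(R, E))
```

A node with little energy left therefore sees all of its forwarding probabilities reduced, regardless of which node the message comes from.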

Hybrid Model Solution
Through the above processes, to reflect how the historical encounters between nodes affect the judgment of which node a node is more likely to meet, we use the graphical model shown in Figure 2 to model the problem of probability prediction. The model combines the encounter probability matrix, social relationship matrix, and node cooperative probability matrix into a consistent and compact feature representation. According to Figure 2 and Equations (11) and (22), we obtain the logarithm of the posterior distribution of the node encounter and cooperation probability matrices, in which ε is a constant that does not depend on the parameters. Maximizing this log-posterior over the three latent features with the hyperparameters (i.e., the observation noise variance and prior variances) kept fixed is equivalent to minimizing the sum-of-squared-errors objective function with quadratic regularization terms given in Equation (24). A local minimum of the objective function given by Equation (24) can be found by performing gradient descent in $N_i$, $X_j$, and $Y_k$, as given in Equation (25), where ϕ(i) is the set of trust nodes for node $n_i$, and $g'(x)$ is the derivative of the logistic function, $g'(x) = \exp(x)/(1 + \exp(x))^2$.
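The gradient descent step can be illustrated with a simplified sketch. It is not the full hybrid objective: the social weighting through S and the trust set ϕ(i) are omitted, and the objective is assumed to be the squared error of $g(N_i^T X_j)$ against observed encounter values plus the squared error of $g(N_i^T Y_k)$ against observed cooperation values, with a single quadratic regularization weight. The names `fit` and `loss`, the use of `None` for unobserved entries, and all hyperparameter values are assumptions made for illustration.

```python
import math
import random


def g(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def loss(M, R, N, X, Y, lam):
    """Squared error on observed entries plus quadratic regularization."""
    total = 0.0
    for T, F in ((M, X), (R, Y)):
        for i, row in enumerate(T):
            for j, t in enumerate(row):
                if t is not None:  # None marks an unobserved entry
                    total += 0.5 * (t - g(dot(N[i], F[j]))) ** 2
    total += 0.5 * lam * sum(v * v for F in (N, X, Y) for vec in F for v in vec)
    return total


def fit(M, R, dim=2, lam=0.01, lr=0.1, steps=2000, seed=0):
    """Plain gradient descent on N, X, Y; returns factors and loss history."""
    rng = random.Random(seed)
    n = len(M)

    def init():
        return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n)]

    N, X, Y = init(), init(), init()
    history = [loss(M, R, N, X, Y, lam)]
    for _ in range(steps):
        # Start each gradient with the regularization term lam * factor.
        gN, gX, gY = ([[lam * v for v in vec] for vec in F] for F in (N, X, Y))
        for T, F, gF in ((M, X, gX), (R, Y, gY)):
            for i in range(n):
                for j in range(n):
                    if T[i][j] is None:
                        continue
                    p = g(dot(N[i], F[j]))
                    err = (p - T[i][j]) * p * (1.0 - p)  # uses g'(x) = g(1 - g)
                    for d in range(dim):
                        gN[i][d] += err * F[j][d]
                        gF[j][d] += err * N[i][d]
        for F, gF in ((N, gN), (X, gX), (Y, gY)):
            for vec, gvec in zip(F, gF):
                for d in range(dim):
                    vec[d] -= lr * gvec[d]
        history.append(loss(M, R, N, X, Y, lam))
    return N, X, Y, history


M = [[None, 0.7, None], [0.7, None, 0.8], [None, 0.8, None]]
R = [[None, 0.9, 0.2], [0.9, None, 0.6], [0.3, 0.6, None]]
N, X, Y, history = fit(M, R)
m_02 = g(dot(N[0], X[2]))  # forecast for a pair with no encounter history
print(history[0], history[-1], m_02)
```

Because the node factors N are shared between the encounter and cooperation terms, the cooperation observations influence the forecast for pairs whose encounter value is missing, which is the purpose of combining the matrices in one model.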

Algorithm
The new probability matrix is obtained by using the social relationship matrix and the cooperative forwarding matrix. Because the encounter probability matrix is obtained from the nodes' encounter history, which is often incomplete, we need to fill in the missing values with additional information, or update the encounter values with the cooperation degree and social relationships of the devices. We call the newly updated probability matrix the encounter cooperation matrix; it contains the encounter cooperation (EC) values. Our algorithm is shown as follows:

        If the EC value of the node n_z in N_adj and destination node > the EC value of node n_i and destination node // The neighbor with the highest EC value of the target node is selected as the next hop node.
            Choose next hop nodes n_z
            A.add(n_z); // The selected devices are used as new message-carrying devices.
        End if
    End for
End
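The selection rule in these final steps can be sketched as follows. The nested-dictionary EC table, the function name, and the choice to treat a missing record as an EC value of 0.0 are assumptions for illustration; the fallback for destinations with no record at all is not modeled.

```python
def select_relays(ec, carrier, neighbors, destination):
    """Return the neighbors that should receive a copy of the message.

    ec[u][v] is the encounter cooperation (EC) value between u and v.
    Assumption: a missing record counts as an EC value of 0.0.
    """
    own = ec.get(carrier, {}).get(destination, 0.0)
    relays = []
    for z in neighbors:
        if ec.get(z, {}).get(destination, 0.0) > own:
            relays.append(z)  # z becomes a new message-carrying device
    return relays


ec = {
    "n_i": {"dst": 0.3},
    "n_1": {"dst": 0.5},
    "n_2": {"dst": 0.1},
    "n_3": {"dst": 0.7},
}
print(select_relays(ec, "n_i", ["n_1", "n_2", "n_3"], "dst"))
```

Only neighbors whose EC value toward the destination exceeds that of the current carrier are chosen, so each hop moves the message to a node that is more likely to meet and cooperate with the destination.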

Complexity Analysis
The main computation in the gradient method is evaluating the objective function, L, and its gradients with respect to the variables. The computational complexity of evaluating the objective function, L, is O(ρ_M l + ρ_M vl + ρ_R l), where ρ_M is the number of non-zero entries in the matrix M, and v is the average number of nodes that have a social relationship with a node. Since few users in a region have a social relationship value with a given user, v is relatively small. The computational complexities of the gradients ∂L/∂N, ∂L/∂X, and ∂L/∂Y in Equation (25) are O(ρ_M sl + ρ_M svl + ρ_R l), O(ρ_M l + ρ_M vl), and O(ρ_R l), respectively, where s is the average number of devices with a close social relationship to a device and ρ_R is the number of non-zero entries in the matrix R. In a complex social network, s is approximately equal to v. Therefore, the total computational complexity of one iteration is O(ρ_M sl + ρ_M svl + ρ_R l), which indicates that the computation time of our method is linear in the amount of information collected by the nodes. If the obtained matrices are all sparse, the algorithm runs quickly, with a time complexity of O(n), and this also holds when the number of nodes is large. However, if an obtained matrix is dense, the complexity within the communication range of big data with dense nodes and complex relations is difficult to estimate. Since the complexities of the CAOP, SSIS, and SIaOR algorithms are all O(n^2), the complexity analysis shows that the proposed method is computationally cheaper than they are.
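To illustrate why the cost of one iteration grows with the number of non-zero entries rather than with the full matrix size, the sketch below performs gradient descent on a deliberately simplified problem: a plain regularized factorization of a single sparse encounter matrix. It is not the joint three-matrix objective of Equation (24); the names `objective` and `gradient_step`, the latent dimension, the learning rate, and the sample data are assumptions chosen for illustration.

```python
import random
from typing import List, Tuple

Entry = Tuple[int, int, float]  # (row node, column node, observed probability)
Matrix = List[List[float]]


def objective(observed: List[Entry], U: Matrix, V: Matrix, reg: float) -> float:
    """Squared error over the observed entries plus quadratic regularization."""
    err = sum((sum(a * b for a, b in zip(U[i], V[j])) - p) ** 2 for i, j, p in observed)
    norm = sum(x * x for row in U + V for x in row)
    return 0.5 * err + 0.5 * reg * norm


def gradient_step(observed: List[Entry], U: Matrix, V: Matrix, lr: float, reg: float) -> None:
    """One full-batch gradient descent step, updating U and V in place."""
    grad_u = [[reg * x for x in row] for row in U]
    grad_v = [[reg * x for x in row] for row in V]
    for i, j, p in observed:  # one pass per non-zero entry
        err = sum(a * b for a, b in zip(U[i], V[j])) - p  # O(l) work
        for k in range(len(U[i])):  # O(l) work
            grad_u[i][k] += err * V[j][k]
            grad_v[j][k] += err * U[i][k]
    for rows, grads in ((U, grad_u), (V, grad_v)):
        for row, grad in zip(rows, grads):
            for k in range(len(row)):
                row[k] -= lr * grad[k]


rng = random.Random(0)
n_nodes, latent_dim = 4, 2
U = [[rng.uniform(0.1, 0.5) for _ in range(latent_dim)] for _ in range(n_nodes)]
V = [[rng.uniform(0.1, 0.5) for _ in range(latent_dim)] for _ in range(n_nodes)]
observed = [(0, 1, 0.8), (1, 2, 0.4), (2, 3, 0.6), (0, 3, 0.2)]

loss_before = objective(observed, U, V, reg=0.01)
for _ in range(300):
    gradient_step(observed, U, V, lr=0.05, reg=0.01)
loss_after = objective(observed, U, V, reg=0.01)
print(loss_after < loss_before)
```

In this simplified setting the data-dependent loop runs once per observed entry and does O(l) work each time, which matches the ρ l terms in the analysis; the regularization terms add work proportional to the number of nodes times l.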

Experiment Analysis
In this section, we evaluate the performance of the proposed model, PEBN, in OMNets. OMNets is free, open-source, multi-protocol network simulation software that plays an important role in network simulation.

Parameter Settings
We use the open-source simulation tool OMNets to run the simulations. The network covers an area of 3400 m × 4500 m and contains 100-1300 nodes. Each node moves randomly. The moving speeds are, respectively, 1 m/s and 10 m/s. After reaching its destination, a node stays there for a random time. The sensing radius of a node is 50 m. The message generation interval is 25-35 s, and the transmission interval is 17-18 min. The time to live (TTL) is set to 100 min. The matrix update interval is 180 min. In this simulation environment, we compared the CAOP, SSIS, and SIaOR algorithms with our proposed routing algorithm (PEBN) in terms of delivery rate, network overhead rate, and hop count as the number of nodes and the simulation time change.
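For reproducibility, the settings listed above could be collected in a single configuration object, as in the following sketch. The class name `SimulationConfig` and its field names are hypothetical and do not correspond to any simulator's configuration format; the two speed values are recorded exactly as listed, since the text does not say whether they are two speed classes or the bounds of a range.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SimulationConfig:
    """Hypothetical container for the simulation parameters stated in the text."""
    area_m: Tuple[int, int] = (3400, 4500)               # width x height, in meters
    node_count_range: Tuple[int, int] = (100, 1300)
    speeds_m_per_s: Tuple[float, float] = (1.0, 10.0)    # as listed; meaning not specified
    sensing_radius_m: float = 50.0
    message_interval_s: Tuple[int, int] = (25, 35)
    transmission_interval_min: Tuple[int, int] = (17, 18)
    ttl_min: int = 100
    matrix_update_interval_min: int = 180

    def ttl_seconds(self) -> int:
        """Time to live converted to seconds."""
        return self.ttl_min * 60


config = SimulationConfig()
print(config.ttl_seconds())  # 6000
```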

Parameter Analysis
The performance of PEBN is affected by some time-dependent parameters. We therefore analyzed these parameters before comparing the algorithms. To improve the performance of the algorithm, we chose reasonable parameter values by experimentally analyzing the update period of the matrix and the node message delivery time interval.
Different time periods, T, have different effects on the results of model training. Too short a period increases network resource consumption. Too long a period fails to capture the latest dynamic changes of the nodes in time, which reduces prediction accuracy and lowers the transmission success rate. Therefore, we need to select an appropriate period for updating the predicted target matrix. As shown in Figure 3a, as the time period increases, the transmission success rate first increases and then decreases, reaching a high value at 175 min. The time period also affects the average transmission hop count. Figure 3b shows that as the time period increases, the hop count first decreases and then increases: when the period is too short, not enough information is collected to reduce the prediction error, and when it is too long, the trained model is no longer accurate enough, which results in more hops. The hop count is small when the time period reaches 200 min. As a trade-off, we set the period T to 180 min, accepting a slightly higher hop count in order to preserve the delivery ratio.
We set each node to pass messages at intervals of ∆t. When a node transmits a message, it sends a copy to the nodes within its messaging range whose EC value with the destination node is greater than its own. If the interval ∆t is too short, nodes send copies of the same message to the same neighbor more than once; the neighbor already holds the copy and refuses it, which results in unnecessary network overhead. If ∆t is too long, messages spread slowly through the network and the transmission delay increases. From Figure 4b, we can observe that as ∆t increases, the number of nodes covered by the message copies transmitted by a node decreases. In the 16 min to 18 min range, the number of covered nodes is large. However, ∆t also affects the transmission success rate. Covering a larger number of nodes does not necessarily improve the delivery ratio; it may cause route congestion and information loss. From Figure 4a, we can observe that the delivery rate reaches its highest value when ∆t reaches 20 min. Since the node coverage does not fluctuate when ∆t is between 16 min and 18 min, and the transmission success rate fluctuates little in the range of 18-20 min, we set ∆t to 18 min.
To sum up, considering both the transmission success rate and the transmission delay, we set T = 180 min and ∆t = 18 min for the following experiments.

Methodology
In a real environment, each node is selfish. To make the simulation experiment closer to the real scene, we randomly set the selfishness of the node relative to other nodes in the experiment. When a node forwards a message, it forwards the message according to a certain cooperation probability that removes the selfishness. It should be emphasized that we aim to select nodes that have a high probability of encountering the target node and are willing to cooperate to forward the message. In the experiment, the cooperation matrix of the algorithm was not set manually, but was based on the cooperative probability calculated by the number of times the node helps a node to forward the message in the previous T period.
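One way to derive such a cooperation probability from forwarding history is sketched below. The ratio used here (messages forwarded divided by forwarding requests received during the previous period T) is an assumption made for illustration and is not taken from the paper's definition; the class `CooperationTracker` and its method names are hypothetical.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple

Pair = Tuple[Hashable, Hashable]  # (helper, requester)


class CooperationTracker:
    """Counts forwarding requests and completed forwards within one period T."""

    def __init__(self) -> None:
        self.requests: DefaultDict[Pair, int] = defaultdict(int)
        self.forwards: DefaultDict[Pair, int] = defaultdict(int)

    def record(self, helper: Hashable, requester: Hashable, forwarded: bool) -> None:
        """Log one forwarding request and whether the helper carried it out."""
        self.requests[(helper, requester)] += 1
        if forwarded:
            self.forwards[(helper, requester)] += 1

    def probability(self, helper: Hashable, requester: Hashable, default: float = 0.0) -> float:
        """Assumed estimate: forwards divided by requests; `default` if no history exists."""
        asked = self.requests.get((helper, requester), 0)
        if asked == 0:
            return default
        return self.forwards.get((helper, requester), 0) / asked

    def will_forward(self, helper: Hashable, requester: Hashable, rng: random.Random) -> bool:
        """Simulate a selfish node: forward with the estimated cooperation probability."""
        return rng.random() < self.probability(helper, requester)


tracker = CooperationTracker()
for outcome in (True, True, False, True):
    tracker.record("b", "a", outcome)
print(tracker.probability("b", "a"))  # 0.75
```

Resetting the tracker at the start of each period T would keep the estimate tied to recent behavior, in line with the periodic matrix update described above.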
The evaluation of the model covers two aspects: (1) Effectiveness analysis: the experiment compares our model with other existing models in optimizing the network structure and improving the information delivery ratio. (2) Adaptability analysis: the probability matrix is periodically updated with dynamically changing information, and the nodes that are willing to cooperate and have a high probability of encountering the destination node are selected to transmit data, thereby increasing the delivery ratio and reducing the average hop count. For reference, we use the following performance metrics for comparison:

1. Delivery ratio is the ratio of the number of delivered messages to the total number of generated messages.
2. Overhead is the average number of intermediate nodes used for one delivered message.
3. Hop count is the average number of hops of successfully delivered messages.
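The three metrics can be computed from per-message records, as in the following sketch. The record type `MessageRecord` and the function `evaluate` are hypothetical names. Overhead is interpreted here as the mean number of intermediate nodes over delivered messages only, which is one reading of the definition above; returning 0.0 when nothing was delivered is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MessageRecord:
    """Hypothetical per-message log entry produced by a simulation run."""
    delivered: bool
    hops: int                 # hops on the delivery path (ignored if undelivered)
    intermediate_nodes: int   # relay nodes that carried a copy of the message


def evaluate(records: List[MessageRecord]) -> Dict[str, float]:
    """Compute delivery ratio, overhead, and hop count from message records."""
    if not records:
        raise ValueError("at least one generated message is required")
    delivered = [r for r in records if r.delivered]
    count = len(delivered)
    return {
        "delivery_ratio": count / len(records),
        "overhead": sum(r.intermediate_nodes for r in delivered) / count if count else 0.0,
        "hop_count": sum(r.hops for r in delivered) / count if count else 0.0,
    }


records = [
    MessageRecord(delivered=True, hops=3, intermediate_nodes=4),
    MessageRecord(delivered=True, hops=1, intermediate_nodes=2),
    MessageRecord(delivered=False, hops=0, intermediate_nodes=5),
    MessageRecord(delivered=False, hops=0, intermediate_nodes=1),
]
metrics = evaluate(records)
print(metrics)  # {'delivery_ratio': 0.5, 'overhead': 3.0, 'hop_count': 2.0}
```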
We use these performance metrics to analyze the effects of the model and to compare it with the solutions proposed by other researchers. To compare the performance of the PEBN protocol with other schemes, the delivery ratio, overhead ratio, and hop count were measured. Figures 5 and 6 show the delivery performance of the PEBN protocol for different numbers of nodes and different simulation times, respectively. From Figure 5, we can see that the delivery ratio of the PEBN protocol is higher than that of the other schemes because it makes better choices when sending messages. This is because our solution assigns each node a different degree of selfishness toward each other node. Other schemes can easily lose messages because they do not fully consider the cooperation and social relationship factors of the nodes and therefore obtain less information about them. Our algorithm thus obtains more accurate predictions and makes better choices when screening nodes. Moreover, as the number of nodes increases, the additional social information makes the training of the prediction matrix more extensive and accurate. More previously unknown information between nodes is recovered, the number of candidate nodes in the transmission process grows, and node selection becomes more accurate, which makes it more likely that the optimal node is selected. From Figure 6, it can be observed that the delivery performance of the PEBN protocol is better than that of the other schemes. Because we consider not only the probability of encountering the target node but also the path of the target node, packets are unlikely to be lost. In addition, as the simulation time increases, the message delivery rate also increases, because a message has more time to wait for the next best node.
In Figure 7, the average hop count of the PEBN protocol is compared with that of the other schemes.
The PEBN protocol has the smallest average hop count among the compared schemes because it optimizes the routing of message transmissions more efficiently. It can be seen in Figure 7 that as the number of nodes increases, the average hop count increases and then gradually stabilizes. Our solution considers many factors: as the number of nodes increases, the relationships between nodes become more complex, and the probabilistic prediction model mines more hidden features between nodes, which reduces the number of hops from the source node to the destination node. Other methods may not have as many parameters to enhance the adaptability of the model, so our model performs better. From Figure 8, it can be observed that our proposed scheme has fewer hops than the other schemes. This is because our solution collects and processes node feature information, so it obtains as much data as possible to predict the probability of encounters between nodes more accurately. As the simulation time increases, a node captures more information about the movement of other nodes, which adds information to the process of selecting key nodes and makes the model more robust. Therefore, the average hop count gradually decreases and stabilizes.
In Figure 9, the routing overhead of our scheme is compared with that of the other schemes. PEBN performs better than the other models. Since we apply node cooperation in our proposed model, it also performs well in scenarios where nodes are selfish. PEBN predicts transmitting neighbors better than the other models: a message is forwarded to a node on the transmission route that has a high probability of encountering the destination node, which effectively reduces the cost of sending the message to non-cooperative nodes. In addition, we can observe that as the number of nodes increases, the routing overhead also increases. An increase in the number of nodes means that the number of candidate nodes for message selection increases. Messages need to be passed to more nodes, which increases the overhead of propagating information over the network. The same trend can be seen in Figure 10, which shows that routing overhead increases with the simulation time. Because the nodes do not need to continuously calculate and decide through the proposed model during the information transmission process, the transmission task of the nodes becomes simpler. Furthermore, because we use a more complex probability model to choose neighbors, our overhead grows relatively more slowly than that of the other models as the simulation time increases.
Figure 10. The relationship between time and overhead (vertical axis: overhead ratio, %).
The conclusion can be drawn by comparing the three indicators of the schemes as the number of nodes and the simulation time vary. The PEBN algorithm adds social relationships and cooperation relationships when predicting probabilities, so that a node can deliver a message to a cooperating node that has a high probability of encountering the destination node. Experiments show that the PEBN algorithm outperforms the other algorithms in terms of the transmission success rate, routing overhead, and hop count. Specifically, the algorithm not only improves the data transmission efficiency in the network but also uses updated information to adapt to the current network environment when the network topology changes dynamically. In addition, handling complex transmission decisions at base stations or edge devices reduces the node overhead. By combining the social and cooperative relationships between nodes with a probabilistic prediction model, a better transmission path can be selected, which reduces the number of hops to the destination node. Therefore, our scheme can improve performance in the transmission environment.

Conclusions
We proposed a routing decision method that filters neighbor nodes using an improved probability model combined with quantified social relationship and cooperation values. The algorithm combines multiple kinds of feature information between nodes and uses them to quantify the social relationship values and cooperation values. In our model, we first quantified the social relationship value and the cooperative relationship value of each node from the collected information to form the social relationship matrix and the cooperation relationship matrix. We then used these matrices to update and predict the encounter cooperation probability between nodes: the prediction matrix was obtained by matrix decomposition and gradient descent, and the relay nodes were filtered according to the predicted probability values. Finally, in the transmission phase, a node requested the probability table associated with the destination node and selected a node with a high probability of encountering the destination node as the next-hop node. The simulation results show that the protocol performs better than the CAOP, SSIS, and SIaOR transmission models in terms of the transmission success rate, average hop count, and overhead. For a single node, the model optimizes the path from the source node to the target node. For the entire network, performance is improved to accommodate large-scale data transmission in 5G networks. In the future, we will use real data sets to simulate real-world scenarios and explore other, more efficient ways to improve information transmission.