A Probabilistically Weakly Secure Network Coding Scheme in Multipath Routing for WSNs

In wireless sensor networks, nodes are mostly deployed in unsupervised areas and are vulnerable to a variety of attacks. Therefore, data security is a vital aspect to be considered. However, due to the limited computation capability and memory of sensor nodes, it is difficult to run complex encryption algorithms or key distribution and management algorithms. A low-complexity security algorithm for wireless sensor networks is therefore of significant importance. In this article, a weakly secure network coding based multipath routing scheme is proposed, which guarantees data confidentiality in transmission probabilistically and improves energy efficiency at the same time. Simulations of the probability of transmission being secure are then performed. The results show that as the number of hops k increases, the probability of transmission being secure decreases rapidly, whereas it grows slightly as the multicast capacity h increases. Therefore, weak security can be achieved with probability approaching 1 by limiting the number of hops and increasing the multicast capacity. In addition, simulations of energy consumption are performed, and the energy consumption of the proposed scheme is compared with that of the multipath routing scheme without network coding. The results show that, by employing network coding, the proposed scheme improves energy efficiency, and the more packets are transmitted, the more energy is saved.


Introduction
Due to their complex working environment, wireless sensor networks (WSNs) can suffer from a variety of attacks. Therefore, transmission security, including data confidentiality, data integrity, and data availability, is a vital aspect to be considered [1][2][3][4]. Existing research on the security of WSNs is mostly based on encryption/decryption. In [5], a symmetric encryption algorithm is proposed that combines two different encryption algorithms in a randomized manner. In [6], the authors analyze the security challenges in WSNs and smart home systems and then propose a security evaluation technique based on attack graph generation. The SNEP protocol is one of the most mature security protocols applied in WSNs [7]. However, since communication among the nodes requires the involvement of the base station, SNEP has rather low efficiency and is not applicable to large-scale networks. Because low-cost, battery-powered wireless sensors have limited computational capability and memory, it is difficult to run complicated encryption algorithms in WSNs. Moreover, key distribution and key management are also a great challenge in large-scale WSNs. In [8], a key management protocol based on random key distribution is proposed. In this protocol, a key pool of size S is first established and each node in the network stores m

Adversary Model
In this article, the communication network is described as a directed acyclic graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of links. For each link $e \in E$, tail(e) and head(e) denote the tail and head of $e$, respectively. The source node is denoted by $s$ and the sink node, which is the base station in practical WSNs, by $T$. For each node $v \in V$, let Out(v) and In(v) denote the sets of outgoing and incoming channels of $v$, respectively. That is, Out(v) = {(v, u) : (v, u) ∈ E} and In(v) = {(u, v) : (u, v) ∈ E}. For the eavesdropper, let $V_{eav}$ and $E_{eav}$ denote the sets of nodes and channels being eavesdropped, respectively. The multicast capacity $H$ is the minimum number of edges in any cut between the source node and the sink node. Each channel $e \in E$ carries a message packet whose elements are selected from a finite field $\mathbb{F}_q$, where $q$ is the size of the finite field. For the source node, we introduce $H$ artificial channels that carry the $H$ source packets the source node transmits to the base station.
It is supposed that $N$ nodes are randomly deployed in a specified district. After the routing procedure is completed, $N_c$ intermediate nodes are involved in transmission. It is assumed that an adversary is randomly located in the district and that each intermediate node may be attacked by the adversary with a probability that depends on the range the adversary can control. Once an intermediate node is attacked, it is controlled by the adversary, and the messages it receives and transmits can be observed completely by the adversary. For each transmission, whether an individual node is controlled is independent of the other nodes. Notably, the source node and the base station cannot be attacked; otherwise, the malicious node could obtain the message without any loss.

Calculation of Number of Paths
Suppose that the successful delivery ratio (SDR) is denoted by $r$. For every single link between any two nodes, the link failure probability is $e$. In addition, the average number of hops of the paths from the source node to the base station is $k$, and the desired multicast capacity is $h$. To simplify the problem, it is assumed that the number of hops of each path is exactly $k$. The transmissions of a packet on the individual hops can be regarded as independent events; hence, for each path, the probability of successfully delivering one packet is

$$p_k = (1 - e)^k. \quad (1)$$

A transmission is called successful only if the number of successful paths is at least $h$. Since the desired multicast capacity is $h$, $H \ge h$ paths are required to guarantee the expected successful delivery ratio $R$. Given the link failure probability $e$ and the number of hops $k$, let $H_{h,e,k,R}$ denote the least number of paths needed to achieve a capacity of $h$ and an SDR of $R$. Among the $H$ paths, the number of successful paths must be at least $h$ to guarantee correct recovery of the original message at the base station. Hence, the SDR can be represented as

$$r = \sum_{i=h}^{H} \binom{H}{i} p_k^{\,i} (1 - p_k)^{H-i}. \quad (2)$$

In this article, it is essential to determine the least value of $H$, referred to as $H_{h,e,k,R}$, that guarantees $r \ge R$, i.e.,

$$H_{h,e,k,R} = \min \{ H \ge h : r \ge R \}. \quad (3)$$
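The search for the least number of paths described above can be sketched directly as an exact iteration over $H$. This is a minimal illustration assuming independent link failures and exactly $k$ hops per path, as stated in the text; the function names and the search bound `H_max` are hypothetical and do not come from the article.

```python
from math import comb

def path_success_prob(e, k):
    """Probability that one packet survives all k hops (independent links)."""
    return (1.0 - e) ** k

def delivery_ratio(H, h, e, k):
    """SDR r: probability that at least h of the H paths deliver their packet."""
    p = path_success_prob(e, k)
    return sum(comb(H, i) * p**i * (1.0 - p) ** (H - i) for i in range(h, H + 1))

def least_paths(h, e, k, R, H_max=10_000):
    """Smallest H >= h whose delivery ratio reaches the target R.

    H_max is an assumed safety bound for the search, not a parameter
    of the scheme itself.
    """
    for H in range(h, H_max + 1):
        if delivery_ratio(H, h, e, k) >= R:
            return H
    raise ValueError("no H <= H_max reaches the target delivery ratio")
```

Each candidate $H$ costs a sum of $H - h + 1$ binomial terms, which is why the text turns to an approximation when $h$ becomes large.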
In practice, it can be impossible to find an analytical solution of formula (3), so a numerical solution must be found by iteration. However, as $h$ grows, the iteration algorithm becomes computationally expensive. Consequently, it is necessary to adopt a lighter algorithm that yields an approximate solution of (3), which is also presented in [15,16]. Let $H_s$ denote the number of successful paths among the $H$ paths; then $H_s$ follows the binomial distribution $B(H, p_k)$, and its mean and variance can be written as

$$\mu = H p_k, \qquad \sigma^2 = H p_k (1 - p_k).$$

The successful delivery ratio $r$ can be rewritten as

$$r = P(H_s \ge h).$$

Thus, the problem can be described as finding the least $H$ that guarantees $P(H_s \ge h) \ge R$. According to the central limit theorem, the binomial distribution can be approximated by a normal distribution, i.e., $H_s \sim N(\mu, \sigma^2)$. Let

$$H_s^{*} = \frac{H_s - \mu}{\sigma},$$

then $H_s^{*}$ follows the standard normal distribution, i.e.,

$$H_s^{*} \sim N(0, 1).$$
In the standard normal distribution, for the given $R$, the value of $x_R$ that satisfies $P(H_s^{*} \ge x_R) \ge R$ can be obtained from the probability density function of the standard normal distribution, which means

$$\int_{x_R}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt \ge R.$$

After that, the value of $H$ can be calculated through Equation (11).
Thus, given the desired successful delivery ratio $R$, the link failure probability $e$, the average number of hops $k$, and the expected multicast capacity $h$, the least number of paths can be calculated according to Algorithm 1:
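The normal approximation can be turned into a closed-form estimate of $H$. The sketch below is one possible derivation, not necessarily the article's Equation (11): it takes $x_R$ as the $(1 - R)$ quantile of the standard normal distribution, requires $(h - \mu)/\sigma \le x_R$, and solves the resulting quadratic in $\sqrt{H}$. The function name is hypothetical, and no continuity correction is applied.

```python
from math import ceil, sqrt
from statistics import NormalDist

def least_paths_normal(h, e, k, R):
    """Approximate least H with P(H_s >= h) >= R via H_s ~ N(mu, sigma^2).

    Condition: (h - H*p) / sqrt(H*p*(1-p)) <= x_R, with P(Z >= x_R) = R.
    Substituting u = sqrt(H) gives p*u^2 + x_R*s*u - h >= 0, s = sqrt(p*(1-p)),
    which holds for u at or above the positive root of the quadratic.
    """
    p = (1.0 - e) ** k
    x_R = NormalDist().inv_cdf(1.0 - R)   # negative for R > 0.5
    s = sqrt(p * (1.0 - p))
    u = (-x_R * s + sqrt(x_R**2 * s**2 + 4.0 * p * h)) / (2.0 * p)
    # The small epsilon guards against rounding up from floating-point noise.
    return max(h, ceil(u * u - 1e-9))
```

Because the binomial distribution is discrete, this estimate can differ by a path or two from the exact iterative solution, so it is best read as a cheap starting point rather than an exact answer.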

Least Number of Communication Nodes
After the least number of paths $H$ has been obtained with the algorithms described above, the routing procedure can proceed to establish those $H$ paths and build the communication network. For the sake of energy efficiency and transmission security, it is essential to involve as few nodes as possible in communication under the given network conditions. Given the number of hops of each path and the number of paths, we define $N_l$ as the least number of communication nodes, i.e., the least number of nodes that need to be involved in communication to satisfy the conditions.

Algorithm 2
The algorithm to calculate the least number of communication nodes
    Initiate with the parameters H, k
    Create the first path and set num_node ← k − 1, s_path ← 1
    while s_path < H do
        // Create a new path
        p_flag ← 0
        for i = 1 : num_node do
            ...
            new_node ← 1
            if the i-th node is the current hop || exists parallel channels || exists a loop then
                continue
            else
                pick the i-th node as the next hop
            ...

In Algorithm 2, the first path with k hops is established initially, which involves k − 1 intermediate nodes. The remaining H − 1 paths are then established. Notably, the network topology is node-braided and edge-disjoint, which means that any two paths may share nodes but cannot share edges. Therefore, during routing, it is first determined whether an existing intermediate node can be used as the next hop of the current path; if not, a new node is introduced. Meanwhile, there can be neither parallel links between any two nodes nor loops anywhere in the network.
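The reuse-or-create rule described for Algorithm 2 can be sketched as a greedy construction. This is an illustrative heuristic written under stated assumptions, not a transcription of the article's Algorithm 2: every path has exactly k hops with k ≥ 2, existing nodes are tried in creation order, a loop is detected by a reachability search, and the names `greedy_paths`, `reaches`, and `usable` are hypothetical.

```python
def greedy_paths(H, k):
    """Greedily build H edge-disjoint k-hop paths from 's' to 't'.

    Returns (paths, number_of_intermediate_nodes).
    """
    if H < 1 or k < 2:
        raise ValueError("need H >= 1 and k >= 2")
    edges = set()   # directed links (u, v) already in use
    succ = {}       # adjacency lists, used for the loop check
    nodes = []      # intermediate nodes in creation order
    paths = []

    def add_link(u, v):
        edges.add((u, v))
        succ.setdefault(u, []).append(v)

    def reaches(src, dst):
        # Depth-first search: is dst reachable from src over existing links?
        stack, seen = [src], set()
        while stack:
            u = stack.pop()
            if u == dst:
                return True
            if u not in seen:
                seen.add(u)
                stack.extend(succ.get(u, ()))
        return False

    def usable(cur, v, last_hop):
        if v == cur or (cur, v) in edges:     # current hop or parallel channel
            return False
        if last_hop and (v, "t") in edges:    # the final link v -> t must be free
            return False
        return not reaches(v, cur)            # cur -> v must not close a loop

    for _ in range(H):
        path, cur = ["s"], "s"
        for hop in range(1, k):
            last_hop = hop == k - 1
            nxt = next((v for v in nodes if usable(cur, v, last_hop)), None)
            if nxt is None:                   # no existing node fits: add one
                nxt = len(nodes)
                nodes.append(nxt)
            add_link(cur, nxt)
            path.append(nxt)
            cur = nxt
        add_link(cur, "t")
        path.append("t")
        paths.append(path)
    return paths, len(nodes)
```

A call such as `paths, n = greedy_paths(3, 3)` returns the constructed paths together with the number of intermediate nodes used. Being greedy, the result is an upper bound on the least number of communication nodes rather than a proven minimum.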
According to Algorithm 2, some simulations are performed and the results are shown in Figure 4.

Weakly Secure Network Coding
In this section, a packet format that removes the need for centralized knowledge of the network topology is proposed, as shown in Figure 5. Here the 'Source ID' is the ID of the transmitting node, the 'Dest ID' is the ID of the receiving node, and the 'Generation ID' is the identifier of a generation. The source node divides all the source packets into groups of h source packets each. Such a group is called a generation. In each generation, the h packets are assigned packet IDs ranging from 1 to h. In addition, the source sends one generation in each transmission. The 'Coding Vector' is the vector of combination coefficients of the coded packet.
To achieve weak security, the source node needs to encode the data before transmitting it; this process is called pre-coding. The pre-coding algorithm in this article is derived from the algorithm in [17]. Compared with the algorithm in [17], the algorithm in this article introduces more nonlinearity into the coded packets.
According to the hypothesis, the source node can transmit h packets in one transmission. Therefore, without loss of generality, the source message can be denoted as Then the pre-coded message can be written as where The function f is a permutation function and both its input and output are vectors which consist of elements in the finite field. Note that the construction of function f is public to all nodes, even including the adversary.
After the pre-coding procedure is completed, the source node applies the generating matrix $G$ to the coded message to generate $H$ packets, which are transmitted along the $H$ outgoing channels of the source node $s$. Here $G$ is an $H$-by-$h$ matrix whose elements are chosen randomly from the finite field. For the sink node, the received packets can be denoted as $Y = [y_1, y_2, \cdots, y_m]^T = CM$, where $C$ is the coding matrix of $Y$ and the $i$-th row of $C$ is the coding vector of packet $y_i$. Then $M$ can be calculated by Gaussian elimination. After that, $m_1, m_2, \cdots, m_h$ can be calculated iteratively according to formula (16): In this way, the sink node can decode all the h packets in one generation.
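The encoding with a generating matrix and the recovery of $M$ by Gaussian elimination can be illustrated with a small sketch. It assumes a prime field size (here `P = 257`, an arbitrary choice) so that arithmetic is plain modular arithmetic, and it deliberately leaves out the pre-coding step and formula (16), which are not reproduced here. The function names are hypothetical.

```python
P = 257  # assumed prime field size q; any prime works for this sketch

def encode(G, M):
    """Coded packets Y = G M over GF(P).

    G has one coding vector per row (H-by-h); M holds the h packets of
    one generation as rows of equal length.
    """
    return [[sum(g * m for g, m in zip(row, col)) % P for col in zip(*M)]
            for row in G]

def decode(C, Y, h):
    """Recover M from Y = C M by Gauss-Jordan elimination over GF(P)."""
    A = [list(c) + list(y) for c, y in zip(C, Y)]   # augmented matrix [C | Y]
    for col in range(h):
        pivot = next((r for r in range(col, len(A)) if A[r][col] % P), None)
        if pivot is None:
            raise ValueError("coding matrix has rank < h; cannot decode")
        A[col], A[pivot] = A[pivot], A[col]
        inv = pow(A[col][col], -1, P)               # modular inverse, Python 3.8+
        A[col] = [a * inv % P for a in A[col]]
        for r in range(len(A)):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [(a - f * b) % P for a, b in zip(A[r], A[col])]
    return [row[h:] for row in A[:h]]               # rows of [I | M]
```

The `ValueError` branch corresponds to the situation exploited in the security analysis: a receiver, or an eavesdropper, holding coding vectors of rank below h cannot solve for the whole generation.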

Security Analysis
Since each node is deployed randomly in the district and the adversary is randomly located in the district, the probability that a single node lies in the overhearing zone (and is thus controlled by the adversary) is

$$p_o = \frac{S_{overhear}}{S_{total}},$$

where $S_{total}$ is the size of the whole district in which the wireless sensors are deployed, and $S_{overhear}$ is the size of the range that the adversary can control. Once a node is located in that range, it is controlled by the adversary. Therefore, in the whole district, the number of overheard nodes $N_o$ follows the binomial distribution

$$N_o \sim B(N, p_o).$$

Given the average number of hops of each path and the number of paths, define $N_{co}$ as the number of communication nodes that are overheard by the attacker.
Theorem 1. The probability of $N_{co} = m$ for all integers $0 \le m \le N_l$ can be denoted as where $p_c = N_l / N$ is the probability of a node being involved in the communication.

Proof.
Theorem 2. Let $E_o$ denote the number of channels that are overheard by the attacker, and $E_{vo}$ the number of valid channels that are overheard, then for all integers 0 ≤ c ≤ kH And (27) can be rewritten as Let $\Gamma_w$ be the overhearing matrix, i.e., the matrix that consists of the coding vectors of the valid channels being overheard.
Theorem 3. The attacker cannot get any useful information about the original messages given that $R(\Gamma_w) < h$, where $R(\Gamma_w)$ is the rank of the matrix $\Gamma_w$, i.e., $\Gamma_w$ is not a full-rank matrix.
Proof. Let $X = (x_1, x_2, \cdots, x_h)^T$ be the original message that is sent over the network. After pre-coding at the source node, the input message can be written as $X' = [x'_1, x'_2, \cdots, x'_h]$, i.e., the source transmits $X'$ instead of $X$. Since a random linear network code is used, the message on each channel $e_j$ can be written as $\Gamma_{e_j} X'$, and the message obtained by the attacker is $W = \Gamma_w X'$. Since $R(\Gamma_w) < h$, the adversary obtains at most $h - 1$ linearly independent equations and therefore cannot solve for all the packets in $X'$. Consequently, the attacker cannot recover any packet through formula (16). Hence, we have $I(x_i; B) = I(x_i; W) = 0$, and weak security is achieved.
Let $p_e$ be the probability of transmission being insecure, which means

$$p_e = P(R(\Gamma_w) = h).$$

Then $p_s = 1 - p_e$ is the probability of transmission being secure.
Proof. When $E_{vo} \le h - 1$, the rank of the overhearing matrix $\Gamma_w$ cannot be $h$, since the matrix has at most $h - 1$ rows. When $h \le m \le kH$, the probability of $R(\Gamma_w) = h$ under the condition $E_{vo} = m$ is $p(R(\Gamma_w) = h \mid E_{vo} = m)$. In summary,

$$p_e = \sum_{m=h}^{kH} p(E_{vo} = m) \, p(R(\Gamma_w) = h \mid E_{vo} = m),$$

which proves Theorem 4.
where q is the size of the finite field.
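Proof. Let $N_{m,h}(h)$ denote the number of $m$-by-$h$ matrices over $\mathbb{F}_q$ that have rank $h$ ($m \ge h$). Then, from [18], we have

$$N_{m,h}(h) = \prod_{i=0}^{h-1} (q^m - q^i).$$

Hence, since there are $q^{mh}$ such matrices in total, we have

$$p(R(\Gamma_w) = h \mid E_{vo} = m) = \frac{N_{m,h}(h)}{q^{mh}} = \prod_{i=0}^{h-1} \left(1 - q^{\,i-m}\right).$$

This completes the proof of Lemma 1.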
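The counting argument used in this proof can be checked numerically. The sketch below evaluates the standard product formula for the probability that a uniformly random m-by-h matrix over a field of size q has rank h, together with a brute-force rank routine for q = 2 that can confirm it on small cases. Treating the overheard coding vectors as uniformly random is an assumption of the sketch, and the function names are hypothetical.

```python
from itertools import product

def count_full_rank(m, h, q):
    """Number of m-by-h matrices over F_q with rank h (zero when m < h)."""
    total = 1
    for i in range(h):
        total *= q**m - q**i
    return total

def full_rank_probability(m, h, q):
    """Probability that a uniformly random m-by-h matrix over F_q has rank h."""
    prob = 1.0
    for i in range(h):
        prob *= 1.0 - q ** (i - m)
    return prob

def rank_gf2(rows):
    """Rank over GF(2); each row is given as an integer bitmask."""
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)   # clears the leading bit of b in r if it is set
        if r:
            basis.append(r)
    return len(basis)

def brute_force_count_gf2(m, h):
    """Enumerate all m-by-h binary matrices and count those of rank h."""
    return sum(rank_gf2(rows) == h for rows in product(range(2**h), repeat=m))
```

For a fixed number of overheard channels m, the probability moves closer to 1 as q grows, so a larger field does not by itself prevent an attacker who overhears at least h valid channels from obtaining a full-rank matrix.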

Theorem 5.
According to Lemma 1, we have

$$p_e = \sum_{m=h}^{kH} p(E_{vo} = m) \prod_{i=0}^{h-1} \left(1 - q^{\,i-m}\right).$$

Hence, the probability of transmission being secure can be written as

$$p_s = 1 - \sum_{m=h}^{kH} p(E_{vo} = m) \prod_{i=0}^{h-1} \left(1 - q^{\,i-m}\right).$$

Power Consumption Analysis
Figure 6 shows the power consumption of the different components in WSNs, as proposed by Estrin [19]. It indicates that, compared with the energy consumed by transmitting and receiving data, the energy consumed by other components, including sensing, computing, and sleeping, is negligible. Meanwhile, the energy consumed by idling is always allocated by the nodes to avoid collisions and does not affect the energy analysis at the network layer, since collision avoidance is a function of the MAC layer [20]. Therefore, in this article, the total energy consumption of one transmission can be written as

$$E = E_{TX} + E_{RX}.$$
Specifically, in one transmission, the energy consumed by transmitting and receiving can be written as

$$E_{TX} = B \left(E_{TXElec} + E_{amp} d^{\gamma}\right), \qquad E_{RX} = B \, E_{RXElec},$$

where $B$ is the number of bits of data, and $E_{TXElec}$ and $E_{RXElec}$ are the energy consumed by transmitting and receiving one bit of data, respectively. In addition, $E_{amp}$ is the amplification factor of the amplifier, $d$ is the distance between the transmitting node and the receiving node, and $\gamma$ is the path-loss factor. Therefore, when a source node transmits a packet of $B$ bits to a sink node through a path of $k$ hops, the total energy consumption is

$$E_k = k B \left(E_{TXElec} + E_{RXElec} + E_{amp} d^{\gamma}\right).$$

In the multipath routing scheme without network coding, the number of paths that must be employed to achieve a desired successful delivery ratio of $R$ is $H_{1,e,k,R}$, i.e., the least number of paths for $h = 1$. Therefore, to transmit $N_p$ packets successfully from the source to the sink, the total number of packets that the source node needs to transmit is $N_p H_{1,e,k,R}$. Hence, the total energy consumption is

$$E_{MR} = N_p \, H_{1,e,k,R} \; k \, b \left(E_{TXElec} + E_{RXElec} + E_{amp} d^{\gamma}\right),$$

where $b$ is the number of bits per packet. On the other hand, in the network coding based multipath routing scheme, the number of paths $H$ that must be employed to achieve a desired successful delivery ratio of $R$ with a multicast capacity of $h$ can be calculated by Algorithm 1. Then, to transmit $N_p$ packets successfully from the source to the sink, the total number of packets that the source node needs to transmit is $\frac{N_p}{h} H$. Hence, the total energy consumption is

$$E_{NC} = \frac{N_p}{h} \, H \; k \, b \left(E_{TXElec} + E_{RXElec} + E_{amp} d^{\gamma}\right).$$
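The comparison between the two schemes can be sketched numerically. The code below is a hedged illustration: it assumes that each packet costs one transmission and one reception per hop, that the scheme without coding sends every packet over the least number of paths for h = 1, and that the coded scheme sends H packets per generation of h. The radio constants are placeholder values, the coding-vector overhead is ignored, and all function names are hypothetical.

```python
from math import ceil, comb

def least_paths(h, p, R, H_max=10_000):
    """Least H with P(Binomial(H, p) >= h) >= R, found by exact search."""
    for H in range(h, H_max + 1):
        r = sum(comb(H, i) * p**i * (1 - p) ** (H - i) for i in range(h, H + 1))
        if r >= R:
            return H
    raise ValueError("target delivery ratio not reachable within H_max")

def path_energy(bits, k, d, e_tx=50e-9, e_rx=50e-9, e_amp=100e-12, gamma=2):
    """Energy in joules to move `bits` over k hops of length d.

    The default radio constants are assumed placeholders, not the
    values used in the article's simulations.
    """
    return k * bits * (e_tx + e_rx + e_amp * d**gamma)

def energy_plain(N_p, b, e, k, d, R):
    """Multipath routing without coding: each packet is sent on H_1 paths."""
    p = (1 - e) ** k
    return N_p * least_paths(1, p, R) * path_energy(b, k, d)

def energy_coded(N_p, b, e, k, d, R, h):
    """Network-coded scheme: each generation of h packets uses H coded packets."""
    p = (1 - e) ** k
    generations = ceil(N_p / h)
    return generations * least_paths(h, p, R) * path_energy(b, k, d)
```

Under these assumptions the per-hop energy cancels in the ratio of the two totals, so the saving depends only on how the least number of paths for capacity h compares with h times the least number of paths for capacity 1.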

Simulations of Security
Based on the analysis in Section 4, simulations of the probability of transmission being secure are conducted with different network parameters, including $h$, $e$, $k$, and $p_o$. The simulation results are presented in Figures 7-10. It can be concluded from Figure 7 that, as the multicast capacity $h$ increases, the probability of transmission being secure $p_s$ increases slightly. However, as the number of hops $k$ increases, the probability of transmission being secure decreases rapidly. If the desired probability of transmission being secure is $p_s \ge 0.99$, the number of hops needs to be limited to $k \le 4$.
From Figure 8, it can be concluded that the attacking capability of adversary has a relatively significant impact on the probability of transmission being secure, and the larger the number of hops is, the greater the impact is.

Simulations of Energy Consumption
In Section 5, the energy consumption of the network coding based multipath routing scheme and of the multipath routing scheme without network coding is analyzed. Based on that analysis, simulations of energy consumption are performed, and the results are presented in Figure 11. As Figure 11 shows, the network coding based scheme has better energy efficiency, and the more packets are transmitted, the more energy is saved.

Conclusions
In this article, a weakly secure network coding based multipath routing scheme is proposed. Based on that, security and power consumption are analyzed. Simulations of the probability of transmission being secure and of power consumption are performed, and the power consumption of the two schemes is compared. According to the analysis and simulation results, the following conclusions can be drawn:
1. As the number of hops of each path in the network increases, the probability of transmission being secure decreases rapidly, especially when the communication capacity is low. For example, when the capacity is h = 3 and p_o = 0.1, with k being 2, 3, 4, and 5, the probability of transmission being secure is 0.9851, 0.7586, 0.2277, and 0.0020, respectively. Therefore, if the desired probability of transmission being secure is p_s ≥ 0.99, the number of hops should be limited to k ≤ 4. To do this, it is necessary to deploy nodes with a larger communication distance, especially when the nodes are deployed over a rather vast area.
2. When the number of hops is k ≥ 3, the probability of transmission being secure increases and gradually approaches 1 as the multicast capacity h increases. When k = 2, the probability of transmission being secure remains almost unchanged as h increases and satisfies p_s ≈ 1.
3. When the number of hops is k ≥ 3, the overhearing ability, which is reflected by the value of p_o, has a relatively significant impact on the probability of transmission being secure. However, when k = 2, the probability of transmission being secure remains almost unchanged and is approximately equal to 1.
4. Compared with the multipath routing scheme without network coding, the network coding based scheme in this article has better energy efficiency. According to the simulation results, when 1000 packets are transmitted and the multicast capacity is h = 10, the power consumption of the network coding based multipath routing scheme is 36.67% less than that of the scheme without network coding.