Achieving Network Level Privacy in Wireless Sensor Networks†

Full network level privacy has often been categorized into four sub-categories: Identity, Route, Location and Data privacy. Achieving full network level privacy is a critical and challenging problem due to the constraints imposed by the sensor nodes (e.g., energy, memory and computation power), sensor networks (e.g., mobility and topology) and QoS issues (e.g., packet reach-ability and timeliness). In this paper, we proposed two new identity, route and location privacy algorithms and data privacy mechanism that addresses this problem. The proposed solutions provide additional trustworthiness and reliability at modest cost of memory and energy. Also, we proved that our proposed solutions provide protection against various privacy disclosure attacks, such as eavesdropping and hop-by-hop trace back attacks.


Introduction
With the spreading application of Wireless Sensor Networks (WSNs) in various sensitive areas such as health-care, military, habitat monitoring, etc, the need to ensure security and privacy is becoming imperatively important. For example, in battlefield application scenario, "the location of a soldier should not be exposed if he initiates broadcast query" [1]. In the meantime, query must be transferred to the destination in an encrypted manner via only trusted en-route nodes. Similarly, in habitat monitoring application scenarios, such as Great Duck Island [2] or Save-the-panda application [3] where large numbers of sensor nodes are deployed to observe the vast habitat of ducks and pandas, an adversary can try to capture the panda or duck by back-tracing the routing path until it reaches the source sensor nodes. Therefore, in order to prevent the adversary from back-tracing, the route, location and data privacy mechanisms must be enforced. With respect to these application scenarios, network level privacy has often been categorized into four categories: 1. Sender node identity privacy: no intermediate node can get any information about who is sending the packets except the source, its immediate neighbors and the destination, 2. Sender node location privacy: no intermediate node can have any information about the location (in terms of physical distance or number of hops) about the sender node except the source, its immediate neighbors and the destination, 3. Route privacy: no node can predict the information about the complete path (from source to destination). Also, a mobile adversary gets no clue to trace back the source node either from the contents and/or directional information of the captured packet(s), and 4. Data packet privacy: no node can see the information inside in a payload of the data packet except the source and the destination.
Existing privacy schemes such as [1,[3][4][5][6][7], that have specifically been proposed for WSNs only provide partial network level privacy. Providing a full network level privacy is a critical and challenging issue due to the constraints imposed by the sensor nodes (e.g., energy, memory and computation power), sensor network (e.g., mobility and topology) and QoS issues (e.g., packet reach-ability and trustworthiness). Thus, an energy-efficient privacy solution is needed to address these issues.
In order to achieve this goal, we incorporate basic design features from related research fields such as geographic routing and cryptographic systems. To our knowledge, we propose the first full network level privacy solution for WSNs. Our contribution lies in following features.
• A new Identity, Route and Location (IRL) privacy algorithm is proposed that ensures the anonymity of source node's identity and location. It also assures that the packets will reach their destination by passing through only trusted intermediate nodes.
• A new reliable Identity, Route and Location (r-IRL) privacy algorithm is proposed, which is the extension of our proposed IRL algorithm. This algorithm has the ability to forward packets from multiple secure paths to increase the packet reach-ability.
• A new data privacy mechanism is proposed, which is unique in the sense that it provides data secrecy and packet authentication in the presence of identity anonymity.
Our solutions collectively provide protection against various privacy disclosure attacks such as eavesdropping and hop-by-hop trace-back attacks. Also, our solutions are lightweight, hence consume modest memory and energy.
The rest of this paper is organized as follows: Section 2. contains related work, Section 3. articulates the network model, assumptions and adversary model. Section 4. describes the proposed privacy schemes, Section 5. consists of analysis and evaluation, and Section 6. concludes the paper.
C. Ozturk et al. [3] proposed a phantom routing scheme for WSNs, which helps to prevent the location of a source from the attacker. In this scheme, each message reaches the destination in two phases: 1) a walking phase, in which the message is unicasted in a random fashion within first h walk hops, 2) after that, the message is flooded using the baseline flooding technique. The major advantage of their scheme is the source location privacy protection, which improves as the network size and intensity increase because of high path diversity. But on the other hand, if the network size increases, the flooding phase will consume more energy. This scheme does not provide identity privacy. Also, it is unable to provide data secrecy in the presence of identity privacy. P. Kamat et al. [4] proposed a phantom single-path routing scheme that works in a similar fashion as the original phantom routing scheme [3]. The major difference between these two schemes is that after the walking phase, a packet will be forwarded to the destination via a single path routing strategy such as the shortest path routing mechanism. This scheme consumes less energy and requires slightly higher memory as compared to first one. This scheme also does not provide identity privacy. Also, it is unable to provide data secrecy in the presence of identity privacy.
S. Misra and G. Xue [5] proposed two schemes: Simple Anonymity Scheme (SAS) and Cryptographic Anonymity Scheme (CAS) for establishing anonymity in clustered WSNs. The SAS scheme use dynamic pseudonyms instead of true identity during communications. Each sensor node needs to store a given range of pseudonyms that are non-contiguous. Therefore, the SAS scheme is not memory efficient. On the other hand, the CAS scheme uses keyed hash functions to generate pseudonyms. This scheme is memory efficient as compare to the SAS but it requires more computation power. The authors do not propose any routing scheme. Sender node may always send packets to the destination via shortest path. In that case, for an adversary who is capable of performing hop-by-hop trace back (with the help of direction information) can find out the location of the source node.
Y. Xi et al. [1] proposed a Greedy Random Walk (GROW) scheme to protect the location of the source node. This scheme works in two phases. In a first phase, the sink node will set up a path through random walk with a node as a receptor. Then the source node will forward the packets towards the receptor in a random walk manner. Once the packet reaches at the receptor, it will forward the packet to the sink node through the pre-established path. Here receptor is acting a central point between the sink and the source node for every communication session. A criterion of selecting a trustworthy receptor is essential, however not defined in the author's work.
Y. Ouyang et al. [7] proposed a Cyclic Entrapment Method (CEM) to minimize the chance of an adversary in finding out the location of the source node. In the CEM, when the message is sent by the source node to the base station, it will activate the predefined loop(s) along the path. An activation node will generate the fake message and forwarded it towards the loop, and original message is forwarded to the base station via specific routing protocol such as shortest path. Energy consumption in the CEM scheme is mainly dependent on the number of existing loops in the path and their size.

Geographic Routing Schemes
Our proposed privacy solutions incorporate the basic design features from geographic routing schemes [6,[8][9][10] that are discussed below.
M. Zorzi and R. R. Rao [8,9] proposed Geographic Random Forwarding (GeRaF) scheme for ad hoc and sensor networks. This scheme is based on broadcast transmission and the sender only requires the position of its own and the destination. All active neighborhood nodes who receive the packet will go through the contention phase. Once the contention phase is complete, the winner (the node that is closest to the destination) will relay the packet using the same mechanism. This process will repeat until the destination becomes one-hop away. The authors assumed that all nodes in the neighborhood do not remain active all the time. Due to the dynamics of the sleep modes, different sets of potential relays will be available. However, mostly the potential route is close to the same or shortest route, which makes easier for an adversary to trace back to the sender.
A. Capone et al. [10] proposed Simple Forwarding over Trajectory (SiFT) scheme. This scheme is based on broadcast transmission and does not maintain neighborhood positions and states. Each node who receives the packet will make the decision of forwarding that packet based only on its own position, the position of a transmitter and the trajectory. The difference between the GeRaF [8,9] and the SiFT scheme is that, the GeRaF does not use trajectories but the position of the destination. If nodes are static then similar to the GeRaF the potential route is close to the same or shortest route, which makes it easier for an adversary to trace back to the sender.
A. D. Wood et al. [6] have proposed a configurable secure routing protocol family called Secure Implicit Geographic Forwarding (SIGF) for WSNs. The SIGF is based on the Implicit Geographic Forwarding (IGF) protocol [11], in which a packet is forwarded to the node that lies within the region of 60 • sextant, centered on the direct line from the sender to the destination. The SIGF protocol provides some aspects of networks privacy such as data, route and location privacy, but it does not provide identity privacy. Another limitation of the SIGF protocol is that, when there is no trusted node within a forwarding area (assuming 60 • sextant), it will forward the packet to an un-trusted node. So, the reliability of the path is affected. Table 1 compares the proposed privacy preserving schemes. It clearly shows that none of the schemes currently provide full network level privacy. PFR [3] PSR [4] SAS & CAS [5] CEM [7] SIGF [6] GeRaF [8,9] SiFT [10] Required information for routing ID of destination Routing

Network Model
A wireless sensor network (WSN) is composed of large number of small sensor nodes that are of limited resource and densely deployed in an environment. Whenever end users require information about any event related to some object(s), they send a query to the sensor network via the base station. And the base station propagates that query to the entire network or to a specific region of the network. In response to that query, sensor nodes send back required information to the base station. A typical wireless sensor network scenario is shown in Figure 1. Links are bidirectional. Also, sensor nodes use IEEE 802.11 standard link layer protocol, which keeps packets in its cache until the sender receives an acknowledgment (ACK). Whenever a receiver (next hop) node successfully receives the packet it will send back an ACK packet to the sender. If the sender node does not receive an ACK packet during predefined threshold time, then the sender node will retransmit that packet.

Assumptions
For reason of scalability, it is assumed that no sensor node needs to know the global network topology, except that it must know the geographical location of its own, its neighboring nodes and the base station. In order to find out the location information, any proposed mechanism could be used, such as [12,13].
It is assumed that each sensor node in the network can share a unique secret key with the base station [14,15]. These keys are periodically updated. The public key of the base station is also assumed known to all the nodes in the network. Sensor nodes do not require their own public and private keys; because computation cost of public and private keys is generally consider being high. However, many researchers [16,17] have shown the feasibility of using public key cryptography in wireless sensor networks. It is also assumed that sensor nodes are capable of performing encryption and decryption of the data by using any cipher algorithm such as DES, AES etc. This provides an additional layer of security. This paper only focuses on the development of a prevention strategy against network level privacy disclosure attacks, such as eavesdropping, traffic analysis and hop-by-hop trace back attacks. Other general attacks, such as flooding attacks, could be detected and prevented by using any IDS scheme proposed for WSNS.

Adversary Model
We have assumed that an adversary can mostly perform passive attacks (like eavesdropping [18], and traffic analysis), since such attacks helps to conceal the adversary's presence in the network. Nevertheless, the adversary is also capable of performing some active attacks like fabrication and packet drop attacks. We also assumed that the adversary is both device-rich and resource-rich [4]. These characteristics are defined below.
• Device-rich: the adversary is equipped with devices like antenna and spectrum analyzers, so that the adversary can measure the angle of arrival of the packet and received signal strength. These devices will help the adversary to find out the immediate sender of the packet and move to that node. This kind of hop-by-hop trace back mechanism will be carried out by the adversary until the actual sender node is reached.
• Resource-rich: the adversary has no resource constraint in computation power, memory or energy.
It is also assumed that the adversary has some basic domain knowledge like the range of identities assigned to the sensor nodes, the public key of the base station and information about the cipher algorithms used in the network. However, adversary has no knowledge which identity is physically associated with which node.
A detection and prevention strategy against non-privacy disclosure attacks at various layers such as jamming attacks is out of the scope of this paper. However, trust management methodology (Section 4.1) that we adopted in this paper is useful to detect and prevent some non-privacy disclosure threats such as, black hole attack, sink hole attack, and selective forwarding or gray hole attack.

Concepts and Definitions
In our proposed algorithms, we have used two notions: direction and trust. Both these notions (direction and trust) are used to provide reliable (non-malicious and non-faulty) secure paths for achieving robust route privacy. Direction information will help to forward packet to the destination in a timely manner and trust will help to forward the packets via reliable nodes. Detail definitions of both notions are given below.
Direction: The first notion used in our algorithms is that of direction. The physical location of the base station is the reference point for each sensor node. Based on this reference point, each node classifies its neighboring nodes into four categories: (1) forward neighboring nodes (F ), (2) right side backward neighboring nodes (B r ), (3) left side backward neighboring nodes (B l ), and (4) middle backward neighboring nodes (B m ). The objective of this categorization is to provide more path diversity as discussed in Section 4.2. A node x classifies its neighboring node y in following fashion: where θ is the angle between the node x and its neighboring node y with respect to the line joining node x and the base station as shown in Figure 2. Trust: The second notion used in our algorithms is that of trust. The definition of a trust here is based on our other paper [19] and restated here.
A node can be classified into one of the three categories [20]: trustworthy, untrustworthy, and uncertain. A node is considered trustworthy if it interacts successfully most of the time with the other nodes. A node is considered untrustworthy if it tries to do as many unsuccessful interactions as possible with the other nodes. An untrustworthy node could be a faulty [21] or malicious node. A node is considered uncertain if it performs both successful and unsuccessful interactions. Detailed definition of the successful and unsuccessful interactions and trust calculation methodology is available in our paper [22] and provided here in a simplified form.
A sender will consider an interaction successful if the sender receives confirmation that the packet is successfully received by the neighbor node and forwarded towards the destination in an unaltered fashion. The first requirement of successful reception is achieved on the reception of the link layer acknowledgment (ACK). The second requirement of forwarding towards the destination is achieved with the help of enhanced passive acknowledgment (PACK) by overhearing the transmission of a next hop on the route, since they are within the radio range [23]. If the sender node does not overhear the retransmission of the packet within a threshold time from its neighboring node or if the overheard packet is found to be illegally fabricated (by comparing the payload that is attached to the packet), then the sender node will consider that interaction as unsuccessful.
With the help of this simple approach, several attacks can be prevented, i.e., the black hole attack is straightforwardly detected when malicious node drops the incoming packets and keeps sending self generated packets [24]. Similarly, sink hole attack [25], an advanced version of the black hole attack, is also easily detectable by looking at the passive acknowledgment. Likewise, the selective forwarding attack [26] and gray-hole attack [27] can also be eliminated with the aid of above mentioned approach.
Based on these successful and unsuccessful interactions node x can calculate the trust value of node y in following fashion: (2) where [.] is the nearest integer function, S x,y is the total number of successful interactions of node x with y during time δt, and U x,y is the total number of unsuccessful interactions of node x with y during time δt. After calculating trust value, a node will quantize trust into three states as follows: where, f represents half of the average values of all trustworthy nodes and g represents one-third of the average values of all untrustworthy nodes. Both f and g are calculated as follows: where [.] is the nearest integer function, R x represents the set of trustworthy nodes for node x, M x the set of untrustworthy nodes for node x, and n is the total number of nodes that contains trustworthy, untrustworthy and uncertain nodes. The initial trust values of all nodes are 50, which represents the uncertain state. Initially f and g are equal to 25 and 17 respectively, although other values could also be used by keeping the following constraint intact: f i − g i ≥ 1, which is necessary for keeping the uncertain zone between a trusted and untrustworthy zone. The values of f and g are adaptive. During the steady-state operation, these values can change with every passing unit of time which creates dynamic trust boundaries. At any stage, when |R x | or |M x | becomes zero, the value of f j+1 or g j+1 remains the same as the previous values (f j and g j ). The nodes whose values are above 100 − f will be declared as trustworthy nodes (Equation 3), and nodes whose values are lower than 50 − g will be consider as untrustworthy nodes (Equation 3). After each passage of time, ∆t, nodes will recalculate the values of f and g. This trust calculation procedure will continue in this fashion.
The time window length (∆t) could be made shorter or longer based on the network analysis scenarios. If ∆t is too short, then the calculated trust value may not reflect the reliable behavior. On the other hand, if it is too long, then it will consume too much memory to store the interaction record at the sensor node. Therefore, various parameters can be used to adjust the length of ∆t.

Identity, Route, and Location Privacy (IRL)
Our proposed identity, route and location privacy scheme works in two phases. The first is neighbor node state initialization phase, and the second is routing phase.
Route Privacy: In initialization phase, let the node i have m neighboring nodes in which t nodes are trusted. So, , and M (t Bm ) represent the set of trusted nodes that are in the forward, right backward, left backward, and middle backward directions, respectively. These neighbor sets (M (t F ), M (t Br ), M (t B l ), and M (t Bm )) are initialized and updated whenever a change occur in neighborhood. For example, the entrance of a new node, change of a trust value, etc.
Whenever a node needs to forward a packet, the routing phase (Algorithm 1 for source node and Algorithm 2 for intermediate node) of IRL algorithm is called.
Whenever a source node (Algorithm 1) wants to forwards the packet, it will first check the availability of the trusted neighboring nodes in its forward direction set M (t F ) (Line 2). If trusted nodes exists then it will randomly select one node as a next hop (Line 3) from the set M (t F ) and forward the packet towards it (Lines 13:21). If there is no trusted node in its forward direction, then the source node will check the availability of a trusted node in the right (M (t Br )) and left (M (t B l )) backward sets. If the trusted nodes are available then the source node will randomly select one node as a next hop (Line 3) from these sets and forward the packet towards it (Lines 13:21). If the trusted node does not exist in these sets either, then the source node will randomly select (Line 8) one trusted node from the backward middle set (M (t Bm )) and forward the packet towards it (Lines 13:21). If there are no trusted nodes available in all of the sets then the packet will be dropped (Line 9:10). When an intermediate node (Algorithm 2) receives the packet (either from the source node or from another en-route node), it will first check whether the packet is new or old (Line 3). If it is new, then the node will first check the availability of the trusted node from the forward direction set (M F ) excluding the prev hop node if it belongs to forward set (Line 13). If trusted nodes exists in the forward set then the node will randomly select any one trusted node as a next hop (Line 14) and forward the packet towards it (Line 45). If there is no trusted node available in the forward direction, then it will check to which set the sender of the packet belongs to. For example, If the packet, forwarded by a node, belongs to the right backward set (Line 16), then it will first check whether the left or middle backward sets contain any trusted nodes (Lines 17:18). If so, it will randomly select one node from those sets (Line 19) and forward the packet towards it (Line 45). If there is no trusted node in those two sets, then the node will randomly select a trusted node from the right backward set (M (t Br )) excluding the one from which the node received the current packet (Lines 20:21) and forward the packet towards it (Line 45). Similar operations will be performed, if the packet, forwarded by a node, belongs to the left (Lines 25:33) and middle backward or forward (Lines 34:43) sets. An example IRL routing scenario is shown in Figure 3.
Algorithm 2 IRL -Routing at Intermediate Node. Set counter = timesReceviedBef ore + 1; 7: Remove signature from buffer; 8: if counter = 3 then 9: Drop packet and exit; 10: end if 11: end if 12: if packet came from B r then 17: if M temp2 ̸ = ∅ then 28:   This routing strategy may result in the creation of a cycle (loop). However, due to the randomness in the selection of the next-hop and the presence of the different four direction sets, the probability of creation of any cycle is very low. Nevertheless, in order to fully avoid the occurrence of the cycles, each node (prior to forwarding of a packet) will save the signature of the packet in the buffer for the δt time, that is: where D is the distance between the forwarding node and the base station, d is the distance between the forwarding node and the next hop, and p t is the propagation transfer time between the forwarding node and the next hop. This signature consists of two fields: (1) sequence number of the packet, and (2) the payload. The potential of the signature to compare and identify the same packet is detailed in the later section. Corresponding to this signature, three more fields are also stored in the buffer: (1) previous hop identity, (2) next hop identity where the packet is forwarded, and (3) counter, that tells how many times the same packet is received by the node. This information will later be used to get rid of any cycle. The size of the buffer is mainly dependent on the network traffic conditions. However, it is expected to be low due because the sensor nodes sent data either in periodic intervals or upon the occurrence of some event. If the node received the packet whose signature exists in the buffer (Algorithm 2, Lines 3:11), then including the previous hop node (Line 12), two other nodes will also be excluded from the selection of the next hop process: 1) the node from which last time the packet was received (Line 4) and 2) the node from which last time the packet was forwarded (line 5). If the same packet is received three times by the same node (Line 8) then the packet will be dropped (Line 9). Three sample scenarios of the loop creation, detection and prevention are shown in Figure 4. Creation of loops and traversing of the packets in the backward direction is not a completely negative effect. Rather, it provides positive effects in terms of strengthening the route and source location privacy, because these effects will helps to increase the safety period [3], which is the time for an adversary to reach at the source node.
Identity Privacy: Whenever a node receives the packet p from the source node or en-route node then the receiving node will replace the previous hop's identity prev hop contained in the packet with its own (Algorithm 1, Line 13). After that, the node will get the next forwarding node next hop (as described earlier) and update the header of the packet p = {prev hop , next hop , payload} (Line 14). After modification of the two header fields, the node will forward the packet (Line 16). In this way, all the intermediate forwarding nodes replace the source and next hop's identity contained in the packet p. This process will go on until the packet reaches the base station.
Location Privacy: The neighboring nodes which are in each other's radio range can easily approximate the location of each other by measuring the received signal strength and the angle of arrival [28]. If the adversary is within the range of the source node, then adversary can easily estimate the location of the source. Once the packet has crossed the radio range of the original source node, then becomes very difficult for an attacker to estimate the location of the node either in terms of the physical distance or in terms of the number of hops of an original source node. The main reason for this is that the path selection is random and packets are forwarded by only trusted nodes which only contain the information of the last and the next hop.

Reliable Identity, Route, and Location Privacy (r-IRL)
It is also possible that some applications require more reliability in terms of packet reach-ability; and the packet could be dropped due to either network congestion or malicious behavior of an en-route node. Thus, in order to achieve more reliability, the packet should be forwarded from multiple paths simultaneously, which will give trustworthiness in the sense that at least the packet should reach the base station by any one of the paths, although, this may increase some communication overhead.
Our reliable IRL (r-IRL) algorithm is the extended version of our proposed IRL algorithm, in which we introduce one more parameter, reliability r. The source node i will multi-cast a packet to all r randomly selected neighboring trusted nodes that are in the forward direction. If there are no adequate trusted nodes present in the forward direction, then it will select the remaining trusted nodes from the backward direction. The rest of the mechanism of the r-IRL algorithm is the same as the IRL algorithm.

Data Privacy
The payload contains the identity of the source node (ID x ) and the actual data (d). Identity is encrypted with the public key (k + bs ) of the base station and data is encrypted with the secret key (k x,bs ) shared between the sender node and the BS. Both are appended with the payload as shown below: If we assume that the adversary knows the range of identities assigned to the sensor nodes, public key of the base station and information about cipher algorithm used in the network, an adversary can then successfully obtain the identity of the source by performing simple brute-force search attack [29] by comparing the pattern of encrypted identity with a known range of identities. Therefore in order to provide protection against brute-force search attack, we append a random number (R n ) (equivalent to the size of identity) with the identity of a node and then perform encryption. Now the payload is: where || is the append operation. Inclusion of random number may introduce additional computational overhead. However, the amount of overhead is mainly dependent on random number generation technique. Recently, very nice random generation techniques have been specially designed for low power sensor networks, such as [30,31]. These techniques could be used to generate random number for each packet. Also, overall computational overhead is dependent on the number of packets generated by the sensor nodes. Mostly, sensor nodes are event driven or query driven [32]. Therefore, amount of traffic is usually kept low as compared to traditional networks. Our proposed data privacy approach provides several benefits. Firstly, data secrecy is achieved in the presence of identity anonymity. This feature is not available in earlier proposed privacy schemes. Secondly, the base station will receive both the identity of the actual source node and message authentication. If the packet has been successfully decrypted with the shared secret key, it means that packet is received from genuine sensor node.

Security Resiliency Analysis
Suppose we have an adversary A who strives to defeat our privacy protocols and guess the original source node. We will distinguish between two kinds of nodes. A source node is the original sender of a packet q and a forwarding node is the node that forwards a packet to another node until it reaches the destination. Hence the source node is also a forwarding node. The adversary's goal is to find out the source node. This analysis assumes that we are using IRL algorithm including our proposed data privacy mechanism. So if the adversary sees a packet, it will trivially know the identity of the last forwarding node (which could possibly be the sender node).
We will deal with separate cases. Case 1 is when the adversary is close to the base station and can eavesdrop on any packet received by the base station. Case 2 deals with the case when the adversary can see any packet within the radio range of a particular node. Case 3 extends this into two or more nodes.
An adversary will try to solve the following problem: Given a packet q and a subset of nodes N ′ , find out the sender node s. In other words, the algorithm for the adversary takes two inputs and outputs a node s ′ ; Namely A(q, N ′ ) = s ′ . If s ′ = s, the adversary succeeds in defeating our protocol. We have to find: Pr[A(q, N ′ ) = s], which is the probability for an adversary to find out the sender node. Our assumption is that, from an adversarial perspective, all nodes are equally likely to be senders of a packet. This does not necessarily mean that the network traffic is uniformly distributed. Notice that if the adversary knows beforehand which nodes are more likely to send packets, then no privacy preserving method can prevent the adversary from guessing the most likely senders, since this constitutes the adversary's a priori knowledge.
Notations and definitions: Denote a generic node by m. The set of neighbors of m is denoted by N m , which also includes m itself. The number of forward and backward nodes of m is denoted by m f and m b respectively. If a node a is a backward node of m, then we denote it as a → m. We say that a node a is in the backward set of node m, if a → a 1 → . . . a r → m, for some nodes a 1 , . . . a r where r ≥ 0. For compact notation we will denote this as a → r m, if the IDs of the intermediate nodes are not significant. We will also use the notation → r m to denote a generic node, who is r links (hops) away from m. Define the backward set C m of m as C m = {a|a → r m, r ≥ 0}, that is the set of all the possible nodes such that they have a forward link to m. Denote the base station as B. It will also be seen as another node. Let the total number of nodes in the network excluding the base station be N . We will use the term "adversary is in possession of a node" to indicate that the adversary can passively listen to any communication within the radio range of that node.
Claim 1: Suppose A is in possession of B. Let B b be the number of backward nodes of the base station (nodes one hop away from the base station). Then for any packet q received by B and for large enough N : Proof : The adversary can always know the ID of the last forwarding node. Let B b be the number of backward nodes to the base station. The packet could only have come from one of the nodes in N B −{B} (which only contains backward nodes to B). Since the nodes are just a hop away from the BS, so they will not send the packet to another node. Hence for large N we have: Now let us assume that A is in possession of a node m in the network. The following probability estimate gives an upper bound of the probability of success of the adversary. It is an upper bound since it does not include the possibility of a packet sent backwards. When a packet is sent backwards over one or many hops, the probability of success of the adversary decreases since there would be more possible nodes. Thus in this scenario our result would be like an upper bound on the adversary's limitations.
Claim 2: Suppose A is in possession of a node m. Let c = |C → 2 m | denote the number of backward nodes in backward set C → 2 m of some node → 2 m. Then, Proof : Since the adversary is in possession of a node m, it knows its backward and forward nodes. Furthermore, if any of these nodes including the node m itself is the sender of a packet q, then the adversary will know. This is true since the adversary can see all incoming packets to the node m and to its neighbor nodes (the forward and the backward nodes). Thus it can see if the payload of q is not equal to the payload of any q ′ being received by these nodes in a given interval of time. If this is the case, then the adversary will know the sender. Now if none of the nodes in N m are the senders, then the packet was forwarded by a node i that is two hops away from m. The adversary knows the ID of that node through the packet q. Thus the adversary makes a list of all the possible backward nodes in the backward set of i. Let that number be denoted by c. Notice that node i could also be the possible sender. Hence the total number of possible senders would be c + 1. We have: Now, suppose the adversary is in possession of two nodes at the same time; m 1 and m 2 . We can safely assume that N m 1 ∩ N m 2 = φ, since it would be more advantageous to the adversary to cover nodes with non-overlapping radio ranges. The adversary will always know whenever any node in N m 1 or N m 2 is the sender of a packet. How about the case when they are not the senders? There could be two possible cases: without loss of generality, first assume that m 2 ∈ C m 1 . If the packet q was received by some node in N m 1 and was received by some node in N m 2 before, then the adversary had already checked it when the packet was sent to a node in N m 1 . Thus the adversary need only check packets received in N m 1 that were not received by N m 2 . In this case, the sender cannot be in N m 2 . In any case, the adversary has to find out the backward sets of → 2 m 1 or → 2 m 2 , depending on where the packet was received. Since, in the adversary's knowledge, all nodes are equally likely to be senders, the probability of a packet being received at the two sets is the same. In case m 2 / ∈ C m 1 , then the adversary has no real advantage except that it can see packets at two disjoint locations in the network. Thus we only state the case when m 2 ∈ C m 1 . We have the following result: Claim 3: Suppose the adversary is in possession of two nodes m 1 and m 2 . Assume further that m 2 ∈ C m 1 . Let c 1 = |C → 2 m 1 | and c 2 = |C → 2 m 2 | then: In general, we have: Claim 4: Let us assume that A is in possession of k nodes m k → r 1 · · · → r k−2 m 2 → r k−1 m 1 and let m f and m b denote the average number of forward and backward nodes averaged over all the k nodes. Let t = m f + m b + 1. Let for 1 ≤ i ≤ k, c i = |C → 2 m i |, then: Observations: The probability is lowest when the adversary is actually at the base station. If the adversary has more nodes in possession, the probability increases linearly, with more success rate when the nodes are actually connected. This also shows that if a packet originates from any node that does not have a backward node, the adversary will always know the sender. This drawback can be avoided by requiring all nodes to have backward nodes. In other words, avoid a tree topology.
The above security resiliency analysis description is for route and location privacy. The security strength of identity and data privacy is mainly dependent on the encryption schemes. If encryption scheme is strong then we can achieve stronger identity and data privacy. If encryption scheme is weak then we have weak identity and data privacy.

Memory Consumption Analysis
Each sensor node needs to maintain one table that contains the list of neighboring nodes, their direction and their trust states as shown in Table 2. Node identity can be represent in two bytes [15,33]. Four sets of directions can be easily represented in 2 bits. Trust calculation is based on time-based past interaction only. Therefore, the total size required to calculated trust value is 4∆t bytes [22]. Here, ∆t represents size of time window and 4 bytes are required to store number of successful (2 bytes) and unsuccessful (2 bytes) interactions. Trust value can be represented in one byte. Therefore the size of each record is 3.25 + 4∆t bytes (26 + 32∆t bits). If we assume that the node has M neighboring nodes then the total size of the table will be M (26 + 32∆t) bits. In order to achieve data privacy in the presence of identity anonymity, our proposed scheme uses two keys: one Public key of the base station k + bs and other is shared secret key k x−bs . Therefore, total memory required at the sensor node for our proposed scheme is: M (26 + 32∆t) + k + bs + k x,bs . Table 3 shows the memory requirement of various privacy schemes, in which M represents the neighborhood size, K represents pseudonym space, 4∆t represents size of time window, and N is the total number of nodes in the network. In the Phantom Flood Routing (PFR) [3] scheme, each sensor node needs to maintain the list of neighbor nodes and these neighbor nodes are divided into two sets. Here we assume that identity of a node is represented by two bytes, and set is distinguished by a single bit. So the total memory required by each node in the PFR scheme is (16+1)M bits. In the Phantom Single-path Routing (PSR) [4] scheme, each node maintain the list of neighbor nodes, hop count (2 bytes), and set identification (1 bit). Therefore, the total memory required by each node in the PSR scheme is (16+16+1)M bits. In the SAS scheme, each node needs K(4M +2N )+16M bits of memory. Here M represents the neighborhood size, K represents pseudonym space and N is the total number of nodes in the network. For the CAS scheme, each node requires K(6+7M )+16M bits of memory. (See [5] for more details about the SAS and CAS schemes.) Let us assume that the sensor node has ten neighbor nodes, then the total memory required by the sensor node by the PFR, PSR, IRL, CAS and SAS is 21.25, 41.25, 260.5, 628 and 1940 bytes respectively, as shown in Figure 5. Additionally, cycle prevention strategy (Section 4.2) requires some short term memory to store signature of the packet for short period of time (δt). In our proposed schemes, signature of the packet comprises of six fields: 1) Sequence number (2 bytes), 2) previous hop identity (2 bytes), 3) next hop identity (2 bytes), 4) payload (variable size), 5) counter (2 bits), and 6) δt time (4 bytes). So, for each packet sensor node requires 10.25 + Size(payload) bytes of memory. The packet signature will be removed from the buffer after δt time. For example, sensor node x received 20 packets of equal size of payload (e.g., 10 bytes). Then, the total memory required by the sensor node is 20 × (10.25 + 10) = 405 bytes. This additional overhead does not make sensor nodes overloaded because of following reasons: 1. Generally, wireless sensor networks are event driven [34][35][36] or sensor nodes generate packets in periodic intervals [37,38]. Therefore, the amount of overall traffic usually remains low. 2. In our proposed schemes, packets always follow different routes. Therefore, the probability of a single node to be overloaded is very low. Let us assume that the amount of traffic is very high and single sensor node needs to store large amount of packets at a time. Then in order to reduce the size of memory, we can use the technique of signature generation code [39]. This technique allows us to represent single signature code in few bytes. However, this technique is based on Bloom filters [40] that require the computation of multiple hash values. This may increase the computational cost. Therefore, we need to trade off between memory and computation cost.

Energy Consumption Analysis
In this section, we will show the efficiency of our routing strategies with existing schemes. Energy is computed based on the communication overhead (including transmission and reception cost, path length) introduced by our proposed routing protocols and compared it with other existing schemes. We have implemented our IRL and r-IRL routing schemes on Sensor Network Simulator and Emulator (SENSE) [41]. At the application layer we used constant bit rate component (CBR) that generate constant traffic during simulation between randomly selected source node(s) and the base station. For the simplicity, we assumed that both sensor nodes and the base station are static. Network consists of 300 sensor nodes that are organized into 15 by 20 grid manner. Other simulation parameters are given in Table 4.
We have compared our proposed IRL and r-IRL algorithms with the four variations of phantom routing schemes [3,4] that are: 1. Phantom single path routing scheme with hop-based approach (PSR-hop).
We did not compared our schemes with the SAS and CAS [5] schemes because the authors did not propose any routing strategy.  The energy consumption analysis with different scenarios are shown in Figure 6. For the r-IRL scheme we select r = 3, which means a single packet will reach the destination via three different routes simultaneously. For phantom routing schemes, we select parameter h walk =10 (as recommended in [3]). Figure 6 clearly indicates that, the IRL and r-IRL schemes consume less energy as compared to the PSR-sec, PFR-hop and PFR-sec schemes but slightly consume higher energy as compared to the PSR-hop scheme. This is due to the fact that the IRL and r-IRL algorithms provides more path diversity and packets sometimes took longer paths.

Path Diversity Analysis
Strength of route privacy is dependent on path diversity. High path diversity provides strong route privacy and low path diversity provides weak rout privacy. Path diversity can be categorized into two types.
1. Length variation: Path could be long or short and mainly dependent on routing scheme. For example, packets always reach to the destination via shortest path. In this scheme, packets may reach to the destination via longer path if any node is not working properly within the shortest available path. With respect to the route privacy, length variation provides minimum route privacy. If we have longer paths, then it will increase time for an adversary to find out actual source node or vice versa. So, the longer path increases safety time.
2. Path variation: Each packet may follow different route. It is also dependent on routing strategy. For example, routing scheme make decision about next hop based on the energy level of neighboring nodes. With this approach, one can achieve limited path variation. With respect to the route privacy, if we have more path variation, then it will become clueless for an adversary to guess from where next packet will come.
Our proposed routing strategies (IRL and r-IRL) have both features. Because of the concept of direction (Section 4.1), proposed schemes provide more length variation and because of the randomness (Section 4.2) proposed schemes provide high path variation. Incorporation of both features offer high path diversity. In order to analyze the path diversity behavior, we have organized 300 sensor nodes in a 10 by 30 grid manner. The rest of simulation parameters are given in Table 4. In the simulation, a single source node (ID: 224) generates 100 data packets for the base station. Figure 7 shows the path diversity (in terms of path length) of the IRL, PSR-hop and PSR-sec schemes. The average path taken by the PSR-hop, IRL and PSR-sec is 22.12, 36.81 and 38.17, respectively. It indicates that the IRL scheme incurs more delay as compared with the PSR-hop scheme and less delay as compared with the PSR-sec scheme. This figure also indicates that the IRL scheme has more path variation as compared with the other schemes, which creates more difficulties for the adversary to trace back the source from the captured packets. Figure 7 also shows that some packets took longer paths in the IRL scheme as compared with others. This is due to the fact that the source or en-route node did not find any trusted node in its forward direction, so the packet is relayed back in the backward direction. If we assume that each node has p probability to be trusted and all probabilities are independent of each other, then the total probability P b for a node i to relay the packet in the backward direction is: where m f represents the number of nodes in the forward direction. Figure 8 shows the result of 100 simulation runs in which we have assumed that each node has equal probability to be trusted and un-trusted. It shows that, as the neighborhood size increases, the probability of the packet to move in the backward direction decreases sharply.

Discussion
From the memory, energy and path diversity analysis, we see that our solution is optimal especially with respect to the PSR-hop scheme. However, at a modest cost of memory and energy, our solutions provide full network level privacy as compared with the other existing schemes. This cost is justifiable because we have additionally achieved trustworthiness and reliability (in terms of packet reach-ability). With this level of resource consumption, our solutions can easily be used on real sensor nodes, for example, MICA2 sensor node has ATMega 128L micro controller (8 MHz @ 8 MIPS), 128 Kbyte program flash memory, 512 Kbyte measurement (serial) flash, and 4 Kbyte EEPROM [42].

Conclusions and Future work
Existing privacy schemes of WSNs only provides partial network level privacy. Providing full network level privacy is a critical and challenging issue due to the constraints imposed by the sensor nodes (e.g., energy, memory and computation power), sensor network (e.g., mobility and topology) and QoS issues (e.g., packet reach-ability and timeliness). Therefore, in this paper we proposed the first full network level privacy solution that is composed of two new identity, route and location privacy algorithms and data privacy mechanism. Our solutions provide additional trustworthiness and reliability at modest cost of energy and memory. We also proved analytically that our solutions provides protection against an adversary who is capable of performing privacy disclosure attacks such as eavesdropping and hop-by-hop trace backing.
In our future work, we will evaluate our proposed schemes from the perspective of computation cost that is required to perform encryption and random number generation.