Acquiring Authentic Data in Unattended Wireless Sensor Networks

An Unattended Wireless Sensor Network (UWSN) can be used in many applications to collect valuable data. Nevertheless, due to the unattended nature, the sensors could be compromised and the sensor readings would be maliciously altered so that the sink accepts the falsified sensor readings. Unfortunately, few attentions have been given to this authentication problem. Moreover, existing methods suffer from different kinds of DoS attacks such as Path-Based DoS (PDoS) and False Endorsement-based DoS (FEDoS) attacks. In this paper, a scheme, called AAD, is proposed to Acquire Authentic Data in UWSNs. We exploit the collaboration among sensors to address the authentication problem. With the proper design of the collaboration mechanism, AAD has superior resilience against sensor compromises, PDoS attack, and FEDoS attack. In addition, compared with prior works, AAD also has relatively low energy consumption. In particular, according to our simulation, in a network with 1,000 sensors, the energy consumed by AAD is lower than 30% of that consumed by the existing method, ExCo. The analysis and simulation are also conducted to demonstrate the superiority of the proposed AAD scheme over the existing methods.


Introduction
The use of wireless sensor networks (WSNs) on data gathering applications has been popularized in recent years. Since WSNs could be deployed in the hostile environments, one of the fundamental issues is how to secure the collected data. Unfortunately, with the consideration of the sensors' scarce resources, the security issue becomes very challenging because conventional computationally-intensive cryptographic primitives cannot be utilized.
WSNs considered in the literature are usually assumed to have a constant connection between sensors and a trusted data collection unit, e.g., the sink. From the security point of view, in such a scenario, sensors can collect and then report the sensitive data to the sink at will. With the cryptographic primitives such as encryption and authentication, the confidentiality and authenticity of the transmitted sensed data can be guaranteed. More importantly, this avoid storing a large amount of data in sensors that are easily to be compromised. Even more, with the aid of the always-present sink, the network can defend against the attacks such as sensor compromises more efficiently and effectively. Nevertheless, due to the application restrictions, the above scenario is not always the case. In real world applications, there could be the cases where after the sensor deployment, the sensed data should be temporarily stored in the sensors because the sink is away from the network in most of the sensor network lifetime. Only at the end of each collection interval, will the itinerant sink roams around the sensing region and collects the data sensed by sensors. In contrast to the usual sensor networks, to emphasize the unattended WSN feature, this type of WSN consisting of sensors and an itinerant sink that periodically collects sensed data is termed as unattended wireless sensor networks (UWSNs). In fact, UWSNs have been used in practical WSN applications [1,2]. In particular, the unattended sensor networks in [1] operate in an unmanned manner. For example, the nuclear emission sensor network could be deployed to monitor potential nuclear activity. In addition, another example is to deploy unattended sensors to detect underground sound and vibration, in order to be aware of troop movements, border crossings, and enemy's aircrafts as soon as possible. Trident systems [2] deploy the unattended ground sensors for providing reliable communication links. It is often used for transmitting timely message back to command and control centers. These sensors can be used in battlefield applications including perimeter defense, border patrol and surveillance, target acquisition, and situation awareness. The conceptional illustration is shown in Figure 1.
Due to its inability to offload the sensed data in a real-time manner, sensors should keep the data sensed in the local memory within the collection interval between successive sink visits, incentivizing various attacks. The adversary may have different goals; it may be interested in learning the data sensed by a specific set of sensors, or want to prevent certain data from reaching the data sink. In this paper, we consider the adversary whose goal is to alter the sensors' data so that the falsified data can mislead the sink. In spite of the paramount importance of this authentication problem, only few solutions [3,4] are proposed. Specifically, in [3], a novel authentication function was proposed to deal with the authentication problem in a storage-efficient manner. Nevertheless, it is effective only against the reactive adversary (described in Section 3.2) that is relatively weak and is easily to be overcame. In [4], two collaborative authentication schemes, CoMAC and ExCo, were proposed to defend against the stronger adversary, proactive adversary (described in Section 3.2). Unfortunately, the simple collaboration among the sensors in CoMAC and ExCo incurs more attacks such as Path-Based DoS (PDoS) [5] and False-Endorsement DoS (FEDoS) [6] attacks (described in Section 4.1). In addition, the resilience of ExCo against sensor compromises is not as strong as [4] claims (described in Section 4.1). Furthermore, due to the lack of the proper use of sensors' position information, CoMAC and ExCo have relatively high energy consumption especially in a large scale network. The above reasons motivate us to develop a secure and efficient collaborative authentication scheme in UWSNs.

Contribution
We identify the security flaws of the schemes in [4]. Aiming at solving the identified problems, a scheme, called AAD, is proposed to Acquire Authentic Data in UWSNs. AAD possesses three characteristics. (1) Due to the proper use of sensors' position information, it is communication-efficient.
(2) In addition to acquiring authentic data, AAD is also resilient against both Path-Based DoS (PDoS) [5] and False-Endorsement DoS (FEDoS) [6] attacks. (3) The resilience of AAD against sensor compromises is superior to that in prior works [4]. From analytical and simulation results, the robustness of AAD is demonstrated to be superior to those of CoMAC and ExCo.

Related Work
Due to the use of Bloom filter in our proposed AAD scheme, its brief introduction is given in Section 2.1 Then, some related works performed on UWSNs are briefly described in Section 2.2

Bloom Filter
As a kind of probabilistic data structure, a Bloom filter consists of an array of n bits. Together with k independently and randomly selected hash functions, h 1 , · · · , h k , with range [0, n − 1], it is used to represent a set of elements with the support of membership query. Assume that a Bloom filter B is used to represent a set S = {s 1 , . . . , s m } of m elements. To insert an element s i , the bits B[h j (s i )] for 1 ≤ j ≤ k are set to 1. Note that the bit remains unchanged when being already set to 1. To check whether an element x is in the set S, we can check whether the bits B[h j (x)] for 1 ≤ j ≤ k are all 1's. If and only if they are all equal to 1, x is deemed to be an element of S. The size n of Bloom filter is independent of the size of elements and can be constant, which is very memory-efficient. Nevertheless, the membership query on Bloom filter has false positive but has no false negative. In other words, it is probable to falsely consider an element that actually does not belong to S as an element of S. In [7], such false positive probability can be obtained as The optimization between the performance efficiency e.g., array length or hash functions required) of Bloom filter and the false positive probability can be obtained but is beyond the scope of this paper. Note that one of the characteristics of Bloom filter we use in the design of our proposed scheme is that the query result is always correct if the content to be queried is indeed stored in the Bloom filter.

Security Issues in UWSNs
Although UWSNs are studied only recently, many security issues have been investigated. The data survival issue, which aims to prevent the sensed data from being arbitrarily erased by the adversary, is first studied in [1,8]. Due to the fact that the adversary may compromise some sensors in order to enhance its capability of subverting the network functionality, the self-healing techniques are proposed in [9,10] to recover the compromised sensors. We refer the readers to [11] for a comprehensive overview of security issues in UWSNs.
As for the authentication problem in UWSNs, few research efforts are conducted. Since the data sensed within the collection interval should be stored in the sensor for a possibly considerable period of time, in order to provide forward-security (described in Section 3.2), some authentication schemes [3,12] were proposed. Moreover, in [4], two authentication schemes, CoMAC and ExCo, both of which rely on the collaboration among multiple sensors, are presented for collecting authentic sensed data. To our knowledge, CoMAC and ExCo are the only schemes having both forward and backward security (described in Section 3.2). Particularly, in CoMAC, each sensor j, after obtaining its sensed data d r j at round r, constructs the corresponding authentication tag z r j . Then, it immediately sends ⟨d r j , z r j ⟩ to a set of randomly selected sensors. To verify the authenticity of d r j , the sink collects all the corresponding authentication tags and checks whether the collected authentication tags can be generated by using the claimed d r j . ExCo is different from CoMAC in that the former stores the hash of all the received authentication tags for each sensor while the latter directly keeps the received authentication tags for each sensor.
Indeed, when CoMAC or ExCo is used, forward-security and backward-security can be achieved, enabling the sink to acquire the authentic data from sensors. Unfortunately, the naïve collaboration among sensors used in CoMAC and ExCo incurs many more security problems. For example, the adversary may arbitrarily inject a large number of bogus messages into the network, pretending that it is proceeding with the collaborative authentication procedure albeit the false data will not be accepted by the sink eventually. Such kind of Path-Based DoS (PDoS) attack [5] would deplete the energy of sensors forwarding bogus messages, significantly reducing the network lifetime. Additionally, the compromised sensor can either provide bogus authentication tags to the sink or contaminate the transmitted authentication tags during the collaboration of sensors, forcing the genuine sensed data to be rejected by the sink due to the inconsistency with the authentication tags. The above misbehavior is similar to False-Endorsement DoS (FEDoS) attack [6], and is in common with most of the collaborative security schemes. We also note that, although the resilience of ExCo against sensor compromises is claimed in [4] to be stronger than that of CoMAC that works as a baseline scheme, when the adversary is assumed to be capable of controlling all the compromise sensors and be aware of the target round right after the end of the target round, their resilience will the same. We will describe the above problem in more details in Section 4.1

Preliminaries
The network and security assumptions used in this paper, as described in the following, are similar to those of the prior works. Nevertheless, the adversary considered in this paper is more realistic and stronger than the ones in the literature.

Network Assumption
In this paper, a UWSN is composed of n homogeneous sensors, {s 1 , . . . , s n }, and a sink with mobility. Here, only the sink is assumed to have mobility. The unattended sensors are assumed to be static. These sensors are uniformly scattered over the sensing region to execute the pre-determined functionalities such as data gathering. The underlying network is assumed to be connected such that two arbitrary sensors can communicate with each other through either direct communication or multi-hop path. Time is divided into collection intervals, each of which will be further divided into v rounds, and can be synchronized by existing techniques [13][14][15]. In UWSNs, no constant connection between sensors and the sink exists. Instead, the itinerant sink periodically visits the UWSN to collect data at the end of each collection interval. Sensors are scheduled such that, in each round, each sensor obtains exactly one sensed datum. For ease of presentation, we assume that each sensor in each round gets exactly one sensed datum. Note that our proposed method actually can be applied on the case that each sensor can derive multiple sensed data or nothing in a round. After the contact to the data sink, sensors are securely re-initialized. We temporarily assume that the message transmission is reliable, which means that the message can be always be sent to the destination sensor. We will relax this assumption in later sections. In UWSNs, the geographic position of each sensor is known by the sink. This is due to the characteristic of UWSNs that the mobile sink harvests the sensed data by directly visiting the sensors. To do so, the sink must know the position of each sensor in advance. Two possible ways make the sink know the position of each sensor. First, sensors are deterministically deployed one by one by the network owner so that the position information will be passed to the sink by the owner. Second, sensors are randomly deployed, but can be aware of their geographic positions by using techniques such as [16][17][18][19] or specific hardwares such as Global Positioning System (GPS).

Security Assumption
In this paper, the adversary, whose goal is to let the sink accept the fraudulent data that is claimed to be the one sensed by s i at the target roundr, is considered. The adversary can simultaneously compromise k sensors at each round. The conceptual illustration is shown in Figure 2. It should be particularly noted that the adversary is assumed to migrate and control different sets of compromised sensors in the literature. We have a stronger adversarial assumption that, once being compromised, the sensor is always under the control of the adversary. The secret information stored in the compromised sensors will therefore be exposed to the adversary. The adversary can launch sensor compromise attack right after the sensor deployment in order to maximize its the possibility of subverting the network functionality. Nevertheless, since each sensor will be reset at the end of each collection interval, we focus on the security issue in a specific collection interval. We also note that the adversary is a global eavesdropper, which means that it can eavesdrop on each message transmitted over the network. In addition, the adversary learns its objective, s i andr, at the end of roundr. Note that s i andr are unknown to the sink so that the sink cannot apply protection only on specific sensors. Figure 2. The adversary also roams around the sensing region and compromises the sensors. In this plot, the adversary can compromise one sensor at each round, and three sensors are compromised by the adversary after the first three rounds of a collection interval.
Since the adversary focuses on substituting specific data, with the consideration of the unattended nature of UWSNs, forward-secure property, which means that the adversary gains no advantage about the data sensed before the round r from the sensor compromised at the round r, can be useful in resisting the adversary. Key evolution, in which the key used in round r is evolved from the key used in round r−1 through the cryptographic hash function, is a simple means to provide forward-security. Nevertheless, forward-secure cryptographic techniques are effective against only reactive adversary, which means that the adversary starts compromising sensors after it identifies the target, but is of no use when the proactive adversary, which means that the adversary can compromise sensors even before identifying the target, is considered. As a consequence, the backward-security, which means that the adversary learns no secret of the next round, is required. An authentication scheme with both forward-security and backward-security is called key-insulated, which means that the adversary can learn only the current secret.

Proposed Method
For completeness, before describing our proposed authentication scheme, we first briefly review the ExCo scheme proposed in [4] and describe its three security defeats in more details in Section 4.1 After that, the basic idea of our proposed Acquire Authentic Data (AAD) method is presented in Section 4.2 Finally, the detailed description of AAD will be presented in Section 4.3

Motivation
In ExCo, at each round r, each sensor j constructs the MAC z r j of its sensed value d r j and sends z r j to a random subset R r j of sensors. Within the round r, each sensor j has certain possibility of receiving MACs from the other sensors. If this happens, it temporarily keeps them in the local memory. At the end of round r, each sensor j constructs and stores the MAC, H r j = M AC(z r j ||z r q 1 || · · · ||z r q ϕ ), where || denotes the bit-string concatenation, if it receives MACs from the sensors q 1 , . . . , q ϕ . Afterwards, except for H r j , all the MACs generated and received in the round r are deleted. From the sink point of view, to authenticate the d r j , it involves the verification of MAC, H r j , and the MACs in R r j , where R r j is the set of sensors that send the authentication tags to s j . Note that the sensors in R r j are called the co-authenticators of s j at round r. For example, if R r j contains q 1 , . . . , q ϕ , then, in addition to the verification of H r j , the MACs, H r q 1 , . . . , H r q ϕ are required to be verified as well. First, although it is claimed in [4] that ExCo is stronger than CoMAC in terms of sensor compromises, their resilience against sensor compromises is actually the same in practice. Particularly, according to [4], it is claimed that the adversary needs to compromise all the sensors in j is the set of sensors that receive the authentication tags from s j and T r j is the set of sensors that send the authentication tags to the sensors in S r j , before the round r, to successfully subvert the security ExCo provides. Nevertheless, ifr is the target round, the adversary is aware ofr at the roundr + 1, and the adversary only compromises the sensors in {s j } ∪ R r j , then the adversary can still provide to the sink a counterfeit sensed data without being detected. It can proceed as follows. At first, we assume that, for each round r, each compromised sensor keeps the data received in rounds r − 1 and r in its local memory. As soon as the compromised sensor s j and compromised sensors in Rr j are aware ofr, s j instantly replaces the original sensor reading dr j with the valuedr j the adversary wants to report to the sink. In addition, the compromised sensors in Rr j pretend that they also receivedr j . Hence, before the end of roundr + 1, s j can collaborate with the sensors in Rr j to generate their own MACs that are consistent with each other and are consistent withdr j . With these generated MACs, thedr j to be reported to the sink will not be recognized to be counterfeit. Moreover, this is not an attack in an ideal case. Instead, this is a practical attack, because, in practice, the length of each round will not be too short and the adversary can be aware of the target round right after something desired happens. Hence, ExCo is not as strong as [4] claims and has the same security strength as CoMAC.
Second, Exco is vulnerable to PDoS attacks. In particular, after the adversary compromises a few sensors, these compromised sensors can be used to intentionally inject a large amount of useless traffic claimed to be the MACs in ExCo to waste the precious energy of sensors so that the network lifetime will be significantly reduced. Although a simple defense that limits the number of message forwarding at each sensor seems to be helpful in alleviating PDoS attacks, it is indeed useless because the network traffic on each sensor cannot be estimated in advance. As a consequence, if some limits are applied on the number of message forwarding at each sensor, some packets such as transmitted MACs in ExCo will be dropped somewhere on its way to the destination Under the circumstance, the verification will fail and some sensor readings will be regarded as falsified ones because some of the co-authenticators cannot provide the MACs.
Third, ExCo is also vulnerable to FEDoS attacks. In fact, this is a common attack if only a simply designed collaboration scheme is used. Specifically, after a sensor is compromised and is inquired by the sink to ask for the MAC used for authenticating the sensed data of the other sensors, it can always reply a random string to the sink. This implies that some data sensed by genuine sensors will be thought of as fake. This is because the MACs involved in a large number of sensors are necessary for authentication, leading to be easily vulnerable to FEDoS attacks. Simply taking the majority of the verification results of MACs from different co-authenticators seems to be helpful in alleviating FEDoS attacks, but it also reduces the resilience against sensor compromises. As a matter of fact, without the proper design, there always exists a dilemma of enhancing the resilience against sensor compromises or enhancing the resilience against FEDoS attacks. In addition, it has been demonstrated [4] that the expected number ν(n, t) of sensors involved in the authentication of the sensed data of one sensor in ExCo is where |R r j | = t is the number of co-authenticators. Figure 3 shows different settings of ν(n, t). Here, we further define a metric R r , called FEDoS ratio of r, which denotes the ratio of the number of sensors required to be compromised to the total number of sensors in the network. Note that the parameter r means that r × 100% of the verifications of sensor readings are affected. FEDoS ratio is used to evaluate and quantify the resilience against FEDoS attacks. According to the ν(n, t)'s derived as above, R 0.5 's in different settings are shown in Figure 4. It can be observed that when the number of co-authenticators used in ExCo is 2 and approximately 10% of sensors are compromised, over 50% of sensor readings will be regarded as bogus by the sink because of the bogus authentication tags provided by the compromised sensors. When t = 4, less than 2% of sensor compromises would result in above 50% of contaminated sensor readings, which will be a disaster in terms of the network security.

Basic Idea
The previous methods are vulnerable to PDoS attack because each sensor can arbitrarily select distant sensors, and sends the authentication tag to the selected sensors. To deal with this problem, our approach is straightforward but effective, i.e., we restrict that the sensor to which the authentication tag is sent cannot be too far away from the sensor sending the authentication tag. Clearly, the effect of PDoS attack can be mitigated because the energy waste due to the message relaying on the intermediate sensors can be reduced. Afterwards, the problem turns to be the design of a proper mechanism that can support the query of proximity information. Here, with the observation that the mobile sink in UWSNs is able to harvest the sensed data because it knows the position of each sensor, different Bloom filters containing different proximity information are stored in different sensors for the query purpose.
On the other hand, the reason that the prior methods suffer from the FEDoS attacks is due to the fact that the compromised sensor can always provide an MAC claimed to be legitimate to the sink. All the sink can do is to verify the data authenticity according to the received MACs. Nevertheless, the received authentication tag could be a random string generated by the compromised sensor, and therefore, the sink declines to accepting the authenticated data. In this respect, our idea is that the MAC used in the authentication procedure should be made in a specific fashion so that the legitimacy of the authentication tag can be verified by the sink before the sink verifies the data authenticity. It should be noted that our defense against FEDoS attacks can also enhance the resilience against sensor compromises. This reveals a particular merit of our method over ExCo.

Proposed Method: AAD
Our proposed AAD scheme consists of five phases: pre-deployment phase, post-deployment phase, sensing phase, receiving/forwarding phase, and verification phase. The pre-deployment phase is executed by the network owner to store necessary materials into the sensors. As its name shows, the post-deployment phase is performed by the sink right after the sensor deployment in order to store the materials required for the authentic data acquisition in the sensors. Sensing phase is executed by each sensor so as to sense data, and distribute the sensed data and the necessary MACs. Once receiving the packet, each sensor executes forwarding phase to forward the received message to the destination sensor. Verification phase is performed by the sink to collect the sensed data and verify their authenticity. The notations used in the following discussion are listed in Table 1. The details of these five phases are described as follows. The key used by s j at round r The hash value of d r j with key k r j S r j A random subset of sensors selected by s j at round r t r j The number of co-authenticators selected by s j at round r Pre-Deployment Phase. For each sensor s j , the sink randomly selects a unique key K j and a prime number p j . Note that the prime numbers to be stored in different sensors should be chosen to be distinct. Afterwards, K j and p j are stored in the sensor s j . The sensors are then deployed over the sensing region.
Post-Deployment Phase. Let λ be a user-selected parameter, which leverages the security and the energy consumption. The post-deployment phase is used to equip each sensor with a proper Bloom filter for the query of its λ-hop proximity relationship. Here, we say that the sensor s j has λ-hop proximity relationship with the sensor s i (or, s j is the λ-hop neighbor of s i ) if the hop distance between s j and s i is within λ. The post-deployment phase is automatically accomplished when sensors are deterministically deployed by the sink. This is because when deploying the sensor s j , the sink can also store the Bloom filter B j containing the λ-hop proximity information of s j in s j , given that the position of each sensor is pre-determined by the network owner.
On the other hand, the post-deployment phase will be accomplished within a period of time right after the sensor deployment if sensors are randomly deployed. More specifically, after the sensor deployment, each sensor acquires its geographic position using well-known positioning techniques [16][17][18][19], and then reports the acquired position to the sink. Such a reporting should be accompanied by the message authentication code (MAC) with the unique key K j so that the authenticity of the reported position can be guaranteed. Note that, the falsified position information could be injected by the adversary. Nevertheless, the falsified position information will be found by the sink because the sink will move to the position, trying to collect the sensed data. Thus, if falsified positions are injected by the adversary, they will be easily detected. After receiving each sensor's position information, the sink first checks its authenticity by examining whether it has the correct MAC. In particular, when the received message is ⟨ℓ j , ⟩, if is equal to h K j (ℓ j ), then ℓ j is regarded as the position of s j . With the geographic position of each sensor, the sink can construct the network graph, wherein the vertices denote the sensors and the edge between two vertices exists if the corresponding sensors can communicate with each other directly, according to the communication range predefined on each sensor. With the constructed network graph, the sink can be aware of the λ-hop proximity relationship of each sensor. Now, before the sensors start to sense data, the mobile sink starts its first itinerary over the network. For the contact of each sensor s j , the sink stores a Bloom filter, B j , containing the λ-hop proximity information of s j , in s j . B j can be constructed by the sink according to the topology of the network graph. Specifically, for each s i that has λ-hop proximity relationship with s j , the sink embeds s j ||s i into B j . Note that although the number of λ-hop neighbors of s j could be different for each s j , the size of Bloom filter used is chosen to be the same.
Sensing Phase. The description of sensing phase is shown in Figure 5. The sensing phase is executed by each sensor at each round. The sensor s j constructs the corresponding MAC, z r j = h k r j (d r j ), after it has the sensed data d r j at round r. If the sensed data and corresponding MAC are simply stored in its local memory, once the sensor is compromised by the adversary, all the security materials will be exposed to the adversary, and d r j and z r j can be arbitrarily generated so that the security breach will occur. Thus, in the sensing phase, the sensed data and its corresponding MACs of a sensor will be distributed over some sensors randomly chosen from its λ-hop neighbors. Specifically, a subset S r j of t sensors is first randomly sampled from {s 1 , . . . , s n } (line 3 in Figure 5). SELECT − DIST IN CT (t, n, j, k r j ) is a function randomly generating t sensor IDs from {s 1 , . . . , s n } \ {s j }. It can be implemented by using the hash function whose output is of length ⌈log n⌉ bits. Then, a subsetS r j of S r j is constructed by choosing the sensors in S r j whose hop distance to s j is within the pre-determined threshold λ. Here, each sensor s j uses the Bloom filter B j constructed in post-deployment phase to check the hop distances to the randomly selected sensors inS r j . To constructS r j , what each sensor s j needs to do is to check whether the s j itself has λ-hop proximity relationship with the generated sensors in S r j . For example, for s i ∈ S r j , the sensor issues s j ||s i to query B j . We can know that s i ∈S r j if the query result is positive and s i / ∈S r j otherwise (line 4 in Figure 5). Afterwards, the MAC, z r j , is transmitted to the sensors in S r j . Note thatS r j =t r j may vary as j and r vary. In general,t r j is not a constant. Nonetheless, for ease of the explanation,t r j is described as if it is a constant. In the following, the term "authentication tag of s j " is used to denote z r j and the term "co-authenticator of s j at round r" is used to denote the sensors inS r j . Then, s j , according to d r j , derives E k r j (d r j ) and z r j to guarantee the data privacy and authenticity, respectively. Lastly, s j sends Ω r j to the sensors inS r j whose second and third fields are always fixed to be 1 (lines 5 and 6 in Figure 5). Receiving/Forwarding Phase. The algorithm of this phase is described in Figure 6. This phase is executed by each sensor when a message is received. According to the role it acts, each sensor executes different tasks (lines 1∼7 in Figure 6); the receiving task will be executed if the received message is destined to itself, and the forwarding task (lines 8∼13 in Figure 6) is performed otherwise. Assume that s j is the forwarding sensor and receives Ω = ⟨s i , P, C, E, Z, s i ′ ⟩ (line 9 in Figure 6). At first, s j checks if it has λ-hop proximity relationship to the destination sensor s i ′ (lines 9∼13 in Figure 6). s j proceeds the procedures if s j has λ-hop proximity relationship to the destination sensor s i ′ and drops the received Ω otherwise. s j then adds the existence evidence of itself on the forwarding path on Ω so that the sink can check whether the received authentication tag passes through exactly those sensors it should pass. To achieve this goal, s j applies the keyed hash function with its secret key k r j on Z to have h k r j (Z). Moreover, to enable the sink to accomplish the verification, the information about the forwarding path should be included in Ω; it is the usage of P and C in Ω. C means that s j is on the (C + 1)-th sensor of the forwarding path and P can be used to extract all the sensor IDs on the forwarding path (The details will be stated in the description of the verification phase later). More specifically, after receiving Ω, s j increases C by one and multiplies P with p C+1 j , where p j is the prime number stored in s j in pre-deployment phase. Afterwards, the packet ⟨s i , P · p C+1 j , C + 1, E, h k r j (Z), s i ′ ⟩ will be forwarded to the next sensor on the forwarding path. Note that the selection of the next sensor depends on the routing protocol the underlying network uses and is not the focus of this paper. Nevertheless, due to the fact that each sensor knows its position, geographic routing [20] is a reasonable choice.
On the other hand, assume that s j is the destination sensor and receives Ω = ⟨s i , P, C, E, Z, s j ⟩ (line 4 in Figure 6). Under this situation, s j simply extracts ⟨s i , P, E, Z⟩ from Ω and then stores it into an ordered set R r j (line 5 in Figure 6). At the end of the round r, the key k r j will be evolved to the key k r+1 j of the next round r + 1 using the publicly-known hash function h (line 7 in Figure 6). Verification Phase. The algorithm describing the verification phase is shown in Figure 7. Assume that the sink would like to obtain the data dr j sensed by sensor s j at the roundr. What the sink should do is to perform the verification phase. Since the initial key of s j is given by the sink, the key kr j will be known by the sink. Thus, the co-authenticators of s j atr will also be known by the sink. This prevents the traces from being deleted by the adversary. The strategy of the sink is to move to the positions near those co-authenticators to collect dr j , acquiring the proper authentication tags. More specifically, assume that the co-authenticators of s j are s q 1 , . . . , s qtr j , s q 1 , . . . , s qtr j ∈ {1, . . . , n} \ {j}. For each s q i , i ∈ [1,tr j ], the sink acquires Rr i and extracts ⟨s j , P, E, Z⟩ (lines 3∼4 in Figure 7). Here, ⟨s j , P, E, Z⟩'s extracted from different co-authenticators will not be the same in essence. We, however, omit the necessary subscript and superscript of P, E, and Z without the ambiguity for convenience. From the acquired ⟨s j , P, E, Z⟩, with the proper key, the sink can decrypt to obtain thedr j (line 6 in Figure 7), which is claimed by s q i to be the data sensed by s j at roundr. Based on the decrypteddr j , the sink would try to verify if the authentication tag can be regenerated to match the authentication tag extracted from Rr i . Here, to reproduce the authentication tag, the sink needs to obtain the sensor IDs on the forwarding path of Ω in the correct order. Then, since the sink knows the keys of all the sensors on the forwarding path, if we can know that s j → s m 1 → s m 2 → · · · → s m ϕ is the correct order of sensors on the forwarding path connecting s j and s q i , the sink can construct h kr j (h kr m 1 (· · · h kr m ϕ (hr ks q i (dr j )) · · · )), where s m 1 , . . . , s m ϕ are the sensors on the route from s j to s q i , and see if it is equal to the authentication tag extracted from ⟨s j , P, E, Z⟩. To know s m 1 , . . . , s m ϕ (line 5 in Figure 7), the sink simply performs prime number factorization of P , obtainingp 1 x 1 ,p 2 x 2 , . . . ,p ϕ x ϕ , where ϕ is the length of the path connecting the sensor s j and its co-authenticator being examined by the sink. Here, we should note that ϕ varies when different co-authenticators are considered. Nonetheless, for ease of explanation, we also omit the necessary subscript and superscript. Then, the sink knows that s m i = s x i for all 1 ≤ i ≤ ϕ. Note that after the factorization of P , if P cannot be represented as the form ofp 1 x 1 ,p 2 x 2 , . . . ,p ϕ x ϕ , then the authentication material provided by the co-authenticator being examined will be ignored. The authentication material ⟨s j , P, E, Z⟩ extracted from s q i is dropped by the sink if the regenerated authentication tag does not match the authentication tag in ⟨s j , P, E, Z⟩ (line 7 in Figure 7). The decrypted datadr j is stored if h kr j (h kr m 1 (· · · h kr m ϕ (hr ks q i (dr j )) · · · )) is equal to the authentication tag extracted from Rr i . After the sink accomplishes the above procedures, if no authentication tag can be successfully verified, then all the data claimed to be the data sensed by s j at roundr are dropped. Assume that the sink accomplishes the above procedures and at least one authentication tag of co-authenticators can be successfully verified. If and only ifdr j extracted from the authentication materials sent from co-authenticators are all the same,dr j are regarded as genuine (lines 9∼12 in Figure 7). Example. Assume that λ = 3 and a specific round r is considered. For the sensor s 1 , we have the assumption that d r 1 = 5, k r 1 = 7, E k r 1 (d r 1 ) = E 7 (5) = 10, and z r 1 = h 7 (5) = 8. In addition, we assume that after the execution of SELECT − DIST IN CT and the checking procedure on B j ,S r 1 = {s 6 , s 7 }. Suppose that the shortest path connecting s 1 and s 6 is s 1 → s 2 → s 3 → s 6 , and the shortest path connecting s 1 and s 7 is s 1 → s 4 → s 5 → s 7 . The network topology in this example is shown in Figure 8.
Then, since s 1 has to transmit E 7 (5) and z r 1 to s 6 and s 7 , s 1 will send Ω r 1 = ⟨s 1 , 1, 1, E k r 1 (5), z r 1 , s 6 ⟩ and Ω r 1 = ⟨s 1 , 1, 1, E k r 1 (5), z r 1 , s 7 ⟩ to s 2 and s 3 , respectively. Since the sensors on these two paths work similarly, we only discuss the path s 1 → s 2 → s 3 → s 6 . When s 2 receives ⟨s 1 , 1, 1, 10, 8, s 6 ⟩, it first check its λ-hop proximity relationship to s 1 . Here, since we assume that λ = 3, the check can be passed. Thus, assuming that p 2 = 3, s 2 forwards ⟨s 1 , 9, 2, 10, h k r 2 (8), s 6 ⟩ to s 3 . Similarly, if we assume that p 3 = 7, after receiving ⟨s 1 , 9, 2, 10, h k r 2 (8), s 6 ⟩, s 3 sends ⟨s 1 , 3087, 3, 10, h k r 3 (h k r 2 (8)), s 6 ⟩ to s 6 . When s 6 receives ⟨s 1 , 3087, 3, 10, h k r 3 (h k r 2 (8)), s 6 ⟩, it stores ⟨s 1 , 3087, 10, h k r 6 (h k r 3 (h k r 2 (8))), s 6 ⟩ in its local memory. Figure 8. The network topology in the example. Now consider that the sink wants to obtain d r 1 . In our method, the sink has to perform two verifications on s 6 and s 7 . Basically, because the verification performed on s 6 is the same as the one performed on s 7 , we only describe the one performed on s 6 . From s 6 , the sink can obtain ⟨s 1 , 3087, 10, h k r 6 (h k r 3 (h k r 2 (8)))⟩. After performing the prime number factorization of the value 3087, the sink can know that 3087 = 3 2 × 7 3 , which means that the second hop sensor and the third hop sensor on the path connecting s 1 and s 6 are s 2 and s 3 , respectively. The sink also extractsdr j = 5 from E k r 1 (5) = 10. Then, the sink checks if h k r 6 (h k r 3 (h k r 2 (8))) is equal to the value obtained by sequentially applying three keyed hash functions h k r 2 (·), h k r 3 (·), and h k r 6 (·) on z r 1 = 8.dr j will be temporarily stored if the above check is passed and dropped otherwise. Finally, when all the co-authenticators are visited, the sink checks the consistency ofdr j 's obtained from different co-authenticators. Note that for the notational simplicity, we do not put additional subscript ondr j to distinguish differentdr j 's obtained from different co-authenticators.dr j is deemed to be genuine if they are consistent and bogus otherwise.

Performance and Security Analysis
In the section, the performance and security of AAD will be evaluated. Recall that at each round r, the adversary is able to compromise the set C r of k sensors. Without the loss of generality, we assume in the subsequent discussion that C r ∩ C r ′ = ∅ if r ̸ = r ′ .

Security Analysis
In what follows, the resilience of AAD against sensor compromises, PDoS attack and FEDoS attack will be described, respectively.
Resilience Against Sensor Compromises. Letdr j be the counterfeit data the adversary constructs to substitute dr j . For the adversary, to successfully deceive the sink into accepting the bogusdr j , it needs to compromise all the sensors inSr j and ESr j , where ESr j denotes the set of all the sensors on the route from s j to the sensors inSr j , before the target roundr. In other words, Compromising the sensors inSr j is used to substitute the authentication tag corresponding todr j while compromising the sensors in ESr j aims to know the keys stored in the sensors in ESr j so that the legitimacy of the bogus authentication tag can also be counterfeited.
To evaluate the resilience of AAD against the sensor compromises, we have to estimate the number of sensors involved in the authentication procedure of one datum in AAD. Let |N σ | be the number of sensors that is exactly σ-hop away from a specific sensor in a randomly deployed network. Before starting to estimate the number of sensors involved in the authentication procedure of one datum in AAD, we first need to know how to calculate |N σ |, 1 ≤ σ ≤ λ. Obviously, |N σ | is dependent on the size of the sensing region, the number n of sensors, and the communication range R of each sensor. Although |N 1 | = πR 2 n A − 1 can be easily obtained, there is no explicit formula for the expression of N σ for σ ≥ 2. An approximation of |N σ | is calculated in [21]. Although the calculation of N 1 is straightforward and the calculation of N 2 is not difficult, calculating N 3 is quite difficult. Considering two sensors a and b with communication range being r. There could be the case that b lies in the communication disk centered at a with the radius 3r but a cannot reach b via the exactly 3-hop communication. Even if b which lies in the communication disk centered at a with the radius 3r can reach a, we have to make sure whether they can communicate with each other via 3-hop communication or they can communicate with each other via 2 or 1-hop communication. Simultaneously considering all these kinds of possibilities is an extremely difficult task. To the best of our knowledge, no exact formula has been proposed to deal with the calculation of N σ . As [21] indicates, N σ is well approximated by using their proposed approximation formula. In our AAD scheme, for the energy conservation, λ is usually not too large, which means that we do not need to consider N σ for large σ. Thus, although the approximation formula of N σ gradually deviates from the true N σ as σ increases, it is still sufficient for our use. Therefore, in the following discussion, we refer to [21] for the derivation of |N σ |, 2 ≤ σ ≤ λ. For 1 ≤ σ ≤ λ, the probability of randomly selecting a sensor whose hop distance to the sensor s j is exactly σ can be expressed as |Nσ| n . Therefore, the probability of randomly selecting a sensor whose hop distance to s j is within λ is ∑ λ σ=1 |Nσ| n , and the expected number of randomly chosen sensors whose hop distance to s j is within σ is ∑ λ σ=1 |Nσ| n · t when t sensors are randomly chosen. These ∑ λ σ=1 |Nσ| n · t sensors exactly constitute the set of sensors inSr j . Recall that one of the characteristics of Bloom filter is that the query result is always correct if the content to be queried is indeed stored in the Bloom filter. Thus, we havē On the other hand, all the sensors on the route to the sensors inSr j also contribute to the authentication procedure. As the number of these sensors is difficult to be precisely calculated, we will approximate it by assuming that the routes to the selected sensors inSr j are non-overlapping. For each sensor s i that is exactly σ-hop away from s j , σ − 1 sensors lie on the route between s j and s i . Thus, ∑ λ σ=1 |Nσ| n t(σ − 1) sensors are involved when ∑ λ σ=1 |Nσ| n t sensors are inSr j . It implies that, |Sr j ∪ ESr j | can be upper bounded by: Because of our assumption that the routes from s j to the sensors inSr j are non-overlapped, this value overestimates |Sr j ∪ ESr j | and, thus, can only work as the upper bound of |Sr j ∪ ESr j |. In practice, the approximation of |Sr j ∪ ESr j | in Equation (3) is quite accurate when the ratiot r j η , where η denotes the average number of one-hop neighbors for a sensor, is sufficiently small. Nonetheless, since we know that the fewer the sensors involved in the authentication procedure, the lower the resilience against sensor compromises, even Equation (3) provides a good approximation of |Sr j ∪ ESr j | in most cases and can be used to estimate the upper bound of communication overhead in Section 5.2, we still need a lower bound of |Sr j ∪ ESr j | to derive the lower bound of the resilience of AAD against sensor compromises. In essence, even if the setting oft r j ≥ 1 is used, the lower bound of |Sr j ∪ESr j | can be obtained according to the setting oft r j = 1. By the similar calculation as in Equation (3), we can know that |Sr j ∪ ESr Define P ADV as the probability that the event described in Equation (2) happens. Thus, for the adversary that always randomly compromises the sensors over the whole network, the upper bound of probability P ADV can be obtained as: With kr > n, it means that all the sensors are compromised and the whole network is in the complete control of the adversary. On the other hand, with kr ≤ n, there are ( n kr ) ways to compromise the sensors in the networks and at least ∑ λ σ=1 |Nσ| n σ specific sensors should be included in the set {C 1 , . . . , Cr} of compromised sensors. As a result, the upper bound of P ADV defined in Equation (5) holds. Here, it should be noted that instead of compromising random sensors over the network, the adversary that is aware of the application of AAD on the network may intentionally compromise the sensors in the nearby region of s j so as to increase its success probability P ADV . Nevertheless, this strategy is infeasible for the adversary since the target sensor s j and the target roundr are known only after the end of the target round. Thus, without the prior knowledge of the target sensor, in general, what the adversary can do is only to randomly compromise the sensors over the network. Define P AAD as the survival probability, the probability that the data dr j remain unforged can be calculated as: Because of the upper bound of P ADV in Equation (5), the lower bound of P AAD can be represented as: P AAD 's in different settings are shown in Figure 9. Note that the dotted line represents the lower bound of P AAD in the setting of n = 100, k = 10, and |N 1 | = 10, irrespective of the parameter t used in AAD. Although the dashed line and solid line act as the upper bound of P AAD in the setting of n = 100, k = 10, and |N 1 | = 10. They, however, are pretty accurate approximations of P AAD in a relatively dense network. We can observe that the curve of survival probabilities generated by the use of AAD with the parameterst r j = 3 and λ = 5 is close to that in [4] with 15 co-authenticators used in their proposed protocol. Therefore, we can also claim that, when the similar level of security is required, the communication overhead incurred by AAD is lower than that incurred by prior works.
Resilience Against PDoS attacks. In general, the adversary is always able to launch PDoS attacks. In other words, PDoS attacks can only be mitigated, but not eliminated. The evaluation method similar to the ones used in [22][23][24][25] is conducted here to demonstrate the superiority of AAD over CoMAC and ExCo in terms of the resilience against PDoS attacks. Recall that the resilience of CoMAC and ExCo are the same. Let L AAD and L ExCo be the bit-length of the authentication material of AAD and ExCo, respectively. Assume that a MAC is of length 128 bits and an encrypted data is also of length 128 bits. In AAD, the field representing the counter C is assumed to be of length 8 bits because the setting of λ ≤ 2 8 − 1 is sufficient in most cases. The field representing the aggregated prime number P is assumed to have 32 bits. Assume that in total n sensors are deployed. Consequently, L AAD can be calculated as 128 + 128 + 32 + 8 + 32 + 2 √ n = 328 + 2 √ n bits. On the other hand, since the authentication material only contains the authentication tag, the size of L ExCo is 128 + 2 √ n bits. For a flat network whose n sensors are uniformly distributed, the average hop distance between arbitrary two sensors is √ n [26]. In the following evaluation, it is assumed that the compromised sensor sends bogus message to a random sensor, instead of sending to the most distant sensors. Also note that such evaluation is advantageous to the calculation of ExCo's energy consumption but is disadvantageous to the calculation of ours. Nevertheless, such evaluation will imply the lower bound of the energy saving of our AAD over ExCo. We can know from [23] that when telosB motes [27] are used, the energy for receiving one bit requires 0.2707µJ and the energy for transmitting one bit requires 0.2505µJ. In ExCo, the energy E ExCo wasted by the forwarding of one single bogus message in the PDoS attack is therefore 0.5212 · ( √ n − 1) · L ExCo = 0.5212 · ( √ n − 1) · (128 + 2 √ n)µJ. In AAD, the energy E AAD wasted by the forwarding of one single bogus message in the PDoS attack is therefore 0.5212 · (λ − 1) · L AAD = 0.5212 · (λ − 1) · (328 + 2 √ n)µJ. Hence, the ratio of energy saving of our AAD method to ExCo can be expressed as 0.5212·(λ−1)·(328+2 √ n) 0.5212·( √ n−1)·(128+2 √ n) and is depicted in Figure 10. Since the ratio less than one means that the energy consumed by AAD is lower than that consumed by ExCo, we can observe in Figure 10 that in most cases the energy consumption of AAD is lower than that of ExCo. Note that the energy calculation is based on the energy consumed by the message transmission from the source to the destination. Since the adversary can always launch PDoS attacks, the calculation is independent of the parameter t. The energy saving of AAD is due to the use of λ in restricting the number of hops used to relay the messages. Resilience Against FEDoS attacks. Recall that FEDoS attacks aim to deceive the sink into rejecting the genuine sensor readings by providing false authentication tags. FEDoS attacks easily work because what the sink can trust is only the authentication tag the co-authenticators provide. Therefore, if one of co-authenticators is compromised, the false authentication tag generated by it makes FEDoS attack successful. Nevertheless, in AAD, for a compromised sensor that happens to be a co-authenticator of the data of the sink's interest, it cannot simply generate and then provide a random bit string acting as the authentication tag to the sink. Instead, together with the claimed authentication tag, some additional information should be accompanied so that before accepting the received bit string as the genuine authentication tag, the sink can check the legitimacy of the claimed authentication tag. Specifically, P and E serve for this purpose. From the algorithm of the verification phase summarized in Figure 7, it can be observed that for the route to a specific co-authenticator, unless the keys of all the sensors on the route are known by the adversary and the corresponding co-authenticator is compromised prior to the roundr, the authentication tag cannot be counterfeited. We have an interesting observation that, in the case wheretr j = 1, successfully launching FEDoS attacks is as difficult as successfully replacing the data on the target sensor at the target round because they all need to compromise the sensors inSr j and the sensors on the route from the target sensor to the sensor inSr j . Here, we also employ FEDoS ratio to evaluate the resilience of AAD against FEDoS attacks. When the average number |N 1 | of the one-hop number of the sensor in a network is 5, R 0.5 's of AAD in different settings are shown in Figure 11. When the average number |N 1 | of the one-hop number of the sensor in a network is 10, R 0.5 's of AAD in different settings are shown in Figure 12. It can be known from Figures 11 and 12 that the resilience of AAD against FEDoS attacks is independent of the network topology such as the sensor density and the number of sensors. In addition, it can be observed that, ast r j increases, R 0.5 will be decreased. The reason for this phenomenon is that, the increase oft r j means the increase of the number of sensors involved in the authentication procedure, resulting in the possibility of the adversary compromising the involving sensors and launching FEDoS attacks.

Performance Analysis
Storage Overhead. Each sensor needs to store a Bloom filter, key, and prime number. They imply the constant storage overhead. At each round r, each sensor s j sendst r j authentication materials, ⟨j, P, E, Z⟩, to the randomly selected sensors. As a collection interval consists of v rounds, we therefore know that the storage overhead of each sensor in each collection interval is O(v ·t r j ). Communication and Computation Overhead. As described in Section 5.1, the upper bound of the number of sensors involved in the authentication procedure for a single datum is ∑ λ σ=1 |Nσ| n tσ. Since each involved sensor performs either sending or receiving operation, the number of involved sensors can be used to estimate the communication and computation overhead of AAD. As the size of the authentication material is constant in terms of the network size, n, the communication overhead of each sensor in AAD is O( ∑ λ σ=1 |Nσ| n tσ). In addition, as the number of the operations needed to be performed is also a constant in terms of the network size, the communication overhead of each sensor in AAD is O( ∑ λ σ=1 |Nσ| n tσ) as well.

Conclusions
In this paper, a scheme, called AAD, is proposed to Acquire Authentic Data in UWSNs. AAD has superior resilience against sensor compromises over the prior works. Compared with the existing methods that are vulnerable to PDoS and FEDoS attacks, AAD can also be resilient against PDoS and FEDoS attacks. Interestingly, the communication overhead, which dominates the energy consumption of sensor networks, can be even lower than that of prior works. The efficiency and effectiveness of AAD have been demonstrated via the analysis and simulation.
In addition to the PDoS and FEDoS attacks considered in the paper, there are actually many attacks that need to be considered too. Thus, one of future works is to equip our proposed AAD scheme with the ability to defend against radio jamming attacks. In addition, we also consider the possibility of mobile sensors in future applications. For example, the sensors may cruise a given area to collect the data of interest. For now, our method cannot apply to such scenario because our AAD scheme heavily relies on the invariant position information of each sensor. Hence, another future work is to develop a variant of AAD so that it can work on the mobile sensor networks.