A Secure-Enhanced Data Aggregation Based on ECC in Wireless Sensor Networks

Data aggregation is an important technique for reducing the energy consumption of sensor nodes in wireless sensor networks (WSNs). However, compromised aggregators may forge false values as the aggregated results of their child nodes in order to conduct stealthy attacks or steal other nodes' privacy. This paper proposes a Secure-Enhanced Data Aggregation based on Elliptic Curve Cryptography (SEDA-ECC). The design of SEDA-ECC is based on the principles of privacy homomorphic encryption (PH) and divide-and-conquer. An aggregation tree disjoint method is first adopted to divide the tree into three subtrees of similar sizes, and a PH-based aggregation is performed in each subtree to generate an aggregated subtree result. Then the forged result can be identified by the base station (BS) by comparing the aggregated count value. Finally, the aggregated result can be calculated by the BS according to the remaining results that have not been forged. Extensive analysis and simulations show that SEDA-ECC can achieve the highest security level on the aggregated result with appropriate energy consumption compared with other asymmetric schemes.


Introduction
Wireless sensor networks (WSNs) consist of thousands of sensors that collect data from a deployed area. WSNs have many applications, such as military investigation, environment monitoring and accident reporting. Typically, sensors have strictly limited computation and communication abilities and power resources; therefore, reducing power consumption is a critical concern for WSNs. For better energy utilization, data aggregation [1,2] has been proposed. The original concept is to aggregate multiple sensing messages by performing statistical or algebraic operations, such as addition, minimum, maximum, or median. Since only the aggregated results, rather than the raw sensing data, need to reach the base station (BS), communication costs can be significantly reduced. Unfortunately, data aggregation is vulnerable to some attacks. For example, compromising a cluster head (aggregator) is similar in effect to compromising all of its cluster members. To solve this problem, several schemes, such as SDAP [3], PEPDA [4], and Jung et al.'s scheme [5], have been proposed. However, these schemes can only guarantee data privacy during the process of data aggregation, and they have a long aggregation delay.
An alternative method for secure data aggregation is to use privacy homomorphic encryption (PH), which aggregates encrypted messages from sensors directly, without decrypting them, and therefore has a short aggregation delay. An adversary learns nothing from forging aggregated results even if the aggregators are compromised, because aggregators are unable to encrypt messages. PH allows specific types of computations to be carried out on ciphertexts, such that the decrypted aggregation result matches the result of the same operations performed on the plaintexts. PH has been used for data aggregation in WSNs, for example in Wang et al.'s scheme [6], CDAMA [7], and TinyPEDS [8]. However, the existing PH schemes suffer from data integrity issues.
In this paper, we focus on bridging the gap between data privacy and integrity in WSNs. Some symmetric secure aggregation schemes [9,10] have been proposed to achieve both data privacy and integrity, but they cannot defend against node compromise attacks because of their inherent drawback that the encryption key is the same as the decryption key. In general, symmetric schemes are less secure than asymmetric ones, although they are more efficient in terms of computational cost. Therefore, we propose a secure-enhanced data aggregation scheme based on Elliptic Curve Cryptography (ECC), called SEDA-ECC, which is an improved version of Boneh et al.'s asymmetric scheme [11]. To the best of our knowledge, SEDA-ECC defends against more attacks than other asymmetric schemes while keeping energy consumption at an appropriate level.
The rest of the paper is organized as follows: in Section 2, the existing secure data aggregation schemes in WSNs are presented. The system model and preliminaries are discussed in Section 3. In Section 4, a secure-enhanced data aggregation scheme based on ECC is proposed. Section 5 describes the security analysis of SEDA-ECC, and Section 6 presents performance evaluation and comparison to prove the effectiveness and efficiency of our scheme. Finally, we conclude SEDA-ECC in Section 7.

Related Works
Many secure data aggregation schemes have been proposed. For symmetric schemes, Ozdemir et al. [9] integrated false data detection with data aggregation and confidentiality, and proposed an authentication protocol. In the scheme, every aggregator has some monitoring nodes which also perform data aggregation for data verification, and the integrity of the encrypted data is verified by the sensors between two consecutive aggregators. Its limitation is its rigorous topological constraints. Papadopoulos et al. [10] presented an exact aggregation scheme with integrity and confidentiality, named SIES. SIES combines symmetric homomorphic encryption with secret sharing. It covers a wide range of aggregates and introduces only a small amount of bandwidth consumption. However, its data transmission efficiency is low because of the oversized secret key space. Based on the Aggregate-Commit-Verify approach, Chan et al. [12] first proposed a provably secure hierarchical data aggregation scheme, in which the adversary is forced to commit to its choice of aggregation results, and the sensors are then allowed to verify whether their aggregation contributions are correct. The scheme can be used with multiple malicious nodes and arbitrary topologies, but it suffers from large communication and computation overheads. To address this issue, Frikken et al. [13] improved Chan et al.'s scheme by reducing the maximum communication per node from $O(\Delta \log^2 n)$ to $O(\Delta \log n)$, where $n$ is the number of nodes in the WSN and $\Delta$ is the maximum degree of the aggregation tree.
For asymmetric schemes, Zhu et al. [14] focused on preserving data integrity and proposed an efficient integrity-preserving data aggregation protocol named EIPDAP. The scheme is based on the modulo addition operation using ECC, and achieves the optimal upper bound for the integrity-preserving data aggregation problem. Niu et al. [15] proposed a secure identity-based lossy data aggregation scheme using homomorphic hashing and identity-based aggregate signatures. In the scheme, the authenticity of aggregated data can be verified by both the aggregators and the BS. The computation and communication overheads can be significantly reduced because the BS can perform batch verification. However, these two schemes may leak private data because of decryption at the aggregator. Based on PH, Westhoff et al. [16] and Girao et al. [17] proposed CDA methods to facilitate aggregation of encrypted data, where richer algebraic operations can be executed directly on encrypted data by aggregators. Mykletun et al. [18] adopted several public-key-based PH encryptions to achieve data concealment in WSNs. Furthermore, Girao et al. [8] proposed a novel scheme by extending ElGamal PH encryption. However, the above schemes cannot resist node compromise attacks. A detailed security analysis is presented in Section 5.

System Model and Preliminaries
In this section, we describe the aggregation model and the attack model. The aggregation model defines how aggregation works, and the attack model defines what kinds of attacks our secure data aggregation scheme should protect against.

Aggregation Model
We consider large-scale WSNs with densely deployed sensors. In such WSNs, there are three types of nodes: the base station (BS), aggregators, and leaf nodes. In this paper, we consider an aggregation tree rooted at the BS, as in general data aggregation protocols [1,3]. Sensor nodes have overlapping sensing regions because of the dense deployment, and the same event is often detected by multiple sensors. Hence, data aggregation is used to reduce data transmission. The non-leaf nodes, except the BS, may also serve as aggregators. They are responsible for combining answers from their child nodes and forwarding intermediate aggregation results to their parents. Without loss of generality, we focus on additive aggregation, which can serve as the basis of other statistical operations (e.g., count, mean, or variance).
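The remark that additive aggregation underlies other statistics can be illustrated with a minimal Python sketch. It assumes that three sums are aggregated separately (the readings, their squares, and a count); the function name and the sample readings are hypothetical and only serve the illustration.

```python
from math import isclose

def stats_from_sums(total, total_sq, count):
    """Derive the mean and population variance from three additive aggregates."""
    mean = total / count
    variance = total_sq / count - mean ** 2
    return mean, variance

# Hypothetical sensor readings; in a WSN only the three sums would reach the BS.
readings = [2, 5, 6, 3, 4, 5]
total = sum(readings)                     # additive aggregate of the values
total_sq = sum(x * x for x in readings)   # additive aggregate of the squares
count = len(readings)                     # additive aggregate of ones

mean, variance = stats_from_sums(total, total_sq, count)
print(mean, variance)
```

Because each of the three quantities is a plain sum, an aggregator can combine partial results from its children by addition alone.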

Attack Model
First, we categorize the abilities of the adversary as follows:
(1) An adversary can eavesdrop on transmission data in a WSN.
(2) An adversary can send forged data to leaf nodes, aggregators, or the BS.
(3) An adversary can compromise secrets in sensors or aggregators.
Then, based on the adversary's abilities and purposes, we define five attacks to evaluate the security strength of secure data aggregation schemes.

(1) Ciphertext analysis
Ciphertext analysis is a very common and basic attack. In such an attack, an adversary wants to deduce the secret key or obtain information only by interpreting ciphertext. A secure scheme must ensure that it is not possible to gain any information or key, and that an adversary cannot decide whether a ciphertext corresponds to a specific plaintext.

(2) Chosen plaintext attacks
Given some chosen samples of plaintexts and the corresponding ciphertexts, the adversary attempts to determine secret information or deduce the key. A secure scheme must ensure that an adversary cannot deduce secret keys or additional information beyond the known set, even with a large set of plaintexts and their ciphertexts.

(3) Malleability
The aim of the adversary is to alter valid ciphertexts without leaving marks. In this kind of attack, an attacker can randomly generate meaningless ciphertexts that are syntactically correct in order to harm the system. For many PH schemes, it is possible to alter the ciphertexts without knowing their concrete content. Hence, a secure scheme should not allow the adversary to successfully change the contents of an encrypted packet.

(4) Unauthorized aggregation
In this kind of attack, an adversary aggregates two or more ciphertexts into forged but format-valid ciphertexts, then injects them into the network to vandalize the system.

(5) Node compromise
An adversary can compromise sensors or aggregators. When an adversary compromises an aggregator and obtains its secret, it can easily launch unauthorized aggregation and malleability attacks. When an adversary compromises a sensor and obtains its secret, it can decrypt the ciphertexts of all sensors in symmetric schemes; in addition, it can impersonate that sensor or other sensors to generate legal ciphertexts in both symmetric and asymmetric schemes.

Privacy Homomorphism
A privacy homomorphism is an encryption transformation that allows direct computation on encrypted data. Let $m_1$ and $m_2$ be two plaintexts, and let $\otimes$ and $\times$ be the homomorphic operations on ciphertexts and plaintexts, respectively. Then $\mathrm{Enc}(m_1) \otimes \mathrm{Enc}(m_2) = \mathrm{Enc}(m_1 \times m_2)$, where $\mathrm{Enc}(m)$ represents the ciphertext of $m$. Component-wise multiplications and additions of ciphertexts result in the corresponding multiplications and additions of plaintexts: if $E_{(p,q)}(m_1) = (x_1, y_1)$ and $E_{(p,q)}(m_2) = (x_2, y_2)$, then operating on $(x_1, y_1)$ and $(x_2, y_2)$ component by component yields a ciphertext of the corresponding combination of $m_1$ and $m_2$. However, symmetric cryptography-based privacy homomorphism has been proved insecure against chosen plaintext attacks for some specific parameters [19]. Therefore, for mission-critical networks, privacy homomorphism based on asymmetric cryptography should be used instead of privacy homomorphism based on symmetric cryptography.

BGN Scheme
Boneh et al. [11] propose a PH scheme (abbreviated as BGN) based on the encryption schemes proposed by Paillier [20] and Okamoto-Uchiyama [21]. BGN provides both additive and multiplicative homomorphisms; however, the multiplicative homomorphism is inefficient and very expensive for WSNs because it is based on bilinear pairing. Hence, we adapt only the additive homomorphism of BGN to our scheme. The additive homomorphic encryption of BGN can be applied to private data aggregation, as described in Algorithm 1.
Because asymmetric cryptography incurs a large computational overhead, Boneh et al. construct BGN on a cyclic group of elliptic curve points. In phase 1 of the BGN scheme, $E$ is the set of elliptic curve points that form a cyclic group, and $\mathrm{ord}(E)$ denotes the number of points in $E$. For a point $G$ in $E$, $\mathrm{ord}(G)$ denotes the order of $G$; if $\mathrm{ord}(G) = q$, then $q * G = \infty$, where $\infty$ is the identity element of the group. In phase 2, point addition and scalar multiplication over two public points, denoted $G_1$ and $H$, are used to encrypt the message $M$. The ciphertext $C$ is composed of the message part and the secure randomness. In phase 3, BGN aggregates the ciphertexts by means of the homomorphic property. The aggregated result has the form $\sum M * G_1 + \sum R * H$, where $\sum M$ is the sum of the messages and $\sum R$ is the sum of the randomness. In phase 4, BGN decrypts the aggregated result by multiplying it by the private key. Once the randomness on point $H$ is removed by multiplying by the order of $H$, we obtain $\mathrm{ord}(H) * \sum M * G_1$. Finally, the plaintext $\sum M$ is retrieved by computing the discrete logarithm.
Algorithm 1. BGN scheme.

SEDA-ECC: A Secure-Enhanced Data Aggregation Based on ECC
In this section, we modify BGN to fit the SEDA-ECC scheme, so the security of both BGN and SEDA-ECC is based on the hardness assumption of the subgroup decision problem. If only the privacy of data aggregation had to be protected, BGN could be used in SEDA-ECC directly. However, we also aim to ensure data integrity; hence, different public-private key pairs and disjoint aggregation trees are adopted. We first describe the details of the SEDA-ECC scheme, which consists of the six phases listed in Algorithm 2, and then present a case study of SEDA-ECC.

Key Generation Phase
Given a security parameter $\lambda$, the tuple $(q_1, q_2, q_3, E)$ is generated. $E$ is the set of elliptic curve points that form a cyclic group, and $\mathrm{ord}(E) = n = q_1 q_2 q_3$, where $q_1$, $q_2$, $q_3$ are large primes of the same bit length. Then, three points $(G_1, G_2, G_3)$ are randomly selected from $E$, where the order of $G_i$ is $n$, $i = 1, 2, 3$. Compute the points $P = q_2 q_3 * G_1$, $Q = q_1 q_3 * G_2$, and $H = q_1 q_2 * G_3$, such that the orders of $P$, $Q$ and $H$ are $q_1$, $q_2$, and $q_3$, respectively. The scalar of $P$ is the aggregated message, the scalar of $Q$ is the count of ciphertexts, and the scalar of $H$ is the randomness added for security. We can check the integrity of the aggregated results by the count; the details of the check are described in phase 6. For each subtree, the public key is $PK = (n, E, P, Q, H)$ and the private key is $SK = \{(q_1 q_3), (q_2 q_3)\}$.

Algorithm 1. BGN scheme.
Phase 1. Key-Gen($\lambda$): generate a public-private key pair.
01: Compute $(q_1, q_2, E)$ using security parameter $\lambda$, where $E$ is the set of elliptic curve points that form a cyclic group, $\mathrm{ord}(E) = n = q_1 q_2$, and $q_1$, $q_2$ are large primes of the same bit length, i.e., $|q_1| = |q_2|$.
02: Randomly select two generators $G_1$ and $G_2$ such that $\mathrm{ord}(G_1) = \mathrm{ord}(G_2) = n$.
03: Compute the point $H = q_2 * G_2$ such that $\mathrm{ord}(H) = q_1$.
04: Select parameter $T < q_1$ as the maximum plaintext boundary.
05: Generate the public key $PK = (n, E, G_1, H, T)$ and the private key $SK = q_1$.
Phase 3 (aggregation).
02: Compute the aggregated ciphertext.

Algorithm 2. SEDA-ECC scheme.
Phase 1 (key generation): $E$ is the set of elliptic curve points that form a cyclic group, with $\mathrm{ord}(E) = n = q_{i1} q_{i2} q_{i3}$, where $q_{i1}$, $q_{i2}$, $q_{i3}$ are large primes of the same bit length, i.e., $|q_{i1}| = |q_{i2}| = |q_{i3}|$.
03: Compute the points $P$, $Q$ and $H$.
Phase 2. Dis-Tree($p_r$, $p_g$, $p_b$): disjoint aggregation tree construction with probabilities $p_r$, $p_g$ and $p_b$.
01: The BS triggers the aggregation with a HELLO message. On receiving such a message, nodes select their roles: red aggregator, green aggregator or blue aggregator. Aggregators then also forward the HELLO messages.
02: If a node receives HELLO messages from red, green and blue aggregators, it randomly selects its role according to $p_i$; otherwise it waits until HELLO messages from all kinds of aggregators are received.
03: Three disjoint aggregation trees rooted at the BS are formed as the disjoint tree construction procedure continues. Red, green and blue aggregators interleave with each other.
Phase 3 (encryption).
03: Generate the ciphertext $C_i = M_i * P + Q + R_i * H$.
Phase 4 (aggregation).
01: Compute the aggregated ciphertext $C_{ia} = \sum M_{ij} * P + \zeta_i * Q + \sum R_{ij} * H$, where $\sum M_{ij}$ represents the aggregated result of tree $T_i$, $\zeta_i$ represents the number of aggregated ciphertexts in tree $T_i$, and $\sum R_{ij}$ represents the aggregated randomness in tree $T_i$.
02: Return $C_{ia}$.
Phase 6 (data integrity check).
01: Set $i, j, k \in \{r, g, b\}$, and $i \ne j \ne k$.

Aggregation Tree Disjoint Phase
Three subtrees are built in this scheme, called the red, green, and blue aggregation trees, and the BS is the root of all three. Assuming the network is dense enough, each node, except the BS, takes one of four roles: red aggregator, green aggregator, blue aggregator, or leaf node. The resulting disjoint trees are shown in Figure 1, where black nodes represent red aggregators, grey nodes represent green aggregators, and white nodes represent blue aggregators.
Step 1. The BS, as the root of the three subtrees, initiates a HELLO message requesting sensors to organize into one of the three aggregation trees. The message contains the BS's own ID and its level information $L_r = L_g = L_b = 0$.
Step 2. Each sensor receiving the message should make the decision on its role, assign its own level to be L i + 1 (i = r, g, b), and select the sender node as its parent. A node becomes a red aggregator with probability p r , a green aggregator with probability p g , and a blue aggregator with probability p b , respectively. The probability will be subject to the conditions: 0 < p r = p g = p b < 1, and p r + p g + p b = 1.
Step 3. Each node in an aggregation tree rebroadcasts the HELLO message colored for its tree, which contains its own ID and level. A node that already belongs to a tree when it receives the message rejects it; otherwise, the node assigns its level to be $L_i + 1$. The three aggregation trees are complete when all nodes have a level and a parent. To balance the red, green and blue aggregators in a given neighborhood, a node should wait long enough to receive as many HELLO messages from red, green and blue aggregators as possible before deciding on its color. Then, $p_r$, $p_g$ and $p_b$ can be computed by each node from $N_r$, $N_g$ and $N_b$, where $N_i$ is the number of HELLO messages that the sensor receives from the $i$ aggregators ($i = r, g, b$).
Figure 1 legend: BS, red node, green node, blue node.
It should be noted that only a very few nodes do not participate in data aggregation when the network is dense enough.
Step 4. During the process of aggregation, red aggregators are not allowed to forward the data for green and blue aggregators, and vice versa. Then, the separation of data aggregation can be achieved along the disjoint trees. Finally, the BS will receive three aggregated results M r , M g and M b respectively.
Note that an adversary may compromise data integrity during this phase by sending two HELLO messages with different colors. This can be prevented by guaranteeing that a node in one tree cannot be in either of the other two trees. Moreover, such an attack is easily detected by the adversary's neighbors because of the shared-medium nature of wireless links. Therefore, the adversary can be excluded from the three aggregation trees.
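The role decision in Step 2, together with the waiting rule in Step 3, can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function name `choose_role`, the dictionary layout, and the equal probabilities are assumptions made for illustration, and the probabilities are passed in as given rather than derived from the HELLO message counts.

```python
import random

COLORS = ("red", "green", "blue")

def choose_role(heard_colors, probs, rng):
    """Return a color, or None if HELLO messages of some color are still missing."""
    if not set(COLORS) <= set(heard_colors):
        return None  # keep waiting, as required before deciding on a color
    weights = [probs[c] for c in COLORS]
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("p_r + p_g + p_b must equal 1")
    return rng.choices(COLORS, weights=weights, k=1)[0]

rng = random.Random(42)  # seeded only to make the sketch repeatable
probs = {"red": 1 / 3, "green": 1 / 3, "blue": 1 / 3}

waiting = choose_role({"red", "green"}, probs, rng)   # blue HELLO not yet heard
roles = [choose_role(set(COLORS), probs, rng) for _ in range(300)]
tally = {c: roles.count(c) for c in COLORS}
print(waiting, tally)
```

With equal probabilities the three colors end up roughly balanced, which is the property the tree construction relies on.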

Encryption Phase
We set $T_M < q_1$. The message of a sensor node must satisfy $M_i \in \{0, 1, \ldots, T_M\}$, where $i = r, g, b$. Each sensor picks a random $R_i \in \{0, 1, \ldots, n - 1\}$, encrypts the message $M_i$ using the public key $PK$, and generates the ciphertext $C_i = M_i * P + Q + R_i * H$, where $P$, $Q$ and $H$ are the public points of order $q_1$, $q_2$ and $q_3$ from the key generation phase, $+$ is the addition of elliptic curve points, and $*$ is elliptic curve scalar multiplication.

Aggregation Phase
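Let $\sum M_{ij}$ denote the aggregated message of tree $T_i$, $\zeta_i$ denote the number of aggregated ciphertexts of tree $T_i$, and $\sum R_{ij}$ denote the aggregated randomness in tree $T_i$. The $k$ ciphertexts $C_{ij}$, $j = 1, \ldots, k$, are then aggregated into a ciphertext $C_{ia}$ as follows, where $P$, $Q$ and $H$ are the public points carrying the message, the count and the randomness, respectively:
$$C_{ia} = \sum_{j=1}^{k} C_{ij} = \Big(\sum_{j=1}^{k} M_{ij}\Big) * P + \zeta_i * Q + \Big(\sum_{j=1}^{k} R_{ij}\Big) * H$$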

Decryption Phase
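During the decryption phase, the BS separately decrypts the aggregated result $M_i$ and its count $\zeta_i$ from the aggregated ciphertext $C_{ia}$ of tree $T_i$ as follows, where $P$ and $Q$ are the public points of order $q_1$ and $q_2$:
$$M_i = \log_{\hat{P}}(q_2 q_3 * C_{ia}), \quad \hat{P} = q_2 q_3 * P$$
$$\zeta_i = \log_{\hat{Q}}(q_1 q_3 * C_{ia}), \quad \hat{Q} = q_1 q_3 * Q$$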
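The algebra of the encryption, aggregation and decryption phases can be checked with a toy Python sketch. It makes a loud simplifying assumption: the elliptic curve group is replaced by the additive group of integers modulo $n$, so "points" are integers and scalar multiplication is multiplication modulo $n$. This stand-in reproduces only the group structure and offers no security, since discrete logarithms in it are trivial. The names `encrypt`, `aggregate`, `dlog` and `decrypt`, the single generator `G`, and the use of `P`, `Q`, `H` for the three base points are choices of this sketch; the primes and the red-subtree messages are those of the case study.

```python
# Toy stand-in: Z_n (integers mod n under addition) replaces the curve group.
q1, q2, q3 = 13, 17, 19
n = q1 * q2 * q3              # 4199

G = 1                         # generator of Z_n, order n (assumption: one generator for all points)
P = (q2 * q3 * G) % n         # order q1, carries the message
Q = (q1 * q3 * G) % n         # order q2, carries the count
H = (q1 * q2 * G) % n         # order q3, carries the randomness

def encrypt(m, r):
    """C = m*P + Q + r*H; every ciphertext contributes exactly one Q."""
    return (m * P + Q + r * H) % n

def aggregate(ciphertexts):
    """Additive homomorphism: aggregation is just group addition."""
    return sum(ciphertexts) % n

def dlog(target, base, bound):
    """Brute-force discrete log: smallest x < bound with x*base == target (mod n)."""
    for x in range(bound):
        if (x * base) % n == target:
            return x
    raise ValueError("no discrete log below bound")

def decrypt(c):
    """Multiply by q2*q3 to isolate the message, by q1*q3 to isolate the count."""
    m = dlog((q2 * q3 * c) % n, (q2 * q3 * P) % n, q1)
    count = dlog((q1 * q3 * c) % n, (q1 * q3 * Q) % n, q2)
    return m, count

# Red subtree of the case study: messages 2 and 5 with randomness 34 and 13.
c_r = aggregate([encrypt(2, 34), encrypt(5, 13)])
m_r, zeta_r = decrypt(c_r)
print(m_r, zeta_r)
```

In this toy setting the message is recovered modulo $q_1$ and the count modulo $q_2$, which is why the message bound must stay below $q_1$.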

Data Integrity Check Phase
When the BS receives the three aggregated results from the red, green and blue subtrees, it decrypts each of them and extracts the count $\zeta_i$. If a compromised aggregator tampers with the aggregated result $M_i$, the count $\zeta_i$ changes as well, because the aggregators do not know the base points $P$ and $Q$ that carry the message and the count. Therefore, the BS compares the counts with each other, and the messages are considered untampered en route only if the counts are almost the same. We set $Th$ as the difference threshold parameter, with $i, j, k \in \{r, g, b\}$ and $i \ne j \ne k$.
(1) If $|\zeta_i - \zeta_j| \le Th$ and $|\zeta_j - \zeta_k| \le Th$, no result has been tampered with; the BS accepts the three aggregated results and computes the final result $M = M_i + M_j + M_k$.
(2) If $|\zeta_i - \zeta_j| \le Th$ and $|\zeta_j - \zeta_k| > Th$, $M_k$ has been tampered with; the BS rejects $M_k$ and computes the approximate aggregated result $M = \frac{3}{2}(M_i + M_j)$.
(3) If $|\zeta_i - \zeta_j| > Th$ and $|\zeta_j - \zeta_k| > Th$, all three aggregated results may have been tampered with; the BS either rejects all of $M_i$, $M_j$ and $M_k$, or decides which aggregated result is genuine by gathering topology information.
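The threshold comparison described above can be sketched as a small Python function. This is one possible reading of the rule, not the authors' code: the function name `integrity_check`, the dictionary layout, and the way the three pairwise differences are counted are assumptions of this sketch. The sample counts 214, 192 and 193 with $Th = 10$ follow the compromised-aggregator example discussed in the security analysis.

```python
from itertools import combinations

def integrity_check(results, th):
    """results maps a tree color to (aggregated value M, count zeta).

    Returns (final_value, rejected_colors). final_value is None when no two
    counts agree within the threshold th.
    """
    close = [(a, b) for a, b in combinations(results, 2)
             if abs(results[a][1] - results[b][1]) <= th]
    if len(close) >= 2:
        # All three counts are chained within th: accept and sum everything.
        return sum(m for m, _ in results.values()), []
    if len(close) == 1:
        # Exactly one agreeing pair: reject the third tree and scale by 3/2.
        a, b = close[0]
        rejected = [c for c in results if c not in (a, b)]
        return 1.5 * (results[a][0] + results[b][0]), rejected
    return None, list(results)

honest = integrity_check({"r": (7, 2), "g": (9, 2), "b": (9, 2)}, th=10)
tampered = integrity_check({"r": (45, 214), "g": (9, 192), "b": (9, 193)}, th=10)
print(honest, tampered)
```

The 3/2 factor extrapolates from two trees to three under the assumption that the trees have similar sizes, which is why the result is only approximate.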

A Case Study
We present a case study to show how SEDA-ECC works. For simplicity, we assume that the network consists of only six leaf nodes and three aggregators besides the BS, and that the three subtrees share the same public key $PK = (n, E, P, Q, H)$, where the points $P$, $Q$ and $H$ carry the message, the count and the randomness, respectively. As shown in Figure 2, each subtree has two sensor nodes and one aggregator. The three aggregators $DA_r$, $DA_g$ and $DA_b$ gather messages from their respective child nodes. For simplicity, the orders of $P$, $Q$ and $H$ are set to small numbers: the order of $P$ is $q_1 = 13$, the order of $Q$ is $q_2 = 17$, and the order of $H$ is $q_3 = 19$, so $n = q_1 q_2 q_3 = 4199$. Sensors in the three subtrees encrypt and send their data as follows, where the scalars of $H$ are randomly generated by the sensors:
$SN_{r1}$ generates message $M_{r1} = 2$ and encrypts it as $C_{r1} = 2P + Q + 34H$;
$SN_{r2}$ generates message $M_{r2} = 5$ and encrypts it as $C_{r2} = 5P + Q + 13H$;
$SN_{g1}$ generates message $M_{g1} = 6$ and encrypts it as $C_{g1} = 6P + Q + 59H$;
$SN_{g2}$ generates message $M_{g2} = 3$ and encrypts it as $C_{g2} = 3P + Q + 22H$;
$SN_{b1}$ generates message $M_{b1} = 4$ and encrypts it as $C_{b1} = 4P + Q + 62H$;
$SN_{b2}$ generates message $M_{b2} = 5$ and encrypts it as $C_{b2} = 5P + Q + 39H$.
The encrypted messages are sent to the data aggregators. Data aggregator $DA_r$ aggregates $C_{r1}$ and $C_{r2}$ as $C_r = 7P + 2Q + 47H$. Similarly, $DA_g$ aggregates $C_{g1}$ and $C_{g2}$ as $C_g = 9P + 2Q + 81H$, and $DA_b$ aggregates $C_{b1}$ and $C_{b2}$ as $C_b = 9P + 2Q + 101H$. Because the order of $H$ is 19, $19H = \infty$, where $\infty$ is the additive identity element of the elliptic curve group. Therefore, we get $C_r = 7P + 2Q + 9H$, $C_g = 9P + 2Q + 5H$, and $C_b = 9P + 2Q + 6H$.
The BS then decrypts the results. Multiplying $C_r$ by $q_2 q_3$ removes the count and randomness terms, and the aggregated result of the red subtree, $M_r = 7$, is obtained by the BS as the discrete logarithm of $q_2 q_3 * C_r$ to the base point $\hat{P} = q_2 q_3 * P$, using Pollard's λ method.
Similarly, the BS extracts the aggregated count $\zeta_r$ by computing the discrete logarithm of $q_1 q_3 * C_r$ to the base point $\hat{Q} = q_1 q_3 * Q$. The BS can therefore identify a forged result by comparing the aggregated count values: if the differences among the counts of the three subtrees are within the threshold $Th$, the BS validates the integrity of the aggregated result.

Theoretical Analysis
In this section, we analyze the coverage of aggregation trees first because it has great effect on our scheme's availability, then analyze the security of SEDA-ECC and compare it with five well-known secure data aggregation schemes: CDA [16,17], Castelluccia et al.'s scheme [22], BGN scheme [11], EC-OU scheme [18], and TinyPEDS scheme [8].

Coverage of Aggregation Trees
In SEDA-ECC scheme, a sensor reports its data to BS by aggregation only when it can reach red, green and blue aggregation trees within one hop. If a node cannot reach the three aggregation trees, it is disconnected from the BS for aggregation. We define Φ(G) as the probability that all the sensors are covered by all the three aggregation trees. It means that many sensors cannot contribute their data to the aggregation result if Φ(G) is small. Therefore, the coverage of aggregation trees impacts the accuracy of aggregation results. The aggregation accuracy is one of the most important performance metrics, because it can affect the decision of BS, so we should first analyze the coverage of aggregation trees to verify our scheme's availability.
Consider a random network $G(n, l)$, where $n$ is the number of sensors and $l$ is the transmission range of a sensor. We randomly assign red, green or blue to the sensors in the network, and let $S$ denote the number of sensors that are isolated from red, green or blue sensors. Then:
$$\Phi(G) = P(S = 0)$$
We define $S_i$ as the indicator of whether sensor $i$ has red, green and blue neighbors within one hop:
$$S_i = \begin{cases} 0, & \text{sensor } i \text{ has red, green and blue neighbors} \\ 1, & \text{otherwise} \end{cases}$$
$\{S_i\}$ can be approximated as independent and identically distributed for a sufficiently large random network; therefore, $S = \sum_{i=1}^{n} S_i$ is the total number of sensors that are isolated from the red, green or blue aggregation tree. Let $d_i$ denote the number of neighbors of sensor $i$; then the probability that $i$ is isolated from the red aggregation tree is $(1 - p_r)^{d_i}$. Similarly, $i$ is isolated from the green (blue) aggregation tree with probability $(1 - p_g)^{d_i}$ ($(1 - p_b)^{d_i}$). Let $p_i$ be the probability that node $i$ is isolated from red sensors, green sensors or blue sensors; by the union bound:
$$p_i \le (1 - p_r)^{d_i} + (1 - p_g)^{d_i} + (1 - p_b)^{d_i}$$
Since $\Phi(G) = 1 - P(S \ge 1)$, we obtain a lower bound on $\Phi(G)$ by applying the Markov inequality $P(S \ge 1) \le E[S] = \sum_{i=1}^{n} p_i$. That is:
$$\Phi(G) \ge 1 - \sum_{i=1}^{n} p_i$$
When the network is dense enough, i.e., $d_i$ is large, a small $p_i$ is obtained. For example, assuming $p_r = p_g = p_b = 1/3$, we can obtain the lower bound of $\Phi(G)$ as a function of $d$ for a $d$-regular network according to Equation (9). It can be observed from Figure 3 that $\Phi(G) \ge 0.95$ for $d = 10$; therefore, by Equation (10), the coverage of the aggregation trees is good for dense networks.
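To get a feel for how quickly isolation becomes unlikely as the node degree grows, the following Python sketch evaluates a bound of this kind. It rests on stated assumptions rather than on the paper's exact equations: each neighbor independently takes color $c$ with probability $p_c$, so a node with $d$ neighbors has no neighbor of color $c$ with probability $(1 - p_c)^d$; the three colors are combined with a union bound; and the network-wide bound uses the Markov inequality. The function names are hypothetical.

```python
def isolation_bound(d, p_r=1/3, p_g=1/3, p_b=1/3):
    """Union bound on the probability that a node with d neighbors misses some color."""
    return (1 - p_r) ** d + (1 - p_g) ** d + (1 - p_b) ** d

def coverage_lower_bound(degrees, p_r=1/3, p_g=1/3, p_b=1/3):
    """Markov-style bound: coverage >= 1 - sum of per-node isolation bounds (clipped at 0)."""
    expected_isolated = sum(isolation_bound(d, p_r, p_g, p_b) for d in degrees)
    return max(0.0, 1.0 - expected_isolated)

# Per-node view for a few degrees, with equal color probabilities.
for d in (5, 10, 20, 30):
    print(d, isolation_bound(d))

single = coverage_lower_bound([10])          # one node with 10 neighbors
dense = coverage_lower_bound([30] * 1000)    # 1000 nodes with 30 neighbors each
```

Under these assumptions a single node with ten neighbors is covered with probability of roughly 0.95, while a bound over a whole network stays informative only when the degree grows with the number of nodes.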

Ciphertext Analysis
This is the most basic attack in WSNs. SEDA-ECC is robust to ciphertext analysis attacks, because the security of its elliptic curve cryptography-based encryption depends on the hardness of factoring large integers. The other schemes are also robust to ciphertext analysis attacks.

Chosen Plaintext Attacks
SEDA-ECC is robust to chosen plaintext attacks, because its encryption relies on random numbers, and the ciphertext is probabilistic. Other schemes based on ECC can defend against this attack too. Wagner's cryptanalysis [23] has indicated that CDA might suffer from chosen plaintext attacks because of improper security parameters. However, the cost of proper parameter would render CDA infeasible to WSNs. Castelluccia et al.'s scheme is also robust to this attack, because its security is based on the indistinguishability property of a pseudorandom function, and the previous encryption keys cannot be used to deduce the present or subsequent encryption key.

Malleability
In the analysis of this attack, we use the example of an adversary who wants to increase the measured data by 50. Since Castelluccia et al.'s scheme is based on modular addition, an adversary can trivially increase the plaintext by adding a value directly to the corresponding ciphertext, so the scheme suffers from this attack. For example, a ciphertext $(m + K_n) \bmod M$ can easily be altered to $((m + 50) + K_n) \bmod M = ((m + K_n) + 50) \bmod M$. The other schemes can defend against this attack because they are based on either modular multiplication or ECC.
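The malleability of a modular-addition cipher can be reproduced with a few lines of Python. This is a minimal sketch of the general idea only: the key, modulus and message below are arbitrary illustrative values, and the single fixed key is a simplification rather than the key stream construction of Castelluccia et al.'s scheme.

```python
def encrypt(m, key, modulus):
    """Additive stream-style encryption: c = (m + key) mod M."""
    return (m + key) % modulus

def decrypt(c, key, modulus):
    """Inverse operation: m = (c - key) mod M."""
    return (c - key) % modulus

modulus = 2 ** 16     # illustrative modulus M
key = 40321           # illustrative key, unknown to the adversary
m = 120               # original reading

c = encrypt(m, key, modulus)
tampered = (c + 50) % modulus          # adversary adds 50 without knowing the key
recovered = decrypt(tampered, key, modulus)
print(recovered)
```

The receiver decrypts the tampered ciphertext to a valid-looking reading that is exactly 50 larger, and nothing in the ciphertext reveals the change.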

Unauthorized Aggregation
Among the asymmetric schemes, SEDA-ECC, BGN, EC-OU and TinyPEDS are based on ECC. To perform aggregation, an aggregator has to know the curve information. Since the public key is generally preinstalled in the sensors, an adversary cannot perform unauthorized aggregation or falsify the aggregated count value of a subtree without compromising sensors or aggregators. CDA and Castelluccia et al.'s scheme might suffer from this attack, because they require only modular addition, and unauthorized aggregation can be performed without any additional information.

Node Compromise Attacks
For asymmetric schemes, SEDA-ECC, BGN, EC-OU and TinyPEDS do not suffer from unauthorized decryption when a sensor node is compromised, because an adversary cannot obtain the private key through a compromised sensor. However, except for SEDA-ECC, they cannot defend against unauthorized aggregation when an aggregator is compromised. The compromised aggregator might arbitrarily increase the aggregated result by aggregating the same ciphertext repeatedly, or decrease it by selective aggregation. After the aggregation process, the forged value is difficult for the BS to detect or remove. SEDA-ECC can prevent this attack on data integrity by constructing disjoint aggregation subtrees. It is impossible for attackers to alter the aggregated result $M$ without changing the count value $\zeta$, because the aggregators do not know the base points $P$ and $Q$ that carry the message and the count. If the aggregated result of one tree is different from the others, the BS will reject it and compute the final result from the others. Therefore, an attacker can successfully forge the aggregated result if and only if the forged aggregated results of two trees are the same. The probability of success is extremely small, because the security depends on the factorization of large integers.
We use the case study of SEDA-ECC in Section 4.7 to validate its ability to defend against this attack, writing $P$, $Q$ and $H$ for the points that carry the message, the count and the randomness. Suppose the aggregation ciphertexts excluding $C_r$, $C_g$, and $C_b$ are $C'_r = M'_r P + 193Q + R_r H$, $C'_g = M'_g P + 190Q + R_g H$, and $C'_b = M'_b P + 191Q + R_b H$. If the red aggregator $DA_r$ is compromised, it can arbitrarily increase the aggregated result by aggregating the same ciphertext repeatedly. Suppose the compromised aggregator $DA_r$ intends to increase $C_r$ by aggregating $C_{r1}$ 20 times; then $C_r = 20C_{r1} + C_{r2} = 45P + 21Q + 693H$. Therefore, the aggregation ciphertext results are $\mathbb{C}_r = C'_r + C_r = (M'_r + 45)P + 214Q + (R_r + 693)H$, $\mathbb{C}_g = C'_g + C_g = (M'_g + 9)P + 192Q + (R_g + 5)H$, and $\mathbb{C}_b = C'_b + C_b = (M'_b + 9)P + 193Q + (R_b + 6)H$, respectively. When the aggregated count results $(\zeta_r, \zeta_g, \zeta_b)$ are extracted by computing the discrete logarithm of $q_1 q_3 * (\mathbb{C}_r, \mathbb{C}_g, \mathbb{C}_b)$ to the base point $\hat{Q} = q_1 q_3 * Q$, the forged result $\mathbb{C}_r$ can be easily identified and rejected by the BS because the differences between $\zeta_r$ and the other two counts exceed the threshold value $Th$, that is, $|\zeta_r - \zeta_g| = 22 > Th$ and $|\zeta_r - \zeta_b| = 21 > Th$.
For symmetric schemes, the inherent drawback of CDA and Castelluccia et al.'s scheme is that the encryption key is identical to the decryption key. Therefore, an adversary can decrypt the ciphertext once a sensor is compromised. In addition, because CDA's key is shared by all sensors and the BS, the security of the whole system is broken if any sensor is compromised. Castelluccia et al.'s scheme suffers only a minor impact because each sensor's distinct key is shared only with the BS.
Table 1 shows the security analysis comparison for all schemes. It clearly shows that symmetric schemes are less secure than asymmetric ones, although they are more efficient in terms of communication and computation costs.
Compared with other asymmetric schemes, SEDA-ECC is superior in defending against compromised node attacks because it can protect data integrity by constructing disjoint aggregation trees when the aggregators are compromised.

Performance Evaluation and Comparison
Generally, symmetric key-based homomorphic schemes are more efficient than asymmetric ones, but their security is weaker. For the sake of fairness, the performance of SEDA-ECC is therefore compared only with three other asymmetric key-based homomorphic encryption schemes. In this section, we first discuss the threshold value Th, and then evaluate the communication cost, the computation overhead, and the accuracy of SEDA-ECC, BGN, EC-OU and TinyPEDS. We conduct simulations using the TinyOS 2.0 simulator (TOSSIM). The parameters are shown in Table 2, and the topology of nodes is depicted in Figure 4, where the transmission range of a sensor is 50 m, and the BS coordinate is (200,200).

Th Parameter Setting
In general, the more sensors that participate in the data aggregation, the larger the probability of constructing disjoint aggregation trees that have the same number of sensors. In addition, the aggregated count results ζ from the three aggregation trees may not agree with each other exactly due to collisions and congestion in wireless channels. Therefore, an adjustable threshold value Th and a lower bound on the network size are introduced to accommodate these factors. Th is an important parameter because it determines whether the BS accepts a result. In order to obtain Th, we ran extensive simulations in which the number of nodes (network size) was varied from 300 to 1,200 in a 400 m × 400 m area. The difference among the aggregated count results from the three aggregation subtrees is simulated 40 times, and the average value is depicted in Figure 5, where the "ideal" curve shows the aggregated result in an ideal situation. According to the simulation results, the differences are very small, between 2 and 9. Hence, the threshold can be set to a small value, e.g., Th = 10, and can be adjusted if the network conditions change. Note that the average count result is only half of the ideal number and the difference extends to 9 when the network size is nearly 300. In addition, the smaller the network size is, the larger the differences become. As analyzed in Section 5.1, this is because coverage in a sparse network is poor enough to degrade the aggregation accuracy. Therefore, we set the lower bound of the network size to 300 in a 400 m × 400 m area to make our scheme applicable.

Communication Overhead
The number of exchanged messages in each scheme is the same. Although three subtrees need to be built in SEDA-ECC, each node sends two messages for data aggregation, as in the other schemes: one HELLO message to form the aggregation tree, and one message for the data aggregation itself. Therefore, given that the number of messages sent to the BS is the same, the communication overhead mainly depends on the ciphertext size of each scheme. Suppose the order of the elliptic curve is $N$. SEDA-ECC's security relies on the hardness of factoring the order $N$, which is a product of several different large prime numbers, e.g., $N = q_1 q_2 \cdots q_k$, where $k$ is the number of prime numbers. If each prime number is 256 bits long, there is no efficient approach to factor the product $N$ [7]. Therefore, in SEDA-ECC, we generate $N = q_1 q_2 q_3$, where the prime numbers $q_i$ are all 256-bit. Since the size of the ciphertext is almost the same as $|N| + 1$, SEDA-ECC's ciphertext size is $3 \times |q| + 1 = 769$ bits ($|q| = 256$ bits). EC-OU's ciphertext size is $3 \times |q| + 2 = 1{,}025$ bits ($|q| = 341$ bits) according to [24]. BGN's ciphertext size is 1,025 bits, and TinyPEDS's ciphertext size is 328 bits according to [7]. Figure 6 shows the comparison of ciphertext sizes.
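The ciphertext sizes quoted above follow from simple arithmetic, which the following Python sketch reproduces. The function names are hypothetical and chosen for illustration only; the BGN and TinyPEDS figures are taken directly from the text rather than derived.

```python
def seda_ecc_ciphertext_bits(q_bits=256, num_primes=3):
    """SEDA-ECC: |N| + 1 bits, with N the product of num_primes q_bits-bit primes."""
    return num_primes * q_bits + 1


def ec_ou_ciphertext_bits(q_bits=341):
    """EC-OU: 3 * |q| + 2 bits, as stated in the text."""
    return 3 * q_bits + 2


# BGN and TinyPEDS sizes are quoted from the text, not computed.
sizes = {
    "SEDA-ECC": seda_ecc_ciphertext_bits(),
    "EC-OU": ec_ou_ciphertext_bits(),
    "BGN": 1025,
    "TinyPEDS": 328,
}

# List the schemes from smallest to largest ciphertext.
for name, bits in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{name}: {bits} bits")
```

Under these figures, SEDA-ECC's ciphertext is about three quarters the size of a BGN or EC-OU ciphertext, while TinyPEDS remains the smallest.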

Computation Overhead
Since the SEDA-ECC, BGN, EC-OU and TinyPEDS schemes are all built on elliptic curves, their encryption and aggregation operations are based on point addition and point scalar multiplication. In elliptic curve arithmetic, point doubling and point addition are the two basic operations. Scalar multiplication can be accomplished by the double-and-add algorithm, which is built from point doublings and additions [25]. Computing a scalar multiple of a point by $r$ requires about $|r|$ doublings and $|r|/2$ additions, amounting to around $3|r|/2$ point additions if a doubling is counted as one point addition [18].
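To make the operation count concrete, here is a minimal Python sketch of left-to-right double-and-add that also counts doublings and additions. It is not the implementation used in the evaluation: the curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$ with base point (5, 1) is a small textbook toy curve chosen purely as an assumption for illustration, and all function names are hypothetical. It requires Python 3.8 or later for modular inverses via `pow`.

```python
P_MOD = 17   # toy field size (assumption; real schemes use 163- to 1,024-bit fields)
A = 2        # curve y^2 = x^3 + A*x + 2 over F_17
BASE = (5, 1)


def point_add(p1, p2):
    """Add two points in affine coordinates; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p1 + (-p1), including doubling a point with y = 0
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def double_and_add(r, point):
    """Return (r * point, number of doublings, number of additions)."""
    result, doublings, additions = None, 0, 0
    for bit in bin(r)[2:]:              # scan the bits of r from the top
        if result is not None:          # doubling infinity is a no-op
            result = point_add(result, result)
            doublings += 1
        if bit == "1":
            if result is None:
                result = point          # first set bit: just load the point
            else:
                result = point_add(result, point)
                additions += 1
    return result, doublings, additions


point, doublings, additions = double_and_add(13, BASE)   # 13 = 0b1101
print(point, doublings, additions)
```

For a random $|r|$-bit scalar, about half of the bits are set, which is where the estimate of roughly $|r|$ doublings plus $|r|/2$ additions comes from.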
It should be noted that the SEDA-ECC, BGN, EC-OU and TinyPEDS schemes are built on different mathematical foundations. We denote the finite field of the elliptic curve by $\mathcal{F}_p$ and its bit length by $|p|$. The BGN and EC-OU schemes are defined over $\mathcal{F}_p$ with $|p| = 1{,}024$, TinyPEDS over $\mathcal{F}_p$ with $|p| = 163$, and SEDA-ECC over $\mathcal{F}_p$ with $|p| = 768$. To achieve a fair comparison, we choose the point addition on a 163-bit field as the base unit. For an elliptic curve computation over a finite field $\mathcal{F}_p$, the cost of a scalar multiplication can be converted to a number of base-unit computations (point additions on a 163-bit field) according to the scalar $r$ and the size $|p|$. The comparison results are presented in Figure 7, where the length of messages is 16 bits, and the length of random nonces is 80 bits.
In summary, TinyPEDS is the most efficient scheme in both communication overhead and computation cost, because its curves are chosen from the smaller field $\mathcal{F}_p$ ($|p| = 163$). TinyPEDS's security is based on the hardness of the elliptic curve discrete logarithm problem, hence it can be built on a smaller field. In contrast, BGN, EC-OU and SEDA-ECC are all based on the hardness of the integer factorization problem, so their curves must be chosen from a larger field. It can also be observed from Figures 6 and 7 that SEDA-ECC outperforms BGN and EC-OU in both communication and computation performance. In terms of security, SEDA-ECC can defend against all the attacks listed in Table 1, hence it is superior to the other schemes. Furthermore, the energy consumption of SEDA-ECC is evaluated on different sensor devices according to TinyECC [26], a well-known implementation of ECC for WSNs, as shown in Figure 8. The energy consumption can be significantly reduced with more advanced devices. Therefore, secure data aggregation schemes based on asymmetric encryption, e.g., ECC, will find extensive applications as more advanced sensors are developed.

Accuracy
We define the accuracy as the ratio between the aggregated sum produced by the data aggregation scheme in use and the real sum of all sensors participating in the data aggregation. It is an important issue because it could affect the decision of the BS. All the schemes should achieve 100% accurate aggregated results in an ideal situation. However, data packets may be lost or delayed due to data collisions, processing delays and noisy wireless channels. We evaluate the data accuracy of SEDA-ECC, BGN, EC-OU, and TinyPEDS with respect to different time intervals, as shown in Figure 9. It shows that all these schemes perform almost equally in terms of accuracy. We can observe that the accuracy increases as the time interval increases, because data collisions and congestion between data aggregators are reduced and the data packets have enough time to be delivered.
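The accuracy metric defined above can be written as a short Python helper. The function name and the sensor readings below are hypothetical values made up for illustration; they are not simulation data from this paper.

```python
def accuracy(aggregated_sum, real_sum):
    """Ratio of the sum received at the BS to the true sum over all sensors."""
    if real_sum == 0:
        raise ValueError("real_sum must be non-zero")
    return aggregated_sum / real_sum


# Hypothetical readings from five sensors; the last packet is assumed lost.
readings = [21, 19, 23, 20, 22]
delivered = readings[:4]

print(f"ideal:     {accuracy(sum(readings), sum(readings)):.2%}")
print(f"with loss: {accuracy(sum(delivered), sum(readings)):.2%}")
```

In the ideal case the ratio is exactly 1, and every lost or late packet lowers it in proportion to the missing reading.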

Conclusions
Providing hierarchical data aggregation without losing data privacy and integrity guarantees is a challenging problem in WSNs. In this article, we propose a novel Secure-Enhanced Data Aggregation based on Elliptic Curve Cryptography (SEDA-ECC) for WSNs. SEDA-ECC divides the aggregation tree into three subtrees to reduce the importance of the high-level sensor nodes. It also generates three aggregated results by performing PH-based aggregation in each of the three subtrees, so that the BS can verify the subtree aggregated results by comparing the aggregated count values. Extensive analytical and simulation results indicate that SEDA-ECC achieves the highest security level on the aggregated result compared with other asymmetric schemes, and that it does so at a reasonable energy cost.