Article

Secure and Cost-Effective Distributed Aggregation for Mobile Sensor Networks

1 School of Information Science and Engineering, Central South University, Changsha 410083, China
2 School of Electronics and Information Engineering, Hunan University of Science and Engineering, 425199 Yongzhou, China
3 Faculty of Computer and Information Sciences, Hosei University, 184-8584 Tokyo, Japan
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Sensors 2016, 16(4), 583; https://doi.org/10.3390/s16040583
Submission received: 13 December 2015 / Revised: 24 March 2016 / Accepted: 20 April 2016 / Published: 23 April 2016
(This article belongs to the Special Issue Mobile Sensor Computing: Theory and Applications)

Abstract
Secure data aggregation (SDA) schemes are widely used in distributed applications, such as mobile sensor networks, to reduce communication cost, prolong the network life cycle and provide security. However, most SDA schemes are only suited for a single type of statistics (i.e., summation-based or comparison-based statistics) and cannot produce multiple statistical results. Most SDA schemes are also inefficient for dynamic networks. This paper presents multi-functional secure data aggregation (MFSDA), in which a mapping step and a coding step are introduced to preserve both values and order and thereby support arbitrary statistics in the same query. MFSDA is suited for dynamic networks because the active nodes can be counted directly from the aggregated data. The proposed scheme is tolerant to many types of attacks. The network load of the proposed scheme is balanced, and no significant bottleneck exists. MFSDA includes two versions: MFSDA-I and MFSDA-II. The first one obtains accurate results, while the second one is a more generalized version that significantly reduces network traffic at the expense of a small loss of accuracy.

1. Introduction

Wireless sensor networks and mobile sensor networks [1,2,3,4] have received unprecedented attention because of their exciting potential applications in military, industrial and civilian areas (e.g., environmental and habitat monitoring). Wireless communication is often used to transfer data among nodes in these networks, and most nodes are equipped with a battery as the energy unit, which means the energy capacity is limited. Generally, wireless transmission consumes much more energy than data processing. How to save the overall energy resources and extend the lifetime of networks is a popular research topic.
Data aggregation [5,6,7,8,9,10,11,12,13,14] is one of the most important solutions for minimizing the transmitted data size in large-scale wireless networks and is also one of the most important tasks in other distributed applications [15,16,17,18,19,20,21]. Data aggregation can be achieved via in-server or in-network aggregation. In-server aggregation, where data aggregation is performed directly at the server based on the raw data received from each client, is energy-costly in large-scale distributed systems. In-network aggregation (i.e., aggregating partial results at intermediate nodes along the routing path) significantly reduces the total communication cost and balances the load, especially when only the aggregation result is needed rather than the raw data.
The data aggregation scheme also faces many security challenges. For example, wireless sensor networks are usually deployed in remote and hostile environments in military applications; thus, sensor nodes are prone to node compromise attacks, and security issues, such as data confidentiality and integrity, are extremely important. Wireless sensors are also being increasingly used to monitor/collect information in healthcare medical systems. It is important to effectively process the ever-growing healthcare data and simultaneously protect patients’ data privacy.
The traditional encryption technology is not suitable for secure data aggregation (SDA) because it only provides concealment and does not support cipher text operations. To realize in-network secure data aggregation based on traditional encryption technology, intermediate aggregators will have to decrypt the data received from all children before operating on them, and then, an encryption will be needed for the aggregated result before sending the message. Frequent encryption and decryption of data in intermediate nodes will increase the computing cost and energy consumption. Moreover, secret key management is more difficult, because the decryption key storage at an intermediate node can be easily obtained by an attacker.
To solve this problem, several secure data aggregation schemes have been proposed, such as synopsis diffusion-based [7,8,9], shuffling-based [6], and homomorphic encryption-based [10,11,12,13,14] data aggregation. Homomorphic encryption-based data aggregation, which has a better theoretical foundation, will be used in the proposed schemes. A comprehensive review on secure data aggregation protocols was presented by Ozdemir et al. [22].
Enabling multi-function support is also a challenge in the In-network data aggregation scheme. Statistics can be divided into two categories according to the aggregation functions used, i.e., summation-based and comparison-based. For example, aggregation operations, such as median computation or finding the maximum/minimum, rely exclusively on comparison operations. Moreover, aggregation operations, such as count, mean, variance or standard deviation (STD), rely on summation operations. Most of the existing secure data aggregation schemes can only obtain a single type of statistics. To obtain summation-based and comparison-based statistical results simultaneously is still an open problem.
Take homomorphic encryption (HE)-based SDA as an example. In [10,11], an additive homomorphism property was used for aggregation on encrypted messages, so that summation-based statistics, such as CNT (count) and SUM, can be obtained by these algorithms. However, they cannot obtain comparison-based statistics, such as MAX and MIN. Rivest et al. [23] noted that any privacy homomorphism is insecure, even against cipher text-only attacks, if it supports comparison operations. Acharya et al. [24] applied order-preserving encryption [25] in SDA to obtain comparison-based statistics; however, summation-based statistics were not supported by the order-preserving encryption. Ertaul et al. [26] and Samanthula et al. [27] also supported only comparison-based statistics.
Multifunction is also important in other areas. For example, Lu et al. [28] presented a multi-function secure data aggregation scheme for smart grid communications. The Boneh–Goh–Nissim cryptosystem was adopted for data privacy, and only summation-based statistics (such as average and variance) were supported in the scheme.
More details regarding the functionality comparison results are listed in Table 1. The second and third columns indicate whether summation-based or comparison-based statistical results are supported, while the last column indicates whether all statistics can be derived from a single query. “P” means partly supported; “Y” and “N” are “yes” and “no”, respectively. The last row is the proposed scheme. More details regarding Table 1 and the proposed scheme will be given in the following sections.
To enable arbitrary aggregation operations on a server, RCDA (Recoverable Concealed Data Aggregation) [12] designed a scheme that can recover all sensing data, even data that have been aggregated. In RCDA, a homomorphic encryption algorithm is used to provide end-to-end confidentiality, and an encode step is used to enable recovery of all sensing data, which means that the scheme can achieve arbitrary method support. However, the final data are formed as concatenations of all sensing data, and no information compression method is used; thus, the communication cost is too heavy.
Nodes do not always remain active in dynamic networks. Some of them may sleep to save energy, and some of them may be dead due to energy exhaustion or other reasons. Most existing SDA schemes are not efficient for dynamic networks because the number of active nodes varies. For example, [10,11] use extra communication to report the number of active (or inactive) nodes in each query, which costs considerable energy. This extra communication can even exceed that used for the sensing data when the percentage of active (or inactive) nodes is large enough. The extra traffic increases dramatically along the routing path and easily forms a network bottleneck, reducing the overall network life cycle. More details are listed in Table 2.
In this paper, we propose two multi-functional secure data aggregation (MFSDA) schemes, i.e., MFSDA-I and MFSDA-II. Both of them can obtain summation-based and comparison-based statistics in the same query. The first one obtains accurate results, while the second one is an approximate version that significantly reduces communication cost and prolongs the network life cycle at the expense of a small loss of accuracy. More specifically, to preserve both values and order during in-network aggregation and thereby support arbitrary statistics, we introduce a mapping step and a coding step in the proposed scheme. A compressing step is introduced to further reduce the packet size in MFSDA-II.
The remaining part of this paper is organized as follows: Section 2 introduces the network model and the background knowledge. Section 3 and Section 4 introduce MFSDA-I and MFSDA-II, respectively. Section 5 presents functionality and security analysis. Section 6 presents performance analysis and evaluation. Sections 7 and 8 offer a summary and acknowledgment.

2. Preliminaries

In this section, we first introduce the network model and attack model. Then, we give the encryption and signature schemes used in the proposed schemes. The final subsection lists the basic notations.

2.1. Network Model

As shown in Figure 1, a cluster-based topology is used. It is composed of a server and a large number of clients/nodes. The nodes selected as cluster heads (CH, e.g., H1, H2, etc.) are assumed to be trustworthy, which means that secret information can be stored on them if required. The remaining nodes (CM, cluster member, e.g., 1, 2, etc.) join an appropriate cluster according to a certain criterion, such as signal strength in a wireless network or delay in a wired network. All CHs form a tree, with the server as its root. For the convenience of analysis, assume that only cluster members generate data.

2.2. Attack Model

Assume that the adversary is rational, which means that he or she will never risk exposure to obtain information and that malicious destruction of nodes will never happen.
Adversaries know all public keys and other public parameters. The private data stored in a CH will have been destroyed before the adversary captures the CH.

2.2.1. Without Compromising any Nodes

If an adversary does not compromise any nodes, the adversary can still launch attacks on the wired/wireless channel. Let us consider the following situations.
A1: Eavesdropping: The adversary can eavesdrop on any data transmission in a wired/wireless channel.
A2: Replay attack: An adversary can send historical data packets in place of actual data packets to interfere with normal information acquisition.
A3: Data tampering: An adversary can modify a data packet and send it to any node.
A4: DoS attack: A forged message spreads easily in end-to-end data aggregation because of the multi-hop flooding effect. When an adversary injects a forged message, the final aggregation result obtained by the server is always wrong, which amounts to a DoS attack.

2.2.2. Compromising CM

If an adversary has compromised one or more CM, it can obtain all of its secrets. An adversary can use this private information and other public information to modify or forge packets.
Generally, the local value of an honest node is bounded, and then, the compromised node can falsify its own sensor reading as follows:
B1: A compromised node falsifies the local value outside the bound.
B2: A compromised node falsifies the local value within the bound.

2.3. Encryption and Signature Scheme

Because elliptic curve cryptography (ECC) achieves high security with a small key size, it offers advantages in calculation speed and computation cost. Both the encryption and signature schemes used here are ECC-based.

2.3.1. Homomorphic Encryption Scheme

The homomorphic encryption (HE) scheme is derived from homomorphism in abstract algebra. Using homomorphism, operations in one algebraic system (plaintext) can be mapped into an operation in another algebraic system (cipher text). The HE scheme has been widely applied in secure data aggregation [11,12,13,31].
An asymmetric HE scheme (Algorithm A1 in Appendix A), which is derived from the ElGamal encryption scheme (EC-EG) [32], will be used in MFSDA. Details are illustrated in Appendix A.
Based on the homomorphic property described in Theorem 1, the intermediate nodes perform data aggregation directly in cipher text. Without frequent encryption and decryption, the computation overhead is reduced, and the key management is easy.
Theorem 1. (Homomorphic property) Algorithm A1 has an additive homomorphic property, namely the summation arithmetic in plaintext is equivalent to summation arithmetic in cipher text, i.e.,
$$HEnc(m_1 + m_2) = HEnc(m_1) + HEnc(m_2)$$
For the proof, refer to Appendix A.
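The additive property in Theorem 1 can be illustrated with a toy sketch. The code below is not Algorithm A1: as an assumption made purely for illustration, it replaces the elliptic-curve group with exponential ElGamal over the integers modulo a prime, where combining two ciphertexts is a component-wise multiplication. The names `keygen`, `henc`, `hadd`, `hdec` and the parameters `P` and `G` are hypothetical, and the parameters are far too simple for real use.

```python
import secrets

# Toy parameters (assumption): a Mersenne prime modulus and a small base.
P = 2**127 - 1
G = 3

def keygen():
    """Return (private_key, public_key) with public_key = G^private mod P."""
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)

def henc(m, pub):
    """Encrypt a small non-negative integer m as (G^r, G^m * pub^r)."""
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (pow(G, m, P) * pow(pub, r, P)) % P

def hadd(c1, c2):
    """Combine two ciphertexts; the result encrypts the sum of the plaintexts."""
    return (c1[0] * c2[0]) % P, (c1[1] * c2[1]) % P

def hdec(c, priv, max_m=10_000):
    """Remove the mask, then search for the m that satisfies G^m = result."""
    # c[0]^(P-1-priv) is the inverse of c[0]^priv by Fermat's little theorem.
    g_m = (c[1] * pow(c[0], P - 1 - priv, P)) % P
    acc = 1
    for m in range(max_m + 1):
        if acc == g_m:
            return m
        acc = (acc * G) % P
    raise ValueError("plaintext outside the search range")

priv, pub = keygen()
total = hdec(hadd(henc(5, pub), henc(7, pub)), priv)
print(total)  # 12
```

The final search step works only because the aggregated plaintext is small, which is the case for per-bucket node counts.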

2.3.2. Identity-Based Signature Scheme

In the identity-based signature (IBS) scheme, users’ public identity information (ID, email, etc.) is used as the public key for signature verification, which can effectively solve the problems in the management of PKI public-key certificates.
The algorithm, which is derived from [33], consists of four parts: setup, extract, signature and verify. Details are illustrated in Appendix B.

2.4. Basic Notation

Table 3 lists the notations that we will use later.

3. Multi-Functional Secure Data Aggregation Scheme: MFSDA-I

Two multi-functional secure data aggregation schemes are proposed: MFSDA-I and MFSDA-II. MFSDA-I is given in this section, while MFSDA-II will be given in the next.

3.1. MFSDA-I

MFSDA-I consists of four parts (procedures): setup and operations on the three types of nodes, i.e., cluster member (CM, e.g., 1, 2, etc.), cluster head (CH, e.g., H1, H2, etc.) and server.
The setup procedure includes network initialization, as well as the initialization of encryption and signature. Both non-homomorphic and homomorphic encryption are used in the proposed scheme. The former is used in intra-cluster data transmission; the latter is used in inter-cluster data transmission. Intra-cluster encryption is the same as the one used in RCDA [12]. For details regarding inter-cluster encryption, refer to Section 2.3.1. An IBS signature mechanism (see Section 2.3.2) is used for all packets.
Each CM encrypts the raw data with the non-homomorphic encryption mentioned above. Then, a timestamp and other information are attached to the cipher text before IBS signature application. The final packets sent to the cluster head (CH) consist of all of them.
Each CH first verifies the signatures of all received packets. The received packets can be divided into two categories: intra-cluster data, which are received from the CM nodes within the cluster, and inter-cluster data, which are received from other cluster heads.
For the intra-cluster data, the CH first decrypts the data and then maps and encodes it to obtain a vector. All vectors generated in the same cluster are added up in plaintext to obtain an intra-cluster data aggregation result. The aggregated vector will be encrypted using the homomorphic encryption scheme with the public key of the server.
For the inter-cluster data, because they have already been encrypted using the homomorphic encryption scheme at other CH nodes, there is no need to decrypt them. The CH sums the two types of cipher text above directly in the cipher text domain. The final packet sent to the parent node (cluster head or server) also contains a timestamp, other information and the signature generated from them using IBS with the private key of the cluster head.
All packets received by the server are from cluster heads. The server first verifies them, then extracts the encrypted data and aggregates them in the cipher domain. Finally, the server decrypts it to obtain the aggregation result of the whole network.
We can extract all common statistics from the aggregation result, which is in vector form, using the functions provided in the following sections.

3.1.1. Setup

The setup procedure initializes the network topological structure, as well as initializes each encryption and signature mechanism. The following key pairs will be generated during the setup.

1. $\langle ID_k, KeyLH_k \rangle$:

This type of key pair, which was used in RCDA [12], is used for intra-cluster data encryption, namely data transmitted between CM and its CH.
In a large-scale distributed system, the server does not know the cluster information before deployment, and the key preload scheme is infeasible. A key exchange scheme [12] is introduced to solve this challenge.
Each CM loads its own key pair, generated by the server. All CHs generate a spanning tree, with the server as its root. Each CM joins a suitable cluster based on a certain criterion (e.g., delay in a wired network and signal strength in a wireless network), encrypts the cluster choice with its private key and transmits it to the server. After decryption, the server sends the CM's key pair to its CH. Thus, the CH can obtain the key pairs of all of its CMs.

2. $\langle ID_k, KeySign_k \rangle$:

This type of key pair is used for the IBS signature mechanism (Section 2.3.2). To ensure the integrity of the data, all packets, including intra-cluster and inter-cluster, must be attached with a signature.
The identifier $ID_k$ of each node is also the public key for signature verification. The private key $KeySign_k$ of each node is generated by the server using the master key. This key pair is preloaded into each node before deployment. Each packet contains a public key and a signature generated by the sender using its private key. The receiver will verify all of the packets before further processing.

3. $\langle KeyPri_S, KeyPub_S \rangle$:

This key pair is used for homomorphic encryption (Section 2.3.1) to achieve inter-cluster data confidentiality, namely for the data transmitted between CHs or between a CH and the server. The public key of the server (i.e., $KeyPub_S$) is used for data encryption at the CH. The server keeps the private key $KeyPri_S$ for decryption. In the proposed scheme, all of the data contained in the inter-cluster packets are homomorphically encrypted, so data aggregation is performed directly without decryption.

3.1.2. Operations on CM

Operations on the CM are composed of three parts: encryption, signature and data transmission.
First, each CM encrypts the sensing data $x_k$ with the encryption key $KeyLH_k$ to obtain $C_k$. Then, using the private key $KeySign_k$, the CM generates the signature $S_k$ from the transmitted data $Msg$, which consists of the ciphertext, a timestamp and other information. Finally, the CM transmits $Msg$ and $S_k$ to the CH.
  • Encryption: $C_k = encrypt(x_k, KeyLH_k)$.
  • Signature:
    $$S_k = signature(Msg, KeySign_k)$$
    $$Msg = \{ID_{src}, ID_{dest}, timestamp, msgType = t_{LH}, C_k\}$$
    $ID_{src}$ and $ID_{dest}$ are the identifiers (IDs) of the CM and its cluster head (CH), respectively. The $timestamp$ is used to avoid message duplication and to process data by intervals. In addition, if an adversary attempts to modify the $timestamp$ without a valid private signing key, the receiver can detect the change through signature verification. There are two types of messages in the scheme, distinguished by $msgType$: $t_{LH}$ for intra-cluster communication and $t_{HH}$ for inter-cluster communication. Here, the message is of the former type, namely $msgType = t_{LH}$.
  • Message sending:
    The final data packet sent by the CM to its CH is
    $$MsgSend = \{Msg, S_k\}$$
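The packet layout above can be sketched as follows. This is only an illustration under loudly stated assumptions: HMAC-SHA256 from the Python standard library stands in for the IBS signature (it is a symmetric MAC, not an identity-based signature), the ciphertext $C_k$ is treated as an opaque hex string, and the helper names `build_msg`, `sign` and `verify`, the key material and the 30-second freshness window are all hypothetical.

```python
import hashlib
import hmac
import json
import time

T_LH = "tLH"  # intra-cluster message type

def build_msg(id_src, id_dest, cipher_hex, timestamp=None):
    """Assemble Msg = {ID_src, ID_dest, timestamp, msgType = tLH, C_k}."""
    return {
        "ID_src": id_src,
        "ID_dest": id_dest,
        "timestamp": time.time() if timestamp is None else timestamp,
        "msgType": T_LH,
        "C_k": cipher_hex,
    }

def sign(msg, key):
    """Stand-in for signature(Msg, KeySign_k): HMAC over a canonical encoding."""
    payload = json.dumps(msg, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify(msg_send, key, my_id, max_age=30.0, now=None):
    """Check destination, timestamp freshness and signature on the receiver."""
    msg, sig = msg_send["Msg"], msg_send["S_k"]
    now = time.time() if now is None else now
    fresh = 0 <= now - msg["timestamp"] <= max_age
    return (msg["ID_dest"] == my_id and fresh
            and hmac.compare_digest(sig, sign(msg, key)))

key = b"demo-key-for-node-1"  # hypothetical key material
msg = build_msg("CM1", "H1", "a1b2c3", timestamp=1000.0)
msg_send = {"Msg": msg, "S_k": sign(msg, key)}
print(verify(msg_send, key, "H1", now=1005.0))  # True
```

Because the timestamp is covered by the signature, changing it invalidates the packet, and an old packet replayed later fails the freshness check.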

3.1.3. Operations on CH

Operations on the CH consist of five parts: data receiving and verification, classification, operations on the intra-cluster dataset, operations on inter-cluster dataset and packet construction and transmission.
The CH verifies every received message $MsgRec$ and discards those that fail. The verification process includes checking the legality of $ID_{src}$ and $ID_{dest}$, checking the validity of the timestamp and verifying the signature.
Legitimate messages are divided into two categories, i.e., $\{intraSet\}$ and $\{interSet\}$, according to the $msgType$. $\{intraSet\}$ contains the intra-cluster packets received from each CM in the cluster. $\{interSet\}$ contains the inter-cluster packets received from the other CHs that are children of this CH.
The process for the intra-cluster dataset $\{intraSet\}$ consists of five steps: decryption, mapping, encoding, aggregation and re-encryption. "Mapping, encoding and aggregation", which will be further explained in the following section, are the key to ensuring the simultaneous extraction of various types of statistics. The public key of the server (i.e., $KeyPub_S$) is used in homomorphic encryption.
All of the messages contained in $\{interSet\}$ are encrypted with the public key of the server. The data aggregation of this dataset can be performed directly in the encrypted domain owing to the homomorphic property of the encryption mechanism.
The messages in $\{intraSet\}$ and $\{interSet\}$ have been transformed into $C_{AggIntra}$ and $C_{AggInter}$ in the previous two steps. The CH first sums them up in the cipher text domain and attaches other information, such as the timestamp. Then, the CH generates a signature $S_i$ using its private key. Finally, the CH sends $Msg$ and $S_i$ to the parent node.
  • Data receiving and verification:
    $$\{MsgRec\} = \{MsgSend\}$$
    $$verify(MsgRec, ID_k)$$
  • Classification:
    $$\{intraSet\} = \{MsgRec \mid msgType = t_{LH}\}$$
    $$\{interSet\} = \{MsgRec \mid msgType = t_{HH}\}$$
  • Operations on $\{intraSet\}$:
    (1) Decryption: $x_k = decrypt(C_k, KeyLH_k)$
    (2) Mapping: $y_k = f_m(x_k)$
    (3) Encoding: $v_k = f_e(y_k)$, where $v_k \in \{0, 1\}^L$
    (4) Aggregation: $V_j = \sum_{k=1}^{N_j} v_k$
    (5) Encryption: $C_{AggIntra} = HEnc(V_j, KeyPub_S)$
  • Operations on $\{interSet\}$:
    $$C_{AggInter} = \sum_{msg_i \in interSet} C_i$$
  • Signature and sending:
    (1) Aggregation: $C_i = C_{AggIntra} + C_{AggInter}$
    (2) Signature:
    $$S_i = signature(Msg, KeySign_i)$$
    $$Msg = \{ID_{src}, ID_{dest}, timestamp, msgType = t_{HH}, C_i\}$$
    (3) Message sending: The packet sent to the parent is
    $$MsgSend = \{Msg, S_i\}$$
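The classification and two-stage aggregation performed by a CH can be sketched as below. To keep the data flow visible, all cryptography is left out, which is an assumption of this sketch and not part of the scheme: intra-cluster payloads are taken to be already decrypted and encoded vectors $v_k$, and inter-cluster payloads are plain vectors standing in for homomorphic ciphertexts. The message layout and the function names `vec_add` and `ch_aggregate` are hypothetical.

```python
T_LH, T_HH = "tLH", "tHH"  # intra-cluster and inter-cluster message types

def vec_add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]

def ch_aggregate(messages, length):
    """Classify verified messages by msgType and combine their vectors."""
    intra_set = [m for m in messages if m["msgType"] == T_LH]
    inter_set = [m for m in messages if m["msgType"] == T_HH]

    # Stand-in for V_j: sum of the encoded vectors of this cluster's members.
    agg_intra = [0] * length
    for m in intra_set:
        agg_intra = vec_add(agg_intra, m["payload"])

    # Stand-in for C_AggInter: sum of the aggregates reported by child CHs.
    agg_inter = [0] * length
    for m in inter_set:
        agg_inter = vec_add(agg_inter, m["payload"])

    # Stand-in for C_i = C_AggIntra + C_AggInter.
    return vec_add(agg_intra, agg_inter)

received = [
    {"msgType": T_LH, "payload": [0, 0, 1, 0, 0]},
    {"msgType": T_LH, "payload": [0, 1, 0, 0, 0]},
    {"msgType": T_HH, "payload": [0, 0, 2, 1, 1]},
]
print(ch_aggregate(received, 5))  # [0, 1, 3, 1, 1]
```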

3.1.4. Operations on the Server

To simplify the analysis, all of the children of the server are assumed to be the CH.
Operations on the server consist of four parts: data receiving and verification, aggregation, decryption and statistical results acquisition.
  • Data receiving and verification:
    Message receiving and verification are the same as those of the CH.
  • Aggregation:
    $$C_S = \sum_{msg_i \in interSet} C_i$$
  • Decryption:
    $$V = HDec(C_S, KeyPri_S)$$
  • Get the statistical results:
    Each statistical result can be obtained directly from $V$ using the following formulas, where $n_i$ denotes the $i$-th element of $V$.
    • CNT: $CNT = \sum_{i=1}^{L} n_i$
    • SUM: $SUM(x) = \sum_{k=1}^{N} x_k = \sum_{k=1}^{N} f_m^{-1}(y_k) = \sum_{i=1}^{L} f_m^{-1}(i) \times n_i$
    • MEAN: $MEAN = \frac{SUM(x)}{CNT}$
    • VAR: $VAR = E(x^2) - E(x)^2$, where
      $$E(x) = MEAN, \qquad E(x^2) = \frac{SUM(x^2)}{CNT}$$
      $$SUM(x^2) = \sum_{k=1}^{N} x_k^2 = \sum_{k=1}^{N} \left(f_m^{-1}(y_k)\right)^2 = \sum_{i=1}^{L} \left(f_m^{-1}(i)\right)^2 \times n_i$$
    • STD: $STD = \sqrt{VAR(x)}$
    • MAX: $MAX = f_m^{-1}(i_{\max})$, where $i_{\max} = \max\{i \mid i \in (0, L] \wedge n_i > 0\}$
    • MIN: $MIN = f_m^{-1}(i_{\min})$, where $i_{\min} = \min\{i \mid i \in (0, L] \wedge n_i > 0\}$
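As a rough sketch, the formulas above translate directly into a few lines of code. The function name `statistics_from_vector` and its parameters are hypothetical; the inverse mapping is taken to be $f_m^{-1}(i) = a \times i + X_{LB}$, as defined in Section 3.2.1, and the sample vector is the aggregation result of the example in Section 3.3.

```python
import math

def statistics_from_vector(V, x_lb, a):
    """Derive the statistics from V = (n_1, ..., n_L)."""
    def inv(i):
        # Inverse mapping f_m^{-1}(i) = a * i + X_LB
        return a * i + x_lb

    indexed = list(enumerate(V, start=1))  # pairs (i, n_i), i starts at 1
    cnt = sum(V)
    if cnt == 0:
        raise ValueError("no active nodes reported")
    s1 = sum(inv(i) * n for i, n in indexed)        # SUM(x)
    s2 = sum(inv(i) ** 2 * n for i, n in indexed)   # SUM(x^2)
    mean = s1 / cnt
    var = s2 / cnt - mean ** 2
    occupied = [i for i, n in indexed if n > 0]
    return {
        "CNT": cnt,
        "SUM": s1,
        "MEAN": mean,
        "VAR": var,
        "STD": math.sqrt(max(var, 0.0)),  # guard against tiny negative rounding
        "MAX": inv(max(occupied)),
        "MIN": inv(min(occupied)),
    }

stats = statistics_from_vector([0, 1, 3, 1, 1], x_lb=20, a=1)
print(stats["CNT"], stats["SUM"], stats["MAX"], stats["MIN"])  # 6 140 25 22
```

Note that CNT is the number of active nodes, so no separate report of active or inactive nodes is needed.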

3.2. Mapping, Encoding and Aggregation

In this subsection, we first present details regarding mapping and encoding and then introduce the aggregation step.

3.2.1. Details of Mapping and Encoding

As mentioned above, statistics results can be divided into two types: addition-based statistics (such as SUM, AVG, VAR, etc.) and comparison-based statistics (such as MAX, MIN, etc.). The former requires value-preserving, while the latter requires order-preserving.
Obtaining statistical results of several types with only one query is difficult because it is hard to preserve both values and order during the aggregation of encrypted data.
In the proposed scheme, we can maintain the two types of information mentioned previously in the secure data aggregation process by using mapping and encoding.
As shown in Figure 2, there are three types of data in the proposed scheme: sensing data $x$, mapped data $y$ and encoded data $v$. They can be converted into one another through the mapping and encoding functions and their inverses.
Sensing data $x_k$ are the original data gathered at node $k$. $x_k$ belongs to a subset of the real domain, i.e., $x_k \in (X_{LB}, X_{UB}]$, where $X_{LB}$ and $X_{UB}$ are the lower and upper bounds, respectively.
$x_k$ is transformed into mapped data $y_k$ by the mapping step. $y_k$ belongs to a subset of the natural numbers, i.e., $y_k \in (0, L]$, where $L = \frac{X_{UB} - X_{LB}}{a}$ and $a$ is the accuracy requirement of $x_k$.
The conversion between $x_k$ and $y_k$ is achieved by the mapping function and its inverse, i.e., $f_m$ and $f_m^{-1}$.
$$y_k = f_m(x_k) = \left\lceil \frac{x_k - X_{LB}}{a} \right\rceil, \qquad x_k = f_m^{-1}(y_k) = a \times y_k + X_{LB}$$
$y_k$ is converted into $v_k$ by encoding. $v_k$ is a vector with $L$ elements, $v_k \in \{0, 1\}^L$. Its $y_k$-th element is one and all other elements are zero, that is:
$$v_k(i) = \begin{cases} 1 & (i = y_k) \\ 0 & (i \neq y_k) \end{cases}$$
The conversion between $y_k$ and $v_k$ is achieved by the encoding function and its inverse, i.e., $f_e$ and $f_e^{-1}$.
$$v_k = f_e(y_k), \qquad y_k = f_e^{-1}(v_k)$$
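The mapping and encoding functions and their inverses can be sketched as follows. The sketch assumes that the mapping rounds up to the next grid index, which is consistent with $y_k \in (0, L]$ for $x_k \in (X_{LB}, X_{UB}]$. The function names and the rounding guard against floating-point noise are choices of this sketch, not part of the scheme.

```python
import math

def f_m(x, x_lb, a):
    """Map a reading x in (x_lb, x_ub] to a natural number y in (0, L]."""
    # round() removes floating-point noise before taking the ceiling.
    return math.ceil(round((x - x_lb) / a, 9))

def f_m_inv(y, x_lb, a):
    """Inverse mapping: the reading represented by index y."""
    return a * y + x_lb

def f_e(y, length):
    """Encode y as a 0/1 vector of the given length whose y-th element is one."""
    if not 1 <= y <= length:
        raise ValueError("mapped value outside (0, L]")
    return [1 if i == y else 0 for i in range(1, length + 1)]

def f_e_inv(v):
    """Decode a one-hot vector back to the (1-based) index of its one."""
    return v.index(1) + 1

y = f_m(23, x_lb=20, a=1)
v = f_e(y, length=5)
print(y, v)  # 3 [0, 0, 1, 0, 0]
```

Readings outside the valid range produce an index outside $(0, L]$, so `f_e` rejects them, which matches the discarding of out-of-bound values at the CH.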

3.2.2. Aggregation

Aggregation is performed on the plaintext or the cipher text of $v_k$. An example of the former is step (4) of the third operation in Section 3.1.3; examples of the latter are the fourth operation in Section 3.1.3 and the second operation in Section 3.1.4. The following theorem states that the server finally obtains the aggregated result of all sensing data of active nodes.
Theorem 2. The homomorphic encryption used in the proposed scheme ensures that the vector obtained at the server after decryption is the aggregated result of all encoded data of active nodes.
For the proof, refer to Appendix C.
Although the elements of $v_k$ can only be zero or one, the elements of the final aggregated result $V = \sum v_k$ may be arbitrary natural numbers not greater than $N$.

3.3. A Concrete Example

Figure 1 shows the network model of this example. Assume that $x_k \in (20, 25]$ and that the accuracy requirement is $a = 1$. Sensing data gathered from each node are listed in the second column of Table 4. The third and fourth columns present mapped data $y_k$ and encoded data $v_k$, respectively.
Each CM encrypts the sensing data, attaches the signature and other information and transfers these to the cluster head (CH).
The CH verifies the received message and obtains the sensing data via decryption. Mapped data $y_k$ ($y_k \in (0, 5]$) are obtained from $x_k$ using the mapping function.
Because the sensing data of Nodes 3 and 5 are outside of the valid data range $(20, 25]$, the CH node regards them as illegal data and discards them.
The other valid mapped data are encoded as a vector $v_k$ whose length is $L$. The $y_k$-th element is one, while all of the remaining elements are set to zero. Take Node 1 as an example: the sensing data are $x_1 = 23$; mapped data $y_1 = 3$ are obtained after the mapping step; then, the third element of the vector is set to one, while the other elements are zero, i.e., $v_1 = (0\ 0\ 1\ 0\ 0)$. The values of each step are listed in Table 4.
For all of the intra-cluster data, the CH encodes them first, then sums them and encrypts the summation using the server public key of the homomorphic encryption. For all of the inter-cluster data, the CH adds them up directly in the cipher text domain after verification. Both the intra-cluster and inter-cluster data are homomorphically encrypted under the server public key; thus, they can be summed up directly in the cipher text domain, which is equivalent to obtaining the cipher text of the regional data aggregation result. Finally, this regional result is transmitted to the parent, together with other information, such as the timestamp and signature.
According to the homomorphic property, the summation arithmetic of vectors in the cipher text domain is equivalent to that in plaintext. Therefore, the server can obtain the final aggregation result $V = \sum v_k$ by decrypting the received data. For example, in this case, the final data the server obtains are the summation of the vectors in the fourth column of Table 4.
$$V = \sum v_k = (0\ 1\ 3\ 1\ 1)$$
Each statistic can be calculated directly from $V$.
$$CNT = \sum_{i=1}^{L=5} n_i = 6$$
$$SUM(x) = \sum_{i=1}^{L=5} (i + 20) \times n_i = 21 \times 0 + 22 \times 1 + 23 \times 3 + 24 \times 1 + 25 \times 1 = 140$$
$$MEAN = SUM(x) / CNT = 140 / 6 \approx 23.33$$
$$i_{\max} = \max\{i \mid i \in (0, L] \wedge n_i > 0\} = 5, \qquad i_{\min} = \min\{i \mid i \in (0, L] \wedge n_i > 0\} = 2$$
$$MAX = f_m^{-1}(i_{\max}) = 25, \qquad MIN = f_m^{-1}(i_{\min}) = 22$$
$$SUM(x^2) = \sum_{i=1}^{L=5} (i + 20)^2 \times n_i = 21^2 \times 0 + 22^2 \times 1 + 23^2 \times 3 + 24^2 \times 1 + 25^2 \times 1 = 3272$$
$$E(x) = MEAN, \qquad E(x^2) = SUM(x^2) / CNT \approx 545.3333$$
$$VAR = E(x^2) - E(x)^2 \approx 0.89, \qquad STD = \sqrt{VAR} \approx 0.94$$

4. Multi-Functional Secure Data Aggregation Scheme: MFSDA-II

On the one hand, in most WSN applications, decisions can be made without exact statistical results; on the other hand, performance can be improved (e.g., lower energy consumption and less communication) by relaxing the accuracy requirement. Therefore, a large number of approximation algorithms have been proposed [7,8,9,29,34,35,36].
According to the analysis above, the total data transmission is still large when $L$ is large. Thus, we propose an approximation scheme in which data compression is introduced to reduce the total data transmission and prolong the network life cycle.

4.1. MFSDA-II

To reduce the communication cost, a compression step is introduced after the mapping step in MFSDA-II. Mapping data y are compressed from a larger space with a size of L into a smaller space with a size of L . The encoding step is executed on compression data z, which makes the vector length decrease from L to L .
Mapping: The mapping step is the same as the one in MFSDA-I. The mapping function and its inverse function are as follows:
y k = f m ( x k ) x k = f m - 1 ( y k )
Compressing: The compression function $f_c$ converts $y_k$ into $z_k$ as follows, where $c$ is the compression factor.
$$z_k = f_c(y_k) = \left\lceil \frac{y_k}{c} \right\rceil = \left\lceil \frac{f_m(x_k)}{c} \right\rceil$$
One can recover $\hat{y}_k$ as an estimate of $y_k$ by applying the following decompression function to $z_k$:
$$\hat{y}_k = f_c^{-1}(z_k) = c \times z_k - \frac{c}{2}$$
Encoding: The encoding step is based on $z_k$ instead of $y_k$, i.e., $v_k = f_e(z_k)$, where $v_k \in \{0, 1\}^{L'}$ and:
$$v_k(i) = \begin{cases} 1 & (i = z_k) \\ 0 & (i \neq z_k) \end{cases}$$
We can recover $\hat{y}_k$ from $v_k$, because $z_k$ and $v_k$ are equivalent to each other.
$$\hat{y}_k = f_c^{-1}(z_k) = f_c^{-1}(f_e^{-1}(v_k)) = f_c^{-1}(i) = c \times i - \frac{c}{2}$$
where $i$ is the index of the nonzero element in $v_k$.
Aggregation:
$$V = \sum_{k=1}^{N} v_k$$
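The node-side steps above (compress, encode, aggregate) can be sketched in Python as follows. This is a minimal plaintext illustration, not the scheme's implementation: encryption and signatures are left out, the sizes `L = 20` and `c = 4` and the mapped readings are hypothetical, and the ceiling in `f_c` is an assumption chosen so that $z_k$ falls in $(0, L']$.

```python
import math

def f_c(y, c):
    """Compress a mapped value y in (0, L] into z in (0, ceil(L / c)]."""
    return math.ceil(y / c)

def f_c_inv(z, c):
    """Estimate y from z as the midpoint of the interval that z covers."""
    return c * z - c / 2

def f_e(z, length):
    """One-hot encode the 1-based index z into a vector of the given length."""
    return [1 if i == z else 0 for i in range(1, length + 1)]

def aggregate(vectors):
    """Element-wise sum of equally long vectors."""
    return [sum(column) for column in zip(*vectors)]

L, c = 20, 4                       # hypothetical sizes
L_prime = math.ceil(L / c)         # length of the compressed vector
mapped = [5, 8, 17]                # hypothetical mapped readings y_k

V = aggregate(f_e(f_c(y, c), L_prime) for y in mapped)
print(V)                           # one count per compressed slot
print(f_c_inv(2, c))               # estimate of y for slot 2
```

Because every node contributes exactly one nonzero element, the entries of the aggregated vector always sum to the number of contributing nodes.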

4.2. Get the Statistical Result

The aggregation vector of all active nodes obtained at the server is $V = (n_1\ n_2\ \ldots\ n_{L'})$. Then, we can recover all statistical results from $V$ using the following formulas.
  • Count (CNT): $\mathrm{CNT} = \sum_{i=1}^{L'} n_i$
  • Summation (SUM): $\mathrm{SUM}(\hat{x}) = \sum_{k=1}^{N} \hat{x}_k = \sum_{k=1}^{N} f_m^{-1}(\hat{y}_k) = \sum_{k=1}^{N} f_m^{-1}(f_c^{-1}(z_k)) = \sum_{i=1}^{L'} n_i f_m^{-1}(f_c^{-1}(i))$
  • Average/Mean (MEAN): $\mathrm{MEAN} = \frac{\mathrm{SUM}(\hat{x})}{\mathrm{CNT}}$
  • Variance (VAR): $\mathrm{VAR}(\hat{x}) = E(\hat{x}^2) - E(\hat{x})^2$
where
$$E(\hat{x}) = \mathrm{MEAN}; \quad E(\hat{x}^2) = \mathrm{SUM}(\hat{x}^2) / \mathrm{CNT}; \quad \mathrm{SUM}(\hat{x}^2) = \sum_{k=1}^{N} \hat{x}_k^2 = \sum_{k=1}^{N} \left(f_m^{-1}(\hat{y}_k)\right)^2 = \sum_{k=1}^{N} \left(f_m^{-1}(f_c^{-1}(z_k))\right)^2 = \sum_{i=1}^{L'} n_i \left(f_m^{-1}(f_c^{-1}(i))\right)^2$$
  • Standard Deviation (STD): $\mathrm{STD} = \sqrt{\mathrm{VAR}(\hat{x})}$
  • Maximum (MAX): $\mathrm{MAX} = f_m^{-1}(f_c^{-1}(i_{\max}))$
where:
$$i_{\max} = \max \{ i : i \in (0, L'] \land n_i > 0 \}$$
  • Minimum (MIN): $\mathrm{MIN} = f_m^{-1}(f_c^{-1}(i_{\min}))$
where:
$$i_{\min} = \min \{ i : i \in (0, L'] \land n_i > 0 \}$$
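The formulas above translate almost directly into code. The sketch below is one possible server-side reading of them in Python; the concrete inverse functions, the compression factor `C = 4` and the vector `(2 1 2 1 1)` are borrowed from the example in Section 4.3, and the function names are illustrative rather than part of the scheme.

```python
import math

C = 4  # compression factor used in the Section 4.3 example

def f_m_inv(y):
    return y / 10 + 9          # inverse of y = (x - 9) * 10

def f_c_inv(z):
    return C * z - C / 2       # midpoint estimate of the mapped value

def recover(V):
    """Recover statistics from the aggregated vector V, with V[i-1] = n_i."""
    values = [f_m_inv(f_c_inv(i)) for i in range(1, len(V) + 1)]
    cnt = sum(V)               # assumes at least one active node
    s1 = sum(n * x for n, x in zip(V, values))
    s2 = sum(n * x * x for n, x in zip(V, values))
    mean = s1 / cnt
    var = s2 / cnt - mean ** 2
    # The inverse functions are increasing, so the largest and smallest
    # occupied indices give MAX and MIN.
    occupied = [x for n, x in zip(V, values) if n > 0]
    return {"CNT": cnt, "SUM": s1, "MEAN": mean, "VAR": var,
            "STD": math.sqrt(var), "MAX": max(occupied), "MIN": min(occupied)}

stats = recover([2, 1, 2, 1, 1])
print(stats)
```

All seven statistics come from a single pass over a vector of length $L'$, which is why one query is enough.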

4.3. An Example of MFSDA-II

Let us illustrate the variant scheme with an example. Assume that the range of sensor data $x_k$ is $x_k \in (9.0, 11.0]$. After erroneous data are detected and eliminated, seven valid sensor readings are left, i.e.,
$$x_k|_{k=1}^{7} = \{9.7, 9.9, 9.3, 9.4, 10.2, 10.5, 10.9\}$$
Mapped data $y_k \in (0, 20]$ can be obtained by using the mapping function as follows.
$$y_k = f_m(x_k) = (x_k - 9) \times 10$$
The corresponding mapped data are:
$$y_k|_{k=1}^{7} = f_m(x_k)|_{k=1}^{7} = \{7, 9, 3, 4, 12, 15, 19\}$$
Encoding these values directly, without compression, gives an aggregation vector of length 20:
$$V^0 = \sum_k v_k^0 = (0\ 0\ 1\ 1\ 0\ 0\ 1\ 0\ 1\ 0\ 0\ 1\ 0\ 0\ 1\ 0\ 0\ 0\ 1\ 0)$$
In MFSDA-II, mapped data are compressed into $z_k \in (0, 5]$ before the encoding step, i.e., with $c = 4$,
$$z_k|_{k=1}^{7} = \left\lceil \frac{f_m(x_k)}{c} \right\rceil \Big|_{k=1}^{7} = \{2, 3, 1, 1, 3, 4, 5\}$$
The encoding step is based on compressed data, so the vector length is reduced from 20 to five. The final aggregation vector is:
$$V = \sum_k v_k = (2\ 1\ 2\ 1\ 1)$$
The total communication cost is only $\frac{1}{4}$ of that in MFSDA-I. The transmission reduction is achieved via lossy compression; thus, errors exist. The final error of each statistic requires further analysis because positive and negative errors exist simultaneously.
We can obtain the common statistics from the final aggregated vector using the formula above. For the detailed calculation processes, please refer to Appendix D.
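To see where the positive and negative errors come from, the example can be replayed reading by reading. The Python sketch below maps, compresses and reconstructs each of the seven readings and compares the estimate with the original. The `round` call in `f_m` is an assumption added only to absorb floating-point noise, and the variable names are illustrative.

```python
import math

C = 4
readings = [9.7, 9.9, 9.3, 9.4, 10.2, 10.5, 10.9]

def f_m(x):
    # round() is an added guard: (9.7 - 9) * 10 is not exactly 7 in floats.
    return round((x - 9) * 10)

def estimate(x):
    z = math.ceil(f_m(x) / C)            # compress
    y_hat = C * z - C / 2                # decompress to the slot midpoint
    return z, y_hat / 10 + 9             # invert the mapping

slots = [estimate(x)[0] for x in readings]
estimates = [estimate(x)[1] for x in readings]
errors = [x_hat - x for x_hat, x in zip(estimates, readings)]

exact_mean = sum(readings) / len(readings)
approx_mean = sum(estimates) / len(estimates)

print(slots)
print([round(e, 2) for e in errors])
print(round(exact_mean, 4), round(approx_mean, 4))
```

In this small sample most per-reading errors are negative and only one is positive, so the offsetting is partial and the mean estimate is about 0.1 too low.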

5. Functionality and Security Analysis

In this section, we will analyze the functionality and security of the proposed scheme and compare it to other conventional schemes. More specifically, properties such as multi-function support, dynamic network adaptation and security will be discussed.

5.1. Functionality Comparison

The functionality comparison results are listed in Table 1. The second column indicates whether summation-based statistics are supported, while the third indicates whether comparison-based statistics are supported. The last column indicates whether all statistics can be derived from a single query.
Most of these secure data aggregation (SDA) schemes, such as [9,11], only supported addition-based statistics. Acharya et al. [24] only supported comparison-based statistical results. In all of these SDA schemes, different statistics are derived from different queries.
Moreover, even a single statistic may require several different queries. For example, to obtain VAR in CDA [11], we need at least one CNT query and two SUM queries, i.e., $\mathrm{SUM}(x)$ and $\mathrm{SUM}(x^2)$, and then obtain VAR as $\frac{\mathrm{SUM}(x^2)}{\mathrm{CNT}} - \left(\frac{\mathrm{SUM}(x)}{\mathrm{CNT}}\right)^2$.
Both RCDA and the proposed scheme (MFSDA) can obtain all of the common statistical results simultaneously in a single query. However, RCDA has several flaws [13,31] with respect to security and communication costs. A solution for the security weaknesses of RCDA has been discussed in Sen-SDA [31]. The main contribution of Sen-SDA is to improve the efficiency of multiple signature verifications, which is not the same as in this paper; thus, we do not choose it as a candidate for comparison. EERCDA (Energy Efficient Recoverable Concealed Data Aggregation) [13] uses a differential data transfer method to reduce the energy consumption in RCDA. For further comparisons, refer to Section 6.

5.2. Dynamic Networks Adaptive

Nodes do not always remain active in dynamic networks [8]. Most existing SDA schemes are not efficient for dynamic networks due to the varying number of active nodes. To report abnormal nodes, extra communication costs were used in [10,11], which required much energy and easily formed a network bottleneck. These extra communication costs were even more than those used for sensing data when the percent of active (or inactive) nodes was large enough.
More details regarding comparisons are listed in Table 2. The second column indicates that dynamic networks are not supported, while the third and fourth columns indicate that dynamic networks can be supported with or without extra communication costs, respectively.

5.3. Security Analysis

Security is one of the most important properties of the secure data aggregation (SDA) scheme. In this subsection, we will analyze the security of MFSDA and compare it to other well-known SDA schemes.
The comparison results are listed in Table 5. The attack models used in the table header are defined as follows.

1. Without compromising any nodes:

A1: Eavesdrop. The privacy of data is not affected by passive monitoring because sensing data have been encrypted in this scheme.
A2: Replay attack. The lifetime of each packet is marked by a timestamp in the proposed scheme. If an adversary attempts to modify the timestamp without a valid private key for the signature, the receiver can detect it via the signature verification mechanism.
A3: Data tampering. Data tampering can be detected by using the signature mechanism.
A4: DoS attack. The parent node can find the illegal data in time; thus, the multi-hop flooding effect hardly spreads.

2. Compromising CM:

If an adversary has compromised one or more CM, it can obtain all of their secrets. An adversary can use this private information and other public information to modify or forge packets. Generally, the local value of an honest node is bounded, and then, the adversary can falsify the sensor reading of the compromised node as follows:
B1: A compromised node falsifies the local value outside the bound.
B2: A compromised node falsifies the local value within the bound.
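The receiver-side check described for A2 and A3 (reject stale packets, reject any packet whose payload or timestamp fails verification) can be sketched as follows. This is only an illustration of the control flow: the paper's scheme uses an identity-based signature, whereas this sketch substitutes a shared-key HMAC from the Python standard library so that it stays self-contained. The `MAX_AGE` lifetime, the key and the function names are all hypothetical.

```python
import hashlib
import hmac
import struct

MAX_AGE = 5.0  # hypothetical packet lifetime in seconds

def make_tag(key, payload, timestamp):
    # Bind the timestamp to the payload so that neither can change alone.
    # HMAC is a stand-in here, not the identity-based signature of the paper.
    message = struct.pack(">d", timestamp) + payload
    return hmac.new(key, message, hashlib.sha256).digest()

def accept(key, payload, timestamp, tag, now):
    if not hmac.compare_digest(make_tag(key, payload, timestamp), tag):
        return False                            # tampered payload or timestamp
    return 0.0 <= now - timestamp <= MAX_AGE    # stale packets count as replays

key = b"demo-key"
tag = make_tag(key, b"reading", 100.0)
print(accept(key, b"reading", 100.0, tag, now=101.0))   # fresh and intact
print(accept(key, b"reading", 100.0, tag, now=200.0))   # replayed later
print(accept(key, b"reading", 199.0, tag, now=200.0))   # timestamp rewritten
```

Passing `now` explicitly keeps the check deterministic; a deployment would read it from a synchronized clock.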

6. Performance Analysis and Evaluation

In this section, we will analyze the performance of the two schemes proposed in this paper, namely MFSDA-I and MFSDA-II. The former provides an exact result, while the latter can significantly reduce network traffic at the expense of a small loss of accuracy.

6.1. Evaluation Settings

A cluster-based network model is used here. Cluster heads are selected in advance and assumed to be trusted nodes. For smaller scale networks, cluster heads can communicate directly with the server. For a relatively large-scale network, all cluster heads form a tree with the server as its root.
Network construction and maintenance are not the research focus of this article, so we reduce their impact through reasonable simplifications. In the proposed schemes, the cluster heads can be pre-selected. Because there are only a few of them, they can be equipped with more batteries. Therefore, these cluster heads can be kept active during the lifetime of the WSN or kept active within a predetermined period for data receiving.
To simplify the analysis, the position of each cluster head is assumed to never change, and the tree structure among these cluster heads is assumed to be predetermined. This is feasible due to the limited number of cluster heads. Each cluster head broadcasts its own cluster formation requests periodically.
The interval of these cluster formation requests can be predefined based on the maximum possible movement speed of the network nodes. For example, if the position of the nodes changes slowly or never changes, this interval can be enlarged to reduce energy consumption. Other nodes join a cluster based on the signal strength of each cluster head. If more than one cluster head has the same signal strength, the cluster head with the lowest ID will be selected. Further details regarding the network construction are omitted, as this is not the focus of this article.
The dataset is obtained from the TAO (Tropical Atmosphere Ocean) project [37] of NOAA (National Oceanic and Atmospheric Administration). The TAO project enabled real-time collection of high quality oceanographic and surface meteorological data for monitoring, forecasting and understanding of climate swings associated with El Nino and La Nina. More detail of the dataset will be given later.
Three datasets with different distributions are used here. The first is a uniformly distributed dataset generated by the "unifrnd" function in Matlab; the second is a Poisson-distributed dataset generated by the "poissrnd" function in Matlab (with Lambda = 200); and the last is a real dataset of wind direction from the TAO project. More detail of the real dataset is given in Table 6. It contains 2000 samples, selected from a continuous time interval (1992–1993) with invalid data removed. The measurement range of wind direction is [0, 360); the resolution is 1.4. The accuracy is 5–7.8, which is much greater than the resolution. Here, we choose the highest accuracy, so $L = \frac{360 - 0}{5} = 72$. For comparison, the other two datasets also have the same N and L, i.e., N = 2000 and L = 72.

6.2. Analysis and Evaluations of MFSDA-I

Both theoretical analysis and experimental evaluation of MFSDA-I are given in this section. As discussed earlier, RCDA addresses a topic similar to that of this paper, and EERCDA is an improved version of RCDA that reduces communication costs. Both are chosen as candidates for comparison.

6.2.1. Communication Cost of MFSDA-I

Communication cost can be measured by considering the package size. In the proposed scheme, data transmission can be divided into two categories.
One is intra-cluster data transmission, namely data transmission between CM and CH. The package size is $DL_{11} = |header| + |ID_{src}| + |ID_{dest}| + |timestamp| + |msgType| + |C_{11}| + |S|$.
The other is inter-cluster data transmission, namely, data transmission between CHs or data transmission between the CH and the server. The package size is $DL_{12} = |header| + |ID_{src}| + |ID_{dest}| + |timestamp| + |msgType| + |C_{12}| + |S|$.
The sensing data do not need to be coded in intra-cluster data transmission; thus, the length of the plaintext corresponding to $C_{11}$ is $\log_2 L$.
Data will be encoded before inter-cluster data transmission. Because the number of elements in the encoded vector is $L$ and the length of each element is at least $\log_2 N$, the total plaintext length is $L \log_2 N$.
Let us assume that the ciphertext length is linear in the plaintext length in both stages, with ratios $\alpha_1$ and $\alpha_2$, respectively. Then:
$$DL_{11} = k_1 + \alpha_1 \log_2 L, \qquad DL_{12} = k_1 + \alpha_2 L \log_2 N$$
where $k_1 = |header| + |ID_{src}| + |ID_{dest}| + |timestamp| + |msgType| + |S|$.
For a uniform distribution, the inter-cluster data transmission can be further reduced to:
$$DL_{12}^{u} = k_1 + \alpha_2 L \log_2 \frac{N}{L}$$

6.2.2. Communication Cost of RCDA

In the first step of RCDA-HETE, namely intra-cluster data transmission, the packet size is $DL_{21} = |header| + |C_{21}| + |S| = k_2 + |C_{21}|$.
Note that, although the packet sizes of MFSDA, RCDA and EERCDA are different, all of them use the same signature, so $|S|$ can still be the same as long as the appropriate elliptic curve and parameters are chosen. If we choose the same test platform, $|header|$ will also be the same. The encryption algorithm is consistent with MFSDA in the same stage, so:
$$DL_{21} = k_2 + \alpha_1 \log_2 L$$
In the second stage of RCDA-HETE and all stages of RCDA-HOMO, the packet size is $DL_{22} = k_2 + |C_{22}|$.
The aggregation of sensing data is the concatenation of all messages from CM. The data length of each sensing datum is at least $\log_2 L$, so the total plaintext size is at least $N \log_2 L$; the encryption algorithm is the same as the one used in the second stage of MFSDA, so the packet size is:
$$DL_{22} = k_2 + \alpha_2 N \log_2 L$$
Since $\alpha_2 > \alpha_1 \geq 1$, $DL_{22} > DL_{21}$, which means that RCDA-HETE is much more efficient than RCDA-HOMO in terms of communication cost.

6.2.3. Communication Cost of EERCDA

To reduce the energy consumption of message transmission in RCDA [12] and to achieve more energy and bandwidth efficiency, EERCDA [13] uses a differential data transfer method. In EERCDA, the difference data, rather than raw data from the sensor node, are transmitted to the cluster head.
EERCDA includes two data transmission phases: the reference data transfer session and the subsequent data transfer session.
The first stage is the same as RCDA-HOMO. Each sensor transmits raw data (reference data) to the server; thus, the packet size in this stage is still:
$$DL_{31} = k_2 + \alpha_2 N \log_2 L = DL_{22}$$
The differential data are transmitted to the cluster head in the second stage. Then, the server can recover the raw data using the reference data and the differential data. The total packet size is $DL_{32} = |header| + |ID| + |C_{32}| + |S| = k_2 + |ID| + |C_{32}|$.
Assume that the number of nodes whose data have changed is $\beta N$ at each query in this stage; then:
$$DL_{32} = k_2 + \beta N \log_2 N + \alpha_2 \beta N \log_2 L$$
Let us compare EERCDA with RCDA-HOMO first. EERCDA is more efficient than RCDA-HOMO if $DL_{32} < DL_{31}$, which means $\beta < \frac{\alpha_2 \log_2 L}{\log_2 N + \alpha_2 \log_2 L}$.
Now, let us compare EERCDA with RCDA-HETE. Because $DL_{22} = DL_{31}$, we only need to compare $DL_{21}$ to $DL_{32}$. In general, $\beta N \geq 1$ and $\alpha_1 < \alpha_2$, so: $k_2 + \alpha_1 \log_2 L \leq k_2 + \alpha_2 \beta N \log_2 L \leq k_2 + \beta N \log_2 N + \alpha_2 \beta N \log_2 L$.
That is to say, $DL_{21} \leq DL_{32}$, which also means that EERCDA will consume more energy than RCDA-HETE.
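The break-even condition for EERCDA against RCDA-HOMO can be explored numerically. The Python sketch below encodes the two packet-size formulas and the bound on $\beta$; the parameter values (`N = 2000`, `L = 72`, `alpha2 = 2.0`, `k2 = 0`) are assumptions picked for illustration, not measurements from the paper.

```python
import math

def dl31(N, L, alpha2, k2=0.0):
    """Reference-data stage of EERCDA (same size as RCDA-HOMO)."""
    return k2 + alpha2 * N * math.log2(L)

def dl32(N, L, alpha2, beta, k2=0.0):
    """Differential stage: beta * N nodes send an ID plus encrypted data."""
    return k2 + beta * N * math.log2(N) + alpha2 * beta * N * math.log2(L)

def beta_threshold(N, L, alpha2):
    """Largest fraction of changed nodes for which DL32 < DL31 still holds."""
    return alpha2 * math.log2(L) / (math.log2(N) + alpha2 * math.log2(L))

N, L, alpha2 = 2000, 72, 2.0      # illustrative values only
t = beta_threshold(N, L, alpha2)
print(round(t, 3))
print(dl32(N, L, alpha2, 0.9 * t) < dl31(N, L, alpha2))   # below the threshold
print(dl32(N, L, alpha2, 1.1 * t) < dl31(N, L, alpha2))   # above the threshold
```

Under these assumed values the differential transfer only pays off while roughly half of the nodes or fewer report a change per query.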

6.2.4. Evaluation of MFSDA-I

According to the analysis above, the communication cost of EERCDA is much greater than that of RCDA-HETE, while only in specific conditions is it much more efficient than RCDA-HOMO. To highlight the advantages of this scheme, we compare MFSDA to RCDA-HETE.
$|ID_{src}| + |ID_{dest}| + |timestamp|$ in MFSDA is used to achieve a much higher security level. These bits would also be needed by RCDA and EERCDA if they were to achieve a similar security property. At the same time, the packet header may already contain $ID_{dest}$, the timestamp, etc. on platforms such as TinyOS 2.x. Therefore, these fields can be ignored during the comparison. $|msgType|$ needs only one bit, so it can also be ignored. Consequently, there is no need to consider $k_1$ and $k_2$ during the comparison, and $DL_{21} \approx DL_{11}$; that is to say, MFSDA and RCDA-HETE have a similar communication cost in the first stage.
When $N > L$, $DL_{12} - DL_{22} \approx \alpha_2 (L \log_2 N - N \log_2 L) < 0$. That is, MFSDA is more efficient than RCDA-HETE when $N > L$. Obviously, MFSDA is also more efficient than RCDA-HOMO and EERCDA in this condition. A case of communication cost comparison between RCDA and MFSDA is given in Figure 3.
Now, let us compare RCDA to MFSDA-I on wind direction, a real dataset obtained from the TAO project. According to the comparison result, MFSDA-II can reduce the communication cost dramatically at the cost of a small accuracy loss. The measurement range of the TAO wind direction is [0, 359], and the accuracy is five. When N = 100, the data length of RCDA is 700 bits, while the data length of MFSDA-I is 504 bits. The latter is only 72% of the former. When N = 300, the data lengths of RCDA and MFSDA-I are 2100 and 648 bits, respectively. The data length of MFSDA-I has been reduced to 31% of that of RCDA. More detail is listed in Table 7.
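The data lengths quoted above can be reproduced with a small calculation. The sketch below is a simplified reading of the two plaintext-length terms, $N \log_2 L$ for RCDA and $L \log_2 N$ for MFSDA-I; rounding each logarithm up to whole bits and ignoring headers, signatures and ciphertext expansion are assumptions made here, chosen because they match the quoted figures for L = 72.

```python
import math

def bits(n):
    """Whole bits per element, following the log2 terms in the formulas."""
    return math.ceil(math.log2(n))

def rcda_payload(N, L):
    """Concatenation of N readings, each taking about log2(L) bits."""
    return N * bits(L)

def mfsda1_payload(N, L):
    """Vector of L counters, each taking about log2(N) bits."""
    return L * bits(N)

L = 72
for N in (100, 300):
    r, m = rcda_payload(N, L), mfsda1_payload(N, L)
    print(N, r, m, f"{m / r:.0%}")
```

The MFSDA-I payload grows only logarithmically with N, while the RCDA payload grows linearly, so the gap widens as more nodes report.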

6.3. Analysis and Evaluations of MFSDA-II

For MFSDA-II, we focus on the influence of the compression factor on communication cost and accuracy.

6.3.1. Comparison with MFSDA-I

The data transmission of the intra-cluster, i.e., data transmission from CM to CH, in the approximate scheme is the same as that in MFSDA-I. The compression step only has influence on inter-cluster data transmission, i.e., data transmission between CHs or from CH to the server.
Because of the compression step, the number of elements in $v_k$ and $V$ is reduced from $L$ to $L' = \frac{L}{c}$, so the total data transmission is reduced as well. Then,
$$DL_{12}^{\prime} = k_1 + \alpha_2 L' \times \log_2 N = k_1 + \alpha_2 \frac{L}{c} \log_2 N$$
For a uniform distribution, the result above can be further reduced to:
$$DL_{12}^{\prime u} = k_1 + \alpha_2 \frac{L}{c} \log_2 \frac{cN}{L}$$
The relationship between communication cost and the compression factor is shown in Figure 4. The communication cost decreases significantly as the compression factor c increases in MFSDA-II. Errors are introduced by the lossy compression; the accuracy analysis is given next.
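The effect of the compression factor on the payload can be tabulated with a few lines of Python. As a simplifying assumption, this sketch counts only the plaintext term $\frac{L}{c} \log_2 N$, rounds both factors up to whole numbers and ignores $k_1$ and $\alpha_2$; the function name is illustrative.

```python
import math

def mfsda2_payload(N, L, c):
    """Plaintext bits of the compressed vector: ceil(L / c) counters."""
    L_prime = math.ceil(L / c)
    return L_prime * math.ceil(math.log2(N))

N, L = 2000, 72    # sizes of the datasets used in the evaluation
for c in (1, 2, 4, 8):
    print(c, mfsda2_payload(N, L, c))
```

With c = 1 the function reduces to the MFSDA-I payload, and each doubling of c halves the vector length as long as c divides L.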

6.3.2. Accuracy Analysis of MFSDA-II

Because the compression is lossy, this reduction in communication cost inevitably introduces errors. The final error may be modest because positive and negative errors exist simultaneously and partially offset each other.
The error rate of each statistic is inversely proportional to, or at least decreases with, L and is proportional to, or at least increases with, c. For the same error rate requirement, the bigger the L, the bigger the maximum acceptable c. For the error rates of comparison-based statistics, such as ERMAX (where ER stands for error rate), ERMED and ERMIN, the reference boundaries are strict, i.e., the error rate does not go beyond them. For the error rates of addition-based statistics, except for ERCNT, which is zero, the reference boundary describes the trend and the approximate range of the error, and some specific cases may fall outside it. The reference boundaries of ERMAX and ERMED are $\pm \frac{c}{2L}$, i.e., the boundary is directly related to $\pm c$ and widens markedly as c increases. The upper bound of ERMIN is the same as that of ERMAX and ERMED. Due to the rounding used in the compression step, ERMIN has a one-way increasing trend, and its lower bound is zero.
The CNT is the total number of active nodes in the current query; thus, no error exists, namely ERCNT is zero. Therefore, the error reference bounds of ERMEAN and ERSUM are the same. Their lower bound is independent of c. Their upper bound is proportional to $c^2$ and grows rapidly as c increases.
The error reference bounds of ERVAR and ERSTD also grow rapidly as c increases, more so for ERVAR.
Compared to the comparison-based statistics, the coefficients of the addition-based statistics (such as ERVAR, ERMEAN, etc.) are $\frac{c}{L}$ times the former ones. c is much less than L; thus, $\frac{c}{L}$ is much smaller than one. Therefore, the error rate of comparison-based statistics is much greater than that of addition-based statistics. This is mainly because the addition-based statistics are calculated from the data of all nodes: even if the data error of a single node is large, negative and positive errors partially offset each other. The comparison-based statistics, in contrast, are obtained from a single data point, so no such compensation exists.
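The reference boundary $\pm \frac{c}{2L}$ for the comparison-based statistics suggests a simple way to pick a compression factor for a target error rate. The Python sketch below assumes that the error rate is normalized exactly as in that boundary; the helper names and the 5% target are hypothetical.

```python
import math

def comparison_bound(c, L):
    """Reference boundary c / (2L) for ERMAX, ERMED and ERMIN."""
    return c / (2 * L)

def largest_c(target, L):
    """Largest integer c whose reference boundary stays within the target."""
    return max(1, math.floor(2 * L * target))

L = 72
c = largest_c(0.05, L)            # hypothetical 5% requirement
print(c, round(comparison_bound(c, L), 4))
print(round(comparison_bound(15, L), 4))
```

Since the addition-based statistics usually err less than this boundary, a factor chosen this way is a conservative starting point; c = 1 corresponds to no compression.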

6.3.3. Evaluation of MFSDA-II

The statistical results obtained by MFSDA-I are accurate; thus, the accuracy evaluation is only performed on MFSDA-II. The following experiments analyze how the error rate of each statistic changes as the compression factor increases. The results are shown in Figure 5.
Now, let us compare MFSDA-I with MFSDA-II based on the wind direction dataset of TAO, where the network size is N = 2000. The data length of MFSDA-I is 792 bits. When the compression factor is c = 2, the data length of MFSDA-II is 396 bits, which is only 50% of that of MFSDA-I. The maximum error rate of all statistical results is 2%, while the minimum error rate is 0.2%. When the compression factor is c = 4, the data length of MFSDA-II is 198 bits, which is only 25% of that of MFSDA-I. The maximum error rate of all statistical results is 2%, while the minimum error rate is 0.5%. More detail of the comparison result is listed in Table 8. More detailed accuracy analysis results of MFSDA-II will be given later.
The reference boundary is derived based on a uniform distribution. The more the actual data are similar to a uniform distribution, the higher the reference value that the boundaries will provide. As shown in Figure 5, the majority of error rates are in the reference boundary, though some statistical error rates are still beyond the reference boundary.
The majority of error rates of the unifrnd dataset are within the reference boundary. Compared to the other datasets, it achieves higher accuracy (a lower error rate) for the same compression factor. The error rate reference formula of each statistic predicts the behavior of the unifrnd dataset well; thus, it can be used directly for choosing the compression factor. For example, assuming that the accuracy requirement of ERSTD is approximately 2%, we can choose c = 15; the amount of data is compressed from L = 72 to $\frac{L}{c} = \frac{72}{15} = 4.8$, which means that the final communication cost is only $\frac{1}{15} \approx 6.7\%$ of that in MFSDA-I.
For a non-uniform distribution, some statistic error rates may be beyond the reference boundary. ERSTD and ERVAR of the poissrnd dataset are beyond the reference boundary, so the reference error formula cannot be used directly. However, even in this condition, the error rate is still related to c. By choosing a smaller c, we can still greatly reduce the communication cost and achieve high precision. For example, the ERVAR of the poissrnd dataset reaches 10% when c = 15. If we choose c = 10, the ERVAR drops to about 4%. The communication cost is reduced to $\frac{1}{10} = 10\%$ of that in MFSDA-I.
As shown in Figure 5, ERMAX, ERMIN and ERMED of all three datasets are within the reference boundary range; this is because the reference boundary of comparison-based statistics is an absolute boundary.
As the TAO dataset of wind direction is very similar to a uniform distribution, even its greatest error rate is within the boundary range. In contrast, the poissrnd dataset is less similar to a uniform distribution. If we calculate a suitable compression factor according to the error formula, we can directly apply it to the TAO dataset of wind direction, but we may need to choose a much smaller c for the poissrnd dataset.
According to the reference boundary formula, the error rate of each statistic is directly proportional to c and inversely proportional to L. That is to say, for the same accuracy requirement (error rate) of each statistic, the bigger the L, the larger the acceptable c. Because the communication cost in MFSDA-II is reduced by a factor of $\frac{L}{L'} = c$, the bigger the L, the more energy is saved.
For example, for the barometric pressure data of the TAO project, $L = \frac{1100 - 800}{0.1} = 3000$, with N = 2000 as before. The experimental results show that, when c = 30, ERVAR is about 20%, and ERSTD is about 10%. The rest of the error rates are all less than 1%. The vector length is reduced from L = 3000 to L′ = 100. Therefore, the communication cost is only $\frac{1}{30}$ of that of MFSDA-I. Even if we choose a smaller c = 20 to reduce ERVAR to 10%, the cost is still only $\frac{1}{20}$ of that of MFSDA-I.

7. Conclusions

This paper presents two cost-effective multi-functional secure data aggregation schemes (i.e., MFSDA-I and MFSDA-II) for distributed systems, such as a mobile sensor network. Both schemes can provide value-preserving and order-preserving during in-network aggregation and, thus, obtain multiple statistics (both addition-based and comparison-based statistics) in a query. The MFSDA-I scheme can obtain accurate results, while the MFSDA-II scheme can obtain an approximate result with much less energy consumption. Analysis and evaluation results show that MFSDA scheme has great performance with respect to functionality, security and cost effectiveness.

Acknowledgments

This work is supported by the Major Science and Technology Research Program for Strategic Emerging Industry of Hunan (2012GK4106), International Science and Technology Cooperation Special Projects of China (2013DFB10070), Foundation of Hunan Educational Committee (14C0484), Yongzhou Science and Technology Plan ([2013]3), Foundation of Hunan University of Science and Engineering (13XKYTA003), Hunan Science and Technology Plan (2012RS4054), Natural Science Foundation of China (61202341), Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education Innovation Fund (JYB201502) and the Project of Innovation-driven Plan in Central South University (2015CXS010). The authors declare that they have no conflict of interests.

Author Contributions

Kehua Guo and Ping Zhang conceived and designed the experiments; Jianhua Ma performed the experiments; Ping Zhang and Jianhua Ma analyzed the data; Kehua Guo, Ping Zhang and Jianhua Ma wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Homomorphic Encryption Scheme

The proposed scheme uses an EC-ElGamal-based homomorphic encryption algorithm derived from [32]. The algorithm consists of four procedures: setup, KeyGen, encryption and decryption. Details are illustrated in Algorithm A1.
Algorithm A1: Homomorphic encryption scheme.
procedure Setup              ▹ Initialization
    $p$ is a prime number
    $E$ is an elliptic curve over a finite field $F_p$,
    while $G$ is the generator.
    return $para = \langle E, p, G \rangle$
end procedure
procedure KeyGen            ▹ Key Generation
    $KeyPri_{BS} \in F_p$
    $KeyPub_{BS} \leftarrow KeyPri_{BS} \times G$
    return $key = \langle KeyPri_{BS}, KeyPub_{BS} \rangle$
end procedure
procedure HEnc($m$, $KeyPub_{BS}$)      ▹ Encryption
    $M \leftarrow map(m)$
    $R \leftarrow r \times G$, where $r \in_R F_p$
    $S \leftarrow M + r \times KeyPub_{BS}$
    return $C_m = \langle R, S \rangle$
end procedure
procedure HDec($C_m$, $KeyPri_{BS}$)      ▹ Decryption
    $M \leftarrow S - KeyPri_{BS} \times R$
    $m \leftarrow rmap(M)$
    return $m$
end procedure
Theorem 1. (Homomorphic property) Algorithm A1 has an additive homomorphic property, namely summation in plaintext is equivalent to summation in ciphertext.
$$HEnc(m_1 + m_2) = HEnc(m_1) \oplus HEnc(m_2)$$
Proof.
$$HEnc(m_1) \oplus HEnc(m_2) = \langle R_1, S_1 \rangle + \langle R_2, S_2 \rangle = \langle R_1 + R_2, S_1 + S_2 \rangle = \langle R_{sum}, S_{sum} \rangle$$
$$M_{sum} = S_{sum} - KeyPri_{BS} \times R_{sum} = (S_1 + S_2) - KeyPri_{BS} \times (R_1 + R_2) = S_1 - KeyPri_{BS} \times R_1 + S_2 - KeyPri_{BS} \times R_2 = M_1 + M_2$$
$$HDec(HEnc(m_1) \oplus HEnc(m_2)) = rmap(M_{sum}) = rmap(M_1 + M_2) = rmap(M_1) + rmap(M_2) = m_1 + m_2$$
$$HEnc(m_1 + m_2) = HEnc(HDec(HEnc(m_1) \oplus HEnc(m_2))) = HEnc(m_1) \oplus HEnc(m_2)$$
End of proof.
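The additive property can be tried out with a toy implementation. The Python sketch below is not the EC-ElGamal scheme of this appendix: to stay short and dependency-free it swaps the elliptic curve for the multiplicative group modulo a prime, so that $m \times G$ becomes $g^m$, point addition becomes multiplication, and `rmap` becomes a brute-force search over small plaintexts. The parameters `P` and `G` and all function names are illustrative and far from production-grade.

```python
import secrets

P = 2**127 - 1   # a Mersenne prime, used here only as a toy modulus
G = 3            # base element playing the role of the generator

def keygen():
    priv = secrets.randbelow(P - 2) + 1          # private key in [1, P - 2]
    return priv, pow(G, priv, P)

def henc(m, pub):
    r = secrets.randbelow(P - 2) + 1
    # Analogue of <R, S> = <r*G, map(m) + r*KeyPub>, with map(m) = G^m.
    return pow(G, r, P), pow(G, m, P) * pow(pub, r, P) % P

def hadd(c1, c2):
    # Component-wise group operation on two ciphertexts.
    return c1[0] * c2[0] % P, c1[1] * c2[1] % P

def hdec(c, priv, max_m=64):
    R, S = c
    M = S * pow(R, P - 1 - priv, P) % P          # S / R^priv by Fermat
    acc = 1
    for m in range(max_m + 1):                   # brute-force "rmap"
        if acc == M:
            return m
        acc = acc * G % P
    raise ValueError("plaintext outside the search range")

priv, pub = keygen()
total = hdec(hadd(henc(3, pub), henc(4, pub)), priv)
print(total)
```

The brute-force search works because the aggregated values in this setting are small counters; it would not scale to large plaintexts.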

Appendix B. Identity-Based Signature Scheme

The identity-based signature (IBS) scheme is based on IBE. The users’ public identity information (ID, email, etc.) is used as the public key for signature verification in the IBS scheme, which can effectively solve the management problems in PKI public-key certificates.
The algorithm consists of four parts: setup, extract, signature and verify. The BS uses the master key MSK to generate a signature private key for each node based on its ID. Each node uses its private key to sign its data, and the receiving node uses the ID of the sending node as the public key to verify the signature and, thus, the integrity of the information received.
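The four parts can be exercised end to end with a toy model. As with any quick sketch, several substitutions are made and should be read as assumptions: the elliptic curve is replaced by the multiplicative group modulo a prime (so $x \times G$ becomes $g^x$), both hash functions are built from SHA-256 with a domain tag, and the verification compares $g^z$ with $T \cdot (R \cdot P^{h_1})^{h_2}$, which is the multiplicative analogue of checking $z \times G$ against $T + h_2 \times (R + h_1 \times P)$. All names and parameters are illustrative.

```python
import hashlib
import secrets

P_MOD = 2**127 - 1      # toy prime modulus, not a recommended parameter
G = 3
Q = P_MOD - 1           # exponents can be reduced mod P_MOD - 1 (Fermat)

def H(tag, *parts):
    data = "|".join(str(p) for p in (tag,) + parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def setup():
    msk = secrets.randbelow(Q - 1) + 1
    return msk, pow(G, msk, P_MOD)               # master key, public P

def extract(msk, node_id):
    r = secrets.randbelow(Q - 1) + 1
    R = pow(G, r, P_MOD)
    v = (r + H("H1", node_id, R) * msk) % Q
    return R, v                                  # signing key of the node

def sign(node_id, msg, key):
    R, v = key
    t = secrets.randbelow(Q - 1) + 1
    T = pow(G, t, P_MOD)
    z = (t + H("H2", node_id, msg, R, T) * v) % Q
    return R, T, z

def verify(sig, msg, node_id, P_pub):
    R, T, z = sig
    h1 = H("H1", node_id, R)
    h2 = H("H2", node_id, msg, R, T)
    A = pow(G, z, P_MOD)
    B = T * pow(R * pow(P_pub, h1, P_MOD) % P_MOD, h2, P_MOD) % P_MOD
    return A == B

msk, P_pub = setup()
sig = sign("node-7", "v=42", extract(msk, "node-7"))
print(verify(sig, "v=42", "node-7", P_pub))      # genuine packet
print(verify(sig, "v=43", "node-7", P_pub))      # modified message
```

The verifier needs only the sender's ID and the public parameters, which is the property that removes the need for per-node certificates.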

Appendix C. Theorem 2 and Its Proof

Theorem 2. 
The homomorphic encryption used in the proposed scheme can ensure the vector obtained at the server after decryption is the aggregated result of all encoded data of the active nodes.
Algorithm A2: Identity-based signature scheme.
procedure Setup               ▹ Initialization
    $\langle E, p, G \rangle$ is the same as in Algorithm A1
    $msk \in_R F_p$
    $P \leftarrow msk \times G$
    $H_1, H_2$ are hash functions
    return $para = \langle E, p, G, P, H_1, H_2 \rangle$
end procedure
procedure Extract($ID_k$)          ▹ Key Generation
    $R \leftarrow r \times G$, where $r \in_R F_p$
    $v \leftarrow r + H_1(ID_k, R) \times msk$
    return $KeySign_k = \langle R, v \rangle$
end procedure
procedure Signature($msg$, $KeySign_k$)      ▹ Signature
    $T \leftarrow t \times G$, where $t \in_R F_p$
    $z \leftarrow t + H_2(ID_k, msg, R, T) \times v$
    return $S_k = \langle R, T, z \rangle$
end procedure
procedure Verify($S_k$, $msg$, $ID_k$)          ▹ Verification
    $h_1 \leftarrow H_1(ID_k, R)$
    $h_2 \leftarrow H_2(ID_k, msg, R, T)$
    $A \leftarrow z \times G$
    $B \leftarrow T + h_2 \times (R + h_1 \times P)$
    return $A == B$
end procedure
Proof
Sensor data are encoded into $v_k$ at the H-sensor and are then encrypted into $c_k$ using the public key $KeyPub_{BS}$.
Aggregation at the intermediate nodes is performed directly on the ciphertexts $c_k$; the final result obtained by the server is $\sum_{k=1}^{N} c_k$.
$$\sum_{k=1}^{N} c_k = \sum_{k=1}^{N} HEnc(v_k)$$
According to Theorem 1,
$$\sum_{k=1}^{N} HEnc(v_k) = HEnc\left(\sum_{k=1}^{N} v_k\right)$$
so,
$$\sum_{k=1}^{N} c_k = HEnc\left(\sum_{k=1}^{N} v_k\right)$$
The final aggregation result $V$ can be obtained from $\sum_{k=1}^{N} c_k$ by decryption,
$$V = HDec\left(\sum_{k=1}^{N} c_k\right) = \sum_{k=1}^{N} v_k$$
End of proof.

Appendix D. Detail for the Example in Section 4.3.

The final aggregation result for the example in Section 4.3 is $V = \sum_k v_k = (2\ 1\ 2\ 1\ 1)$. We can obtain each statistic from the final aggregated vector using the formulas provided in Section 4.2.
$$Cnt = \sum_{i=1}^{L'} n_i = 2 + 1 + 2 + 1 + 1 = 7$$
$$Sum(\hat{x}) = \sum_{i=1}^{L'} n_i f_m^{-1}(f_c^{-1}(i)) = 2 \times \left(\tfrac{4 \times 1 - 2}{10} + 9\right) + 1 \times \left(\tfrac{4 \times 2 - 2}{10} + 9\right) + 2 \times \left(\tfrac{4 \times 3 - 2}{10} + 9\right) + 1 \times \left(\tfrac{4 \times 4 - 2}{10} + 9\right) + 1 \times \left(\tfrac{4 \times 5 - 2}{10} + 9\right) = 69.2$$
$$Sum(\hat{x}^2) = \sum_{i=1}^{L'} n_i \left(f_m^{-1}(f_c^{-1}(i))\right)^2 = 2 \times \left(\tfrac{4 \times 1 - 2}{10} + 9\right)^2 + 1 \times \left(\tfrac{4 \times 2 - 2}{10} + 9\right)^2 + 2 \times \left(\tfrac{4 \times 3 - 2}{10} + 9\right)^2 + 1 \times \left(\tfrac{4 \times 4 - 2}{10} + 9\right)^2 + 1 \times \left(\tfrac{4 \times 5 - 2}{10} + 9\right)^2 = 686.2400$$
$$Avg = Sum(\hat{x}) / Cnt \approx 9.8857$$
$$E(\hat{x}) = Avg \approx 9.8857$$
$$E(\hat{x}^2) = Sum(\hat{x}^2) / Cnt \approx 98.0343$$
$$Var(\hat{x}) = E(\hat{x}^2) - E(\hat{x})^2 \approx 0.307$$
$$STD = \sqrt{Var(\hat{x})} \approx \sqrt{0.307} \approx 0.554$$
$$Max = f_m^{-1}(f_c^{-1}(i_{\max})) = \tfrac{4 \times 5 - 2}{10} + 9 = 10.8$$
$$Min = f_m^{-1}(f_c^{-1}(i_{\min})) = \tfrac{4 \times 1 - 2}{10} + 9 = 9.2$$

References

  1. Wang, Y.C. Mobile sensor networks: system hardware and dispatch software. ACM Comput. Surv. 2014, 47, 12.
  2. Chung, K.Y.; Yoo, J.; Kim, K. Recent trends on mobile computing and future networks. Pers. Ubiquitous Comput. 2014, 18, 489–491.
  3. Pejovic, V.; Musolesi, M. Anticipatory Mobile Computing: A Survey of the State of the Art and Research Challenges. ACM Comput. Surv. 2015, 47, 1–29.
  4. Ren, J.; Zhang, Y.; Zhang, K.; Shen, X. Adaptive and channel-aware detection of selective forwarding attacks in wireless sensor networks. IEEE Trans. Wirel. Commun. 2016.
  5. Jose, J.; Princy, M.; Jose, J. EPSDA: Energy Efficient Privacy preserving Secure Data Aggregation for Wireless Sensor Networks. Int. J. Secur. Its Appl. 2013, 7, 299–316.
  6. Yang, G.; Li, S.; Xu, X.; Dai, H.; Yang, Z. Precision-enhanced and encryption-mixed privacy-preserving data aggregation in wireless sensor networks. Int. J. Distrib. Sens. Netw. 2013, 2013, 427275.
  7. Nath, S.; Gibbons, P.B.; Seshan, S.; Anderson, Z. Synopsis diffusion for robust aggregation in sensor networks. ACM Trans. Sens. Netw. 2008, 4, 7.
  8. Considine, J.; Hadjieleftheriou, M.; Li, F.; Byers, J.; Kollios, G. Robust approximate aggregation in sensor data management systems. ACM Trans. Database Syst. 2009, 34, 6.
  9. Roy, S.; Conti, M.; Setia, S.; Jajodia, S. Secure Data Aggregation in Wireless Sensor Networks. IEEE Trans. Inf. Forensics Secur. 2012, 7, 1040–1052.
  10. Castelluccia, C.; Mykletun, E.; Tsudik, G. Efficient aggregation of encrypted data in wireless sensor networks. In Proceedings of the Second Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services, San Diego, CA, USA, 17–21 July 2005; pp. 109–117.
  11. Castelluccia, C.; Chan, A.C.F.; Mykletun, E.; Tsudik, G. Efficient and provably secure aggregation of encrypted data in wireless sensor networks. ACM Trans. Sens. Netw. 2009, 5, 1–36.
  12. Chen, C.-M.; Lin, Y.-H.; Lin, Y.-C.; Sun, H.-M. RCDA: Recoverable Concealed Data Aggregation for Data Integrity in Wireless Sensor Networks. IEEE Trans. Parallel Distrib. Syst. 2012, 23, 727–734.
  13. Jose, J.; Manoj Kumar, S.; Jose, J. Energy efficient recoverable concealed data aggregation in wireless sensor networks. In Proceedings of the 2013 International Conference on Emerging Trends in Computing, Communication and Nanotechnology (ICE-CCN), Tirunelveli, India, 25–26 March 2013; pp. 322–329.
  14. Lin, Y.H.; Chang, S.Y.; Sun, H.M. CDAMA: Concealed Data Aggregation Scheme for Multiple Applications in Wireless Sensor Networks. IEEE Trans. Knowl. Data Eng. 2013, 25, 1471–1483.
  15. Stoica, I.; Morris, R.; Karger, D.; Kaashoek, M.F.; Balakrishnan, H. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, San Diego, CA, USA, 27–31 August 2001; pp. 149–160.
  16. Hsiao, H.C.; Chang, C.W. A Symmetric Load Balancing Algorithm with Performance Guarantees for Distributed Hash Tables. IEEE Trans. Comput. 2013, 62, 662–675.
  17. Ganesh, A.J.; Kermarrec, A.M.; Massoulié, L. Peer-to-peer membership management for gossip-based protocols. IEEE Trans. Comput. 2003, 52, 139–149.
  18. Wuhib, F.; Stadler, R.; Spreitzer, M. A Gossip Protocol for Dynamic Resource Management in Large Cloud Environments. IEEE Trans. Netw. Serv. Manag. 2012, 9, 213–225.
  19. Erkin, Z.; Veugen, T.; Toft, T.; Lagendijk, R.L. Generating Private Recommendations Efficiently Using Homomorphic Encryption and Data Packing. IEEE Trans. Inf. Forensics Secur. 2012, 7, 1053–1066.
  20. Apostolopoulos, J.; Wong, T.; Tan, W.; Wee, S. On multiple description streaming with content delivery networks. In Proceedings of the IEEE INFOCOM 2002, Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies, New York, NY, USA, 23–27 June 2002; Volume 3, pp. 1736–1745.
  21. Ren, J.; Zhang, Y.; Zhang, N.; Zhang, D.; Shen, X. Dynamic channel access to improve energy efficiency in cognitive radio sensor networks. IEEE Trans. Wirel. Commun. 2016.
  22. Ozdemir, S.; Xiao, Y. Secure data aggregation in wireless sensor networks: A comprehensive overview. Comput. Netw. 2009, 53, 2022–2037.
  23. Rivest, R.L.; Adleman, L.; Dertouzos, M.L. On data banks and privacy homomorphisms. Found. Secur. Comput. 1978, 4, 169–180.
  24. Acharya, M.; Girao, J.; Westhoff, D. Secure comparison of encrypted data in wireless sensor networks. In Proceedings of the Third International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt’05), Trentino, Italy, 3–7 April 2005; pp. 47–53.
  25. Agrawal, R.; Kiernan, J.; Srikant, R.; Xu, Y. Order preserving encryption for numeric data. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, Paris, France, 13–18 June 2004; pp. 563–574.
  26. Ertaul, L.; Kedlaya, V. Computing Aggregation Function Minimum/Maximum using Homomorphic Encryption Schemes in Wireless Sensor Networks. In Proceedings of the 2007 International Conference on Wireless Networks, Las Vegas, NV, USA, 25–28 June 2007; pp. 186–192.
  27. Samanthula, B.K.; Jiang, W.; Madria, S. A Probabilistic Encryption Based MIN/MAX Computation in Wireless Sensor Networks. In Proceedings of the 2013 IEEE 14th International Conference on Mobile Data Management, Milan, Italy, 3–6 June 2013; pp. 77–86.
  28. Chen, L.; Lu, R.; Cao, Z.; Alharbi, K.; Lin, X. MuDA: Multifunctional data aggregation in privacy-preserving smart grid communications. Peer-to-Peer Netw. Appl. 2015, 8, 777–792.
  29. Li, J.; Cheng, S. (ϵ, δ)-Approximate Aggregation Algorithms in Dynamic Sensor Networks. IEEE Trans. Parallel Distrib. Syst. 2012, 23, 385–396.
  30. He, W.; Liu, X.; Nguyen, H.; Nahrstedt, K.; Abdelzaher, T. PDA: Privacy-preserving data aggregation in wireless sensor networks. In Proceedings of the IEEE INFOCOM 2007, 26th IEEE International Conference on Computer Communications, Anchorage, AK, USA, 6–12 May 2007.
  31. Shim, K.A.; Park, C.M. A Secure Data Aggregation Scheme based on Appropriate Cryptographic Primitives in Heterogeneous Wireless Sensor Networks. IEEE Trans. Parallel Distrib. Syst. 2014, 26, 2128–2139.
  32. Mykletun, E.; Girao, J.; Westhoff, D. Public key based cryptoschemes for data concealment in wireless sensor networks. In Proceedings of the 2006 IEEE International Conference on Communications, Istanbul, Turkey, 11–15 June 2006; Volume 5, pp. 2288–2295.
  33. Bellare, M.; Namprempre, C.; Neven, G. Security proofs for identity-based identification and signature schemes. J. Cryptol. 2009, 22, 1–61.
  34. Fang, X.; Gao, H.; Li, J.; Li, Y. Approximate Multiple Count in Wireless Sensor Networks. In Proceedings of the IEEE Conference on Computer Communications, IEEE INFOCOM 2014, Toronto, ON, Canada, 27 April–2 May 2014; pp. 1474–1482.
  35. Cheng, S.; Li, J.; Ren, Q.; Yu, L. Bernoulli Sampling Based (ϵ, δ)-Approximate Aggregation in Large-Scale Sensor Networks. In Proceedings of IEEE INFOCOM’10, San Diego, CA, USA, 14–19 March 2010; pp. 1–9.
  36. Cheng, S.; Li, J. Sampling based (ϵ, δ)-approximate aggregation algorithm in sensor networks. In Proceedings of the 29th IEEE International Conference on Distributed Computing Systems, ICDCS ’09, Montreal, QC, Canada, 22–26 June 2009; pp. 273–280.
  37. NOAA. Tropical Atmosphere Ocean (TAO) Project. Available online: http://www.pmel.noaa.gov/tao (accessed on 21 April 2016).
Figure 1. Network model.
Figure 2. Mapping and encoding step of MFSDA-I.
Figure 3. Communication cost comparison of RCDA and MFSDA.
Figure 4. Communication cost comparison of MFSDA-I and MFSDA-II.
Figure 5. Accuracy evaluation of the MFSDA variant. (a) Error rate of MEAN; (b) error rate of VAR; (c) error rate of STD; (d) error rate of MAX; (e) error rate of MEDIAN; (f) error rate of MIN.
Table 1. Comparison of secure data aggregation (SDA) schemes on functionality. Y, yes; N, no; P, partly; MFSDA, multi-functional secure data aggregation.
| Scheme | Summation-Based Statistics | Comparison-Based Statistics | Single Query |
|---|---|---|---|
| Castelluccia et al. [11] | Y | N | N |
| Roy et al. [9] | Y | N | N |
| Lin et al. [14] | Y | N | N |
| Li et al. [29] | Y | N | N |
| He et al. [30] | Y | N | N |
| Yang et al. [6] | Y | N | N |
| Lu et al. [28] | Y | N | N |
| Acharya et al. [24] | N | P | N |
| Ertaul et al. [26] | N | P | N |
| Samanthula et al. [27] | N | P | N |
| RCDA [12] | Y | Y | Y |
| MFSDA | Y | Y | Y |
Table 2. Comparison of SDA schemes on adaptive dynamic networks.
| Scheme | Not Supported | Supported, with Extra Cost | Supported, without Extra Cost |
|---|---|---|---|
| Castelluccia et al. [11] | N | Y | N |
| Roy et al. [9] | N | Y | N |
| Lin et al. [14] | N | Y | N |
| Li et al. [29] | N | Y | N |
| He et al. [30] | Y | N | N |
| Yang et al. [6] | Y | N | N |
| Acharya et al. [24] | Y | N | N |
| Ertaul et al. [26] | Y | N | N |
| Samanthula et al. [27] | Y | N | N |
| RCDA [12] | N | N | Y |
| MFSDA | N | N | Y |
Table 3. Basic notation.
| Symbol | Description | Symbol | Description |
|---|---|---|---|
| $x_k$; $X_{LB}$; $X_{UB}$ | Sensing data, $x_k \in (X_{LB}, X_{UB}]$ | $a$ | Accuracy requirement of $x_k$ |
| $c$ | Compressing factor | $L$ | $L = \frac{X_{UB} - X_{LB}}{a}$ |
| $y_k$ | Mapping data, $y_k \in (0, L]$ | $N$ | Network size |
| $z_k$ | Compression data of $y_k$ | $f_m$; $f_m^{-1}$ | Mapping function and its inverse |
| $v_k$; $v_k$ | Encoding data of $y_k$ or $z_k$ | $f_c$; $f_c^{-1}$ | Compression function and its inverse |
| $V_k$; $V_k$ | Aggregation of encoding data | $f_e$; $f_e^{-1}$ | Encoding function and its inverse |
| $CH$ | Cluster head | $CM$ | Cluster member |
Table 4. Details of the values at each step in the example.
| Sensor ID | Sensing Data | Mapping Data | Encoded Data |
|---|---|---|---|
| 1 | 23 | 3 | (0 0 1 0 0) |
| 2 | 25 | 5 | (0 0 0 0 1) |
| 3 | 16 | reject | reject |
| 4 | 23 | 3 | (0 0 1 0 0) |
| 5 | 48 | reject | reject |
| 6 | 22 | 2 | (0 1 0 0 0) |
| 7 | 23 | 3 | (0 0 1 0 0) |
| 8 | 24 | 4 | (0 0 0 1 0) |
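
The rows of Table 4 can be reproduced with a short sketch of the mapping and encoding steps. The bounds `X_LB = 20`, `X_UB = 25` and accuracy `A = 1` are assumptions inferred from the table values (they are not stated alongside the table), as is the ceiling-based form of the mapping; the function names mirror the notation $f_m$ and $f_e$ but are otherwise hypothetical.

```python
import math

# Assumed parameters, inferred from Table 4 (23 -> 3, 25 -> 5, 16 and 48 rejected).
X_LB, X_UB, A = 20, 25, 1
L = (X_UB - X_LB) // A  # number of buckets; 5 under these assumptions


def f_m(x):
    """Map a reading in (X_LB, X_UB] to a bucket index in (0, L]; None means reject."""
    if not (X_LB < x <= X_UB):
        return None
    return math.ceil((x - X_LB) / A)


def f_e(y):
    """Encode bucket index y as a one-hot vector of length L."""
    return [1 if i == y else 0 for i in range(1, L + 1)]


def aggregate(readings):
    """Sum the encoded vectors of all accepted readings."""
    total = [0] * L
    for x in readings:
        y = f_m(x)
        if y is None:
            continue  # out-of-range readings are rejected, as for sensors 3 and 5
        total = [t + b for t, b in zip(total, f_e(y))]
    return total


readings = [23, 25, 16, 23, 48, 22, 23, 24]  # sensing data of sensors 1..8
print(aggregate(readings))
```

Each position of the aggregated vector counts the nodes whose reading fell into that bucket, so the number of contributing nodes is simply the sum of its entries.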
Table 5. Comparison of SDA schemes on security.
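| Scheme | A1 | A2 | A3 | A4 | B1 | B2 |
|---|---|---|---|---|---|---|
| Castelluccia et al. [11] | Y | N | N | N | Y | N |
| Roy et al. [9] | N | N | Y | N | Y | N |
| Lin et al. [14] | Y | N | N | N | Y | N |
| Li et al. [29] | N | N | N | N | Y | N |
| He et al. [30] | Y | N | N | N | N | N |
| Yang et al. [6] | Y | N | N | N | N | N |
| Acharya et al. [24] | Y | N | N | N | Y | N |
| Ertaul et al. [26] | Y | N | N | N | Y | N |
| Samanthula et al. [27] | Y | N | N | N | Y | N |
| RCDA [12] | Y | N | E | N | Y | N |
| MFSDA | Y | Y | Y | Y | Y | N |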
Table 6. Data description of the real dataset.
| Data Name | Wind Direction |
|---|---|
| Data source | TAO (Tropical Atmosphere Ocean) [37] project of NOAA (National Oceanic and Atmospheric Administration) |
| Data range | [0, 359] |
| Accuracy | 5 |
| L | 72 |
Table 7. Data length comparison between MFSDA-I and RCDA based on WindDirection of the TAO project (units: bits).

| Scheme | N = 50 | N = 100 | N = 150 | N = 200 | N = 250 | N = 300 | N = 350 | N = 400 |
|---|---|---|---|---|---|---|---|---|
| RCDA | 350 | 700 | 1050 | 1400 | 1750 | 2100 | 2450 | 2800 |
| MFSDA-I | 432 | 504 | 576 | 576 | 576 | 648 | 648 | 648 |
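
The values in Table 7 are consistent with a simple cost model, sketched below. The model is an inference from the tabulated numbers, not a formula quoted from the text: it assumes MFSDA-I sends L = 72 counters (Table 6), each wide enough to hold a count from 0 to N, and that RCDA grows by 7 bits per node (350 / 50). The function names are hypothetical.

```python
L_BUCKETS = 72          # L from Table 6
RCDA_BITS_PER_NODE = 7  # assumption: inferred from 350 bits at N = 50


def mfsda1_length(n):
    """Assumed MFSDA-I length: L counters, each able to hold values 0..n."""
    bits_per_counter = n.bit_length()  # equals ceil(log2(n + 1)) for n >= 1
    return L_BUCKETS * bits_per_counter


def rcda_length(n):
    """Assumed RCDA length: grows linearly with the number of nodes."""
    return RCDA_BITS_PER_NODE * n


if __name__ == "__main__":
    for n in range(50, 401, 50):
        print(n, rcda_length(n), mfsda1_length(n))
```

Under this model the MFSDA-I length grows only when N crosses a power of two, which would explain the plateaus at 576 and 648 bits, while the RCDA length grows linearly with N.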
Table 8. Comparison between MFSDA-I and MFSDA-II based on WindDirection of the TAO project (data length in bits).

| Compressing Factor | MFSDA-I Data Length | MFSDA-II Data Length | MFSDA-II Accuracy: MEAN | VAR | STD | MAX | MIN | MEDIAN |
|---|---|---|---|---|---|---|---|---|
| c = 2 | 792 | 396 | −2% to 0.2% | ±0.5% | ±0.2% | ±1% | ±1% | ±1% |
| c = 3 | 792 | 264 | −2% to 0.2% | ±0.5% | ±0.2% | ±2% | ±2% | ±2% |
| c = 4 | 792 | 198 | −2% to 0.2% | ±1% | ±0.5% | ±3% | ±3% | ±3% |
| c = 5 | 792 | 158 | −2% to 0.2% | ±1% | ±0.5% | ±3% | ±3% | ±3% |

Share and Cite

MDPI and ACS Style

Guo, K.; Zhang, P.; Ma, J. Secure and Cost-Effective Distributed Aggregation for Mobile Sensor Networks. Sensors 2016, 16, 583. https://doi.org/10.3390/s16040583

