1. Introduction
Smart meters are widely deployed in Europe. Member states have committed to rolling out close to 200 million smart meters for electricity and 45 million for gas by 2020 [1], and more than 200 million European households will have smart meters in 2023 [2]. According to the European Parliament and the European Council, “Member States are required to ensure the implementation of smart metering systems that assist the active participation of consumers in the electricity supply and gas supply markets” [3].
Smart meters can report instantaneous electricity consumption to servers periodically, making fine-grained energy supply possible. However, these fine-grained reports also bring potential privacy risks. By using power signature analysis tools such as nonintrusive appliance load monitoring (NIALM), an attacker can find out which appliances are working at any time [4], and thus can learn detailed information about a customer’s daily activities. According to Barbosa et al. (2015) [5], “Fine-grained data of electricity usage naturally include personal and privacy-sensitive information regarding which appliances are active.” For example, the adversary can tell whether there are people in the house, when the inhabitants wake up, take a shower, or turn off the television, or even whether individual appliances are operating at a desired level of efficiency. This kind of personal information needs to be protected from disclosure, and smart meter aggregation schemes have been proposed for this purpose.
Recently, Fan et al. (2014) proposed a smart meter aggregation scheme based on the bilinear map and computationally hard problems of group theory [6]. He et al. (2017) improved the scheme of [6] by introducing a homomorphic encryption algorithm [7]. Both of these schemes were claimed to be secure. However, we found that although both schemes can protect a user’s personal data from being leaked, they both have scalability problems. Once the system is deployed, it is hard to add a new smart meter, and when one smart meter is broken, the whole system cannot work correctly. In addition, replacing a broken smart meter with a new one is difficult. Moreover, both schemes have strict time accuracy requirements: all the smart meters in the system have to keep exactly the same time, and even a one-millisecond error will lead to an incorrect result. We discuss these problems in Section 3.
To solve these problems, a privacy-preserving data aggregation scheme for the smart grid is proposed, which enables smart meters to report their consumption periodically and at the same time prevents private information from being leaked. The proposed scheme is partly based on the homomorphic encryption algorithm. Our contributions are mainly reflected in two aspects:
First, the noise addition method is used to prevent an adversary from obtaining a smart meter’s consumption, and the efficiency of the proposed scheme is improved by using this method. We also analyzed different ways of generating noise.
Second, the proposed scheme overcomes the problems in related works, such as the scalability problem, and does not have a high accuracy requirement for time.
This study focuses on the security and privacy part of work done under the e-GOTHAM project; the previous work has been published [8]. The paper is organized as follows: Related works are discussed in Section 2. In Section 3, we discuss the problems of the two related works. The proposed scheme is introduced in Section 4. Security analysis is described in Section 5. A comparison with the related schemes is in Section 6. We conclude the paper in Section 7.
2. Related Work
Smart grid privacy and security problems have drawn much attention. There are many ways to protect the privacy of a smart meter when it reports its consumption to the aggregator; for example, homomorphic encryption methods, rechargeable battery methods, noise addition methods, and trusted third party methods.
Noise addition is a promising and efficient way to protect the consumption privacy of a smart meter. Bohli first used this approach [9], and Barbosa et al. (2015) [5] and Wang et al. (2013) [10] analyzed the privacy and utility metric of this problem, both proposing a metric for utility preservation. Wang et al. (2013) masked the data using Gaussian mixture models (GMMs) [10]. Their experimental results show that the accuracy of recovering total electricity consumption can reach approximately 99%, while the ability to identify an individual’s usage pattern is substantially reduced. He et al. (2013) proposed masking the data by adding Gaussian noise [11]. Random noise is purposely introduced to distort the smart meter’s consumption so that it is infeasible for an adversary to recover the real consumption. The random noise is chosen according to the power consumption data and other prior knowledge. Jordi and Josep analyzed the optimality of data-independent random noise distributions to achieve ε-differential privacy [12]. They also analyzed the cases of a single univariate query and of multiple queries. Noise addition methods can significantly reduce the computation and communication costs of smart meters. “Since to preserve privacy the proposed approach just generates a random number, we claim that the proposed approach is lightweight” [5]. However, the lack of authentication between the smart meter and the aggregator makes it easy for an adversary to launch an attack.
Some schemes introduce a trusted third party; we call this the trusted third party model. He et al. (2017) built their scheme on elliptic curve cryptography (ECC) [13]. Fan et al. (2014) proposed a scheme based on the bilinear map and computationally hard problems in group theory [6]. He et al. (2017) improved the scheme of Fan et al. by reducing its computation cost [7].
García and Jacobs [14] were the first to try to apply additive homomorphic encryption to a privacy-friendly smart metering architecture. In their architecture, each reporting period requires the transmission of O(n²) ciphertexts. Lu et al. (2016) proposed an efficient and privacy-preserving aggregation scheme for secure smart grid communication [15]. Their scheme realized a multidimensional data aggregation approach based on the homomorphic Paillier cryptosystem, which satisfies the real-time high-frequency data collection requirements of smart grid communication. Busom et al. [16] also built their scheme on homomorphic encryption. By homomorphically adding all n consumption values, the link between customers and their consumption values is broken. In this way, detailed information can be sent without leaking individual personal data. Their approach does not require a trusted third party (except a certification authority) or communication among smart meters; the communication complexity is linear, O(n). Dimitriou and Awad presented two decentralized privacy-respecting aggregating protocols for smart meters [17]. Their first protocol targets honest-but-curious adversaries and uses symmetric cryptography primitives. Their second protocol, which is based on public-key cryptography primitives, protects against more aggressive adversaries that not only try to infer individual measurements but also try to disrupt protocol execution.
Besides these ways of protecting the privacy of smart meter consumption, authentication between the smart meter and the aggregator is another factor that deserves attention. Elliptic curve [18,19,20,21,22,23,24,25] and bilinear map pairing [26,27,28,29,30] are two of the most commonly used cryptographic tools for authentication schemes. Generally speaking, bilinear map pairing has a higher computation cost than the elliptic curve method.
Ping et al. proposed an elliptic curve cryptography–based authentication scheme with identity protection for smart grids [23]. Adversaries are unable to obtain the real identities because the identities of the smart appliances and substations are encrypted before they are transmitted. Saxena and Choi proposed another authentication protocol for smart grid communication, also based on the elliptic curve. Their scheme also has a three-layer hierarchy [24]. The scheme of Nicanfar and Leung is a multilayer consensus password authenticated key-exchange scheme for the smart grid [25]. Saxena et al. proposed an authentication and authorization scheme for the smart grid; the protocol is based on bilinear map pairing [28]. A bilinear pairing cryptography–based shared secret key is generated between the user and the device, and the key enables the two to communicate securely. Odelu et al. proposed a secure key agreement scheme for the smart grid; they built their scheme on bilinear map pairing [29]. Jo et al. proposed privacy-preserving protocols for the smart grid using the distributed verification method; their encryption scheme is based on bilinear map pairing [30].
3. Problems in the Trusted Third Party Model
In a trusted third party model, three types of entities are in the system: smart meters, an aggregator, and a trusted third party.
Figure 1 depicts the system structure.
In this system, during the system initialization phase, the trusted third party will generate a series of random numbers
and make sure
; these numbers are called blind factors. The blind factor
is sent to the aggregator, and
are sent to the
smart meter. At the aggregation phase, smart meter
sends
to the aggregator;
is the meter’s consumption data. The aggregator can recover the total consumption
using
.
In this way, the aggregator can get the total consumption of all the smart meters. However, it is unable to get the consumption of a single smart meter.
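The blinding idea described above can be illustrated with a minimal sketch. It assumes plain additive masking modulo a hypothetical constant `MODULUS`, with blind factors that sum to zero; the cited schemes apply their blind factors inside group operations, so the function names and the arithmetic here are illustrative assumptions, not the construction of Fan et al. or He et al.

```python
import secrets

# Hypothetical modulus for this sketch only; the cited schemes work in other groups.
MODULUS = 2**61 - 1


def make_blind_factors(num_meters, modulus=MODULUS):
    """Trusted third party: draw factors whose overall sum is zero modulo `modulus`."""
    meter_factors = [secrets.randbelow(modulus) for _ in range(num_meters)]
    aggregator_factor = (-sum(meter_factors)) % modulus
    return aggregator_factor, meter_factors


def blind(reading, factor, modulus=MODULUS):
    """Smart meter: mask one integer reading with its own blind factor."""
    return (reading + factor) % modulus


def aggregate(blinded_reports, aggregator_factor, modulus=MODULUS):
    """Aggregator: the factors cancel only if every report is present."""
    return (sum(blinded_reports) + aggregator_factor) % modulus


readings = [12, 30, 7, 21]  # hypothetical integer readings
agg_factor, factors = make_blind_factors(len(readings))
reports = [blind(m, f) for m, f in zip(readings, factors)]
total = aggregate(reports, agg_factor)
```

Each individual report looks random to the aggregator, while the sum of all reports plus the aggregator's own factor equals the sum of the readings. If any single report is missing, its blind factor is not cancelled and the result is no longer the partial sum.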
3.1. Scalability Problem
One of the drawbacks of the trusted third party model is the scalability problem. After deploying the system, it is difficult to add a new smart meter. If we want to add a smart meter to the system, we need to assign it a new blind factor, . However, it is not enough to just assign a new to the smart meter. We have to update for the aggregator, otherwise the aggregator is unable to recover the total consumption of the smart meters using the old ; has to be updated to .
However, if is sent to the aggregator, it can get the blind factor by computing . If the aggregator knows the blind factor , it can get the original consumption of smart meter . One potential solution is to run the system initialization phase again and let the trusted third party assign new blind factors for all smart meters and aggregators; however, it will be a daunting task once the smart meters have been deployed.
Another problem is that the system will fail to work when a smart meter is broken. Suppose
is broken and it cannot send
to the aggregator, then the aggregator is unable to get the total consumption of all the smart meters; what the aggregator gets is
, in which
. The following is an analysis based on the reported data in the research of Fan et al. [6]:
What is worse, it is also difficult to replace a broken smart meter with a new one. If we want to replace the meter, we will encounter the problem of adding a meter to the system. As we have discussed, adding smart meters to the system is difficult.
3.2. Precise Time Requirement
The other problem in the trusted third party model is its strict time accuracy requirement: all of the smart meters have to synchronize their clocks precisely, because the model requires the time used by different smart meters to be identical; otherwise the aggregator is unable to recover the original consumption data. The problem becomes worse in Fan’s scheme [6], where the aggregator also has to synchronize its time with all the smart meters, and even a one-millisecond error will lead to a wrong answer.
3.3. Comparison
Finally, Table 1 compares the trusted third party model with the proposed scheme. The table shows that the proposed scheme overcomes the problems of the trusted third party model.
4. Proposed Scheme
The model of the proposed scheme is depicted in Figure 2. There are two types of entities in the system: smart meters and an aggregator. All the smart meters in the system have to register with the aggregator first; after registration, they can report their consumption data to the aggregator periodically. The aggregator only accepts reports from registered smart meters.
To protect the privacy of the users, in every reporting cycle, a smart meter generates a random noise to perturb its consumption, and will send to the aggregator. In this way, the aggregator is unable to get the because it does not know the .
Since the noises are generated following a normal distribution, if we set the average value of the random numbers to be 0, we know
,
is the number of smart meters in an aggregation system, thus the aggregator can get the total consumption:
We should note here that when become larger, will gradually approach 0, and will not become larger even when becomes larger.
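A small simulation can make the noise-masking idea concrete. The sketch assumes independent zero-mean normal noise with a hypothetical standard deviation `sigma`, and hypothetical readings in kWh; none of these values are taken from the paper. For independent noise of this kind, the average noise tends to 0 as the number of meters grows, while the standard deviation of the summed noise is `sigma` times the square root of the number of meters.

```python
import random


def noisy_total(readings, sigma, seed=0):
    """Sum of reports, where each report is a reading plus zero-mean normal noise."""
    rng = random.Random(seed)
    reports = [m + rng.gauss(0.0, sigma) for m in readings]
    return sum(reports)


readings = [1.5] * 100  # hypothetical consumption values in kWh
estimate = noisy_total(readings, sigma=0.04)
error = estimate - sum(readings)  # aggregate error caused by the noise
```

The aggregator only sees the perturbed reports, yet `estimate` stays close to the true total because positive and negative noise values largely cancel.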
For example, if we set the tolerable error
to be within the range of [–5, 5] kWh, the probability that
falls into [–5, 5] kWh is set to be:
. If we set
100, we can get
. That is, if the noise generated obeys the normal distribution with average
, and
, then the sum of
will be within the range of [–5, 5] kWh with a probability of 98%. Noise can also be generated from other distributions; the corresponding results are listed in Table 2 [5].
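The relation between the tolerable error, the number of meters, and the noise spread can be checked numerically. This sketch assumes n independent normal noise values with mean 0 and standard deviation `sigma`, so their sum is normal with standard deviation `sigma * sqrt(n)`; the helper names and the choice n = 100 are illustrative assumptions rather than values confirmed by the paper.

```python
import math


def prob_sum_within(n, sigma, tolerance):
    """P(|sum of n i.i.d. N(0, sigma^2) values| <= tolerance)."""
    return math.erf(tolerance / (sigma * math.sqrt(2.0 * n)))


def sigma_for_target(n, tolerance, target, lo=1e-9, hi=1e3, steps=200):
    """Largest sigma (by bisection) that keeps the probability at `target`."""
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if prob_sum_within(n, mid, tolerance) > target:
            lo = mid  # probability still above target: sigma can grow
        else:
            hi = mid
    return (lo + hi) / 2.0


sigma = sigma_for_target(n=100, tolerance=5.0, target=0.98)
check = prob_sum_within(100, sigma, 5.0)
```

A larger `sigma` hides each individual reading better but lowers the probability that the aggregate error stays inside the tolerance, which is the trade-off the distribution comparison addresses.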
To find out which distribution is best, we use Table 3, which lists the distributions of noises when
,
.
Noise that obeys the normal distribution or the Laplace distribution is concentrated too closely around the average value, which means a large share of the noise values are too small. For noise that obeys the Laplace distribution, 58.09% of the noise is within [–0.01, 0.01], which means more than half of the noise is too small. The range of noises obeying the U-quadratic distribution is [–0.0385, 0.0385] and the range of noises obeying the arcsin distribution is [–0.0462, 0.0462]; both are narrower than the range of noises obeying the uniform distribution, [–0.0693, 0.0693].
We therefore conclude that, among the distributions compared, uniformly distributed noise is the best choice. On the one hand, it is spread evenly over its range; on the other hand, its range is the widest.
4.1. Notations Used in the Schemes
The proposed scheme is based on the Boneh–Goh–Nissim homomorphic encryption scheme [31], a probabilistic homomorphic encryption algorithm proposed by Boneh et al. (2005). The system resembles the Paillier [32] and Okamoto–Uchiyama [33] encryption schemes and is additively homomorphic. The proposed scheme consists of three phases: the system initialization phase, the smart meter registration phase, and the meter reporting phase. Some notations are given in Table 4.
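To show what "additively homomorphic" means in practice, the following sketch uses a toy Paillier cryptosystem as a stand-in, since the text notes that the Boneh–Goh–Nissim system resembles it and a pairing-based implementation would not fit in a short example. The tiny primes and all function names are assumptions for illustration only and offer no security.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation with generator n + 1 (requires Python 3.8+)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam modulo n
    return n, (lam, mu)


def encrypt(n, message, rng):
    """Probabilistic encryption: c = (n + 1)^m * r^n mod n^2."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, ciphertext):
    lam, mu = private_key
    n2 = n * n
    return ((pow(ciphertext, lam, n2) - 1) // n * mu) % n


def add_ciphertexts(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


rng = random.Random(1)
n, private_key = keygen(61, 53)  # toy primes, far too small for real use
readings = [12, 30, 7]           # hypothetical meter readings
combined = 1                     # neutral element: decrypts to 0
for m in readings:
    combined = add_ciphertexts(n, combined, encrypt(n, m, rng))
total = decrypt(n, private_key, combined)
```

The aggregator-side holder of the private key decrypts only the combined ciphertext, so it learns the sum of the readings without decrypting any individual report.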
4.2. System Initialization
In this phase, the aggregator initializes and publicizes the parameters; this is a three-step process.
Step 1: For the elliptic curve parameters, the aggregator selects two random -bit primes , and sets , and generates a multiplicative group . Let be generators of , set and be a bilinear map.
Step 2: For the modular exponential group parameters, the aggregator randomly generates two large numbers ( is a 1024-bit prime number and is a 160-bit prime number) and picks a generator . In this study, a 1024-bit group with a 160-bit prime order subgroup is chosen.
Step 3: The aggregator publishes the system parameters , and the aggregator keeps its private key secret.
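Step 2 relies on a prime-order subgroup of a modular group. As a minimal sketch of how a generator of such a subgroup can be derived, the code below uses the standard cofactor method on toy parameters (p = 23, q = 11) instead of the 1024-bit and 160-bit values of the paper; the function names are hypothetical.

```python
def is_prime(n):
    """Trial division; adequate only for the toy sizes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def subgroup_generator(p, q):
    """Return an element of order q in the multiplicative group modulo p."""
    if not (is_prime(p) and is_prime(q)) or (p - 1) % q != 0:
        raise ValueError("need primes p and q with q dividing p - 1")
    cofactor = (p - 1) // q
    for h in range(2, p - 1):
        g = pow(h, cofactor, p)
        if g != 1:  # g^q = 1 and g != 1, so g has order exactly q
            return g
    raise ValueError("no generator found")


g = subgroup_generator(23, 11)
```

In a deployment the parameters would be taken from a vetted source rather than generated ad hoc; the comparison section states that the RFC 5114 parameters are used.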
4.3. Smart Meter Registration Phase
The smart meter registration process is depicted in Table 5. In the registration phase, the smart meter generates a registration request and sends it to the aggregator. When the aggregator receives the request, it first checks the correctness of the message; if it is correct, the aggregator stores this message in its memory.
First, smart meter generates a private key , then computes the public key and a signature , where is the current timestamp. sends the registration request to the aggregator over a secure channel.
When aggregator receives , it checks whether . If they are equal, stores .
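The registration exchange can be sketched as follows. The exact fields covered by the signature are not recoverable from this text, so the sketch assumes a SHA-256 digest over the identity, public key, and timestamp, which is consistent with the operation counts and the 1024 + 32 + 256 + 64-bit message size reported in the comparison section. The toy group (P, Q, G), the byte encoding, and the freshness window `max_skew` are hypothetical.

```python
import hashlib
import secrets

# Toy subgroup parameters for illustration; the paper uses a 1024-bit group.
P, Q, G = 23, 11, 4


def digest(meter_id, public_key, timestamp):
    """Assumed signature: SHA-256 over identity, public key, and timestamp."""
    data = (meter_id.to_bytes(8, "big")
            + public_key.to_bytes(128, "big")
            + timestamp.to_bytes(4, "big"))
    return hashlib.sha256(data).digest()


def make_request(meter_id, timestamp):
    """Smart meter: one exponentiation and one hash."""
    private_key = secrets.randbelow(Q - 1) + 1
    public_key = pow(G, private_key, P)
    signature = digest(meter_id, public_key, timestamp)
    return private_key, (meter_id, public_key, timestamp, signature)


def accept(request, registry, now, max_skew=300):
    """Aggregator: one hash; store the public key if the check passes."""
    meter_id, public_key, timestamp, signature = request
    if abs(now - timestamp) > max_skew:
        return False
    if digest(meter_id, public_key, timestamp) != signature:
        return False
    registry[meter_id] = public_key
    return True


registry = {}
_, request = make_request(meter_id=7, timestamp=1_700_000_000)
ok = accept(request, registry, now=1_700_000_010)
```

Because the digest is unkeyed, it only detects accidental or in-transit modification; the sketch relies on the secure channel mentioned in the text for authenticity.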
4.4. Reporting Phase
In the reporting phase, the smart meters read their consumption data and send the encrypted data to the aggregator. When the aggregator receives the data, it first authenticates the messages and then decrypts the data using its private key. The reporting process is depicted in Table 6.
At the beginning of a reporting cycle, each smart meter generates a noise to perturb its consumption . Then ( is encrypted by the homomorphic encryption algorithm. The process is as follows:
Meter extracts its consumption data , generates a random element , and picks an element .
Meter generates noise , which obeys the uniform distribution.
Meter computes .
Meter computes .
Meter gets the signature of and by computing ; is the current timestamp.
Meter computes .
Meter sends to the aggregator.
After receiving the reporting messages from all smart meters, the aggregator first checks the correctness of the incoming messages, then recovers the total consumption of all the smart meters using Pollard’s lambda method, which is feasible because the total consumption in a regular interval is not a large number [34].
Aggregator gets .
Aggregator picks at random.
Aggregator gets .
Aggregator checks if .
If the above check holds, the aggregator gets the electricity consumption by computing , where .
The aggregator is able to get the consumption data of all the smart meters as
. The following shows the proof of the correctness of the proposed scheme. As
and
, we can get the following equations:
Then we can get
. Let
; to compute
, it will take
using Pollard’s lambda method ([35], p. 128).
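Recovering a bounded exponent is the last step of decryption. The sketch below uses baby-step giant-step as a simpler, deterministic stand-in for Pollard’s lambda method; both need on the order of the square root of the bound in group operations, but this is not the algorithm used in the paper. The modulus 2^31 − 1, the base 7, and the function name are illustrative assumptions.

```python
import math


def bounded_discrete_log(g, h, p, bound):
    """Return m in [0, bound] with g^m = h (mod p), or None if none exists."""
    step = math.isqrt(bound) + 1
    baby = {}
    current = 1
    for j in range(step):          # baby steps: g^0 .. g^(step - 1)
        baby.setdefault(current, j)
        current = current * g % p
    factor = pow(g, -step, p)      # g^(-step), requires Python 3.8+
    gamma = h % p
    for i in range(step + 1):      # giant steps: h * g^(-i * step)
        if gamma in baby:
            m = i * step + baby[gamma]
            if m <= bound:
                return m
        gamma = gamma * factor % p
    return None


p = 2**31 - 1            # a Mersenne prime, used here as a toy modulus
g = 7
secret_total = 73421     # hypothetical aggregate consumption
h = pow(g, secret_total, p)
recovered = bounded_discrete_log(g, h, p, bound=100_000)
```

With the bound of 100,000 mentioned in the comparison section, about 317 baby steps and at most 318 giant steps are enough, which is why a small total consumption keeps this step cheap.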
5. Security Analysis
In this section, we conduct a security analysis of the proposed scheme in terms of security against external and internal adversaries, and security of the signature scheme.
5.1. For External Adversaries
As the Boneh–Goh–Nissim homomorphic encryption algorithm is semantically secure, we can get Theorem 1.
Theorem 1. The proposed scheme achieves semantic security under the chosen cipher attack if and only if the Boneh–Goh–Nissim homomorphic encryption algorithm achieves semantic security.
(⇒) Suppose there is an efficient algorithm that could break the Boneh–Goh–Nissim homomorphic encryption algorithm in probabilistic polynomial time, which means for a real consumption pair and a cipher and public parameter , an adversary is able to judge if is the cipher of or with a probability that is higher than 1/2.
Given a cipher and public parameter , the adversary is able to get by using algorithm . If , then is the cipher of , and if , then is the cipher of . In both situations, the adversary is able to judge if is the cipher of or with a probability that is higher than 1/2. We can conclude that with algorithm , an adversary can break the semantic security of the proposed scheme with a probability that is higher than 1/2.
(⇐) Suppose there is an efficient algorithm that could break the proposed scheme in probabilistic polynomial time. Given a cipher and public parameter , adversary is able to judge if is the cipher of or a random number.
If is the cipher of , for the Boneh–Goh–Nissim homomorphic encryption algorithm, given , can get . This means is able to break the algorithm.
5.2. For Internal Adversaries
In the proposed scheme, the smart meter reports to the aggregator represent the real consumption and a noise is randomly generated by the smart meter. Only the smart meter knows , other entities in the system are unable to get , thus they are unable to get the original consumption . The privacy of a single smart meter is protected, only the smart meter knows the real consumption .
5.3. Security of the Signature Scheme
Now we are going to prove that the signature scheme in the proposed schemes is secure. The proof is based on the computational hardness of the discrete logarithm (DL) problem. The discrete logarithm problem for a group can be stated as:
Given a group with order , for and , find an integer such that .
Theorem 2. The signature scheme in the proposed scheme achieves semantic security under the chosen cipher attack if and only if the discrete logarithm problem is unable to be solved in polynomial time.
(⇒) Suppose there is an efficient algorithm
that could break the DL problem in probabilistic polynomial time. This means that for a message pair
and a signature
, given
and public parameter
included the public key
of
, an adversary
is able to get:
If , then is the signature of , and if , then is the signature of ; in both situations, the adversary is able to judge if is the signature of or with a probability that is higher than 1/2. We can get the conclusion that algorithm can break the semantic security of the signature scheme with a probability that is higher than 1/2.
(⇐) Suppose there is an efficient algorithm
that could break the signature scheme in the proposed scheme. Given a message
, a signature
,
and public parameter
included the public key
of
, adversary
is able to judge if
is the signature of
or a random number. If
is the signature of
, an adversary
is able to get:
This means that with the help of an algorithm , given and public parameter included the public key of , the adversary can get . As for the DL problem, suppose and . Given , can get . This means the adversary can break the semantic security of the DL problem.
5.4. Security Analysis Using AVISPA
We ran a security check using the constraint-logic-based model checker [36] and the on-the-fly model checker (OFMC) [37,38] of the Automated Validation of Internet Security Protocols and Applications (AVISPA) tool. The simulation results shown in Table 7 report the proposed scheme as safe.
6. Comparison
In this section, we compare the computation times of the schemes. The measured times of the different kinds of operations are shown in Table 8. We use the Java Pairing-Based Cryptography Library (JPBC) [39]. Type A1 pairings are constructed on the curve
39]. Type A1 pairings are constructed on the curve
over the field
for some prime
, and this pairing is symmetric. The order of the group is some prime factor of
; for the initialization of the curve, the number of primes is set to 2, and the bit length of each prime is set to 160. The parameters for the elliptic curve are listed in Appendix A. The upper bound of Pollard’s lambda method is set at 100,000.
We chose a 1024-bit modular exponential group with a 160-bit prime order subgroup. The detailed parameters can be found in RFC 5114 [40], and we have listed the parameters in Appendix B.
The experiment environment is a 64-bit Windows 7 Enterprise operating system with an Intel(R) Core(TM) i73370K 3.5 GHz CPU and 8 GB of memory. The code for testing the computation times of different operations has been uploaded to a public repository at github.com [41]. The meanings of the different symbols are given below.
bilinear map pairing operation
hash to an element operation
element multiplication operation
element exponentiation
Pollard’s lambda method
multiplication operation in
exponentiation operation in a modular group with an exponent of 60 bits
exponentiation operation in a modular group with an exponent of 60–160 bits
multiplication operation in a modular group
hash to a big integer operation
SHA256 operation
6.1. Computation Performance Analysis
We analyzed the computation cost of different schemes at the smart meter registration and aggregation phases. Suppose there are smart meters in an aggregation system.
For Fan’s scheme, in the registration phase, the smart meter has to conduct two and two operations; the aggregator has to conduct one , one , and two operations. In the aggregation phase, the smart meter has to conduct two , two , and four operations; the aggregator has to conduct one , (k + 1) , (2k − 1) , (2k + 2) , (k + 1) , and (k − 1) operations.
For He’s scheme, in the registration phase, the smart meter has to conduct two , one , and one operations; the aggregator has to conduct two , one , and one operations. In the aggregation phase, the smart meter has to conduct two , one , three , one , one and one operations; the aggregator has to conduct one , k , two , one , (k + 1) , k , (2k − 1) , and k operations.
For the proposed scheme, in the registration phase, the smart meter has to conduct one and one operation; the aggregator has to conduct one operation. In the aggregation phase, the smart meter has to conduct one , two , one , one , and one operations; the aggregator has to conduct one , (k − 1) , one , one , (122k + 120) , and k operations.
Table 9 shows the computation cost of the registration phase and Table 10 shows the computation cost of the aggregation phase, in which k stands for the number of smart meters in the aggregation system.
Table 11 shows the computation costs of the different schemes in the registration phase in milliseconds. The table shows that the cost of the proposed scheme is the lowest. This is because the proposed scheme only needs modular exponential group operations and the general SHA-256 operation, both of which are lightweight.
Figure 3 shows the computation costs on the smart meter side in the aggregation phase. The horizontal axis of this figure is the computation time in milliseconds. The figure shows that the computation cost of the proposed scheme is the lowest.
Figure 4 shows the computation cost on the aggregator side in the aggregation phase. The vertical axis of this figure indicates the computation time in seconds; the horizontal axis indicates the number of smart meters. The figure shows that the computation cost of the proposed scheme is the lowest under all tested conditions.
6.2. Communication Performance Analysis
In this section, we show the communication cost of all the schemes. The lengths of are 1024 bits and 160 bits, respectively. The length of is 330 bits, the length of an element of is 660 bits, the order of the curve is a 320-bit-long number. The size of the timestamp is 32 bits, and the identity is set to be 64 bits long. We analyzed the communication cost of the registration and aggregation phases.
For Fan’s scheme, at the registration phase, the smart meter has to send to the aggregator, and the bit length of this message is 660 + 660 + 330 + 330 + 64 = 2044. In the aggregation phase, the smart meter has to send to the aggregator, and the bit length of this message is 64 + 660 + 660 = 1384.
For He’s scheme, at the registration phase, the smart meter has to send to the aggregator, and the bit length of this message is 64 + 1024 + 1024 + 160 = 2272. In the aggregation phase, the smart meter has to send to the aggregator, and the bit length of this message is 64 + 660 + 1024 + 160 + 32 = 1940.
For the proposed scheme, at the registration phase, the smart meter has to send to the aggregator, and the bit length of this message is 1024 + 32 + 256 + 64 = 1376. In the aggregation phase, the smart meter has to send to the aggregator, and the bit length of this message is 64 + 660 + 1024 + 160 + 32 = 1940.
The communication cost of the different schemes is shown in Table 12.
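The message sizes quoted in this section can be re-derived by summing the per-field bit lengths. The sketch below only repeats the numbers given in the text; the dictionary layout is an assumption, and the fields are listed by size because their names are not recoverable here.

```python
# Per-field bit lengths as quoted in the text, keyed by (scheme, phase).
MESSAGE_FIELD_BITS = {
    ("Fan", "registration"): [660, 660, 330, 330, 64],
    ("Fan", "aggregation"): [64, 660, 660],
    ("He", "registration"): [64, 1024, 1024, 160],
    ("He", "aggregation"): [64, 660, 1024, 160, 32],
    ("Proposed", "registration"): [1024, 32, 256, 64],
    ("Proposed", "aggregation"): [64, 660, 1024, 160, 32],
}

# Total message length in bits for every scheme and phase.
totals = {key: sum(bits) for key, bits in MESSAGE_FIELD_BITS.items()}

for (scheme, phase), bits in sorted(totals.items()):
    print(f"{scheme:9s} {phase:13s} {bits:5d} bits")
```

On these figures the proposed scheme sends the shortest registration message (1376 bits), while its aggregation message has the same length as that of He’s scheme (1940 bits).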
6.3. Comparison of All Features
In this section we compare the three schemes on different metrics, and the results are shown in Table 13. As we discussed in Section 3, the schemes of He et al. and Fan et al. have a meter failure problem: when one or more of the smart meters are broken, the scheme fails to work. If we want to add a new smart meter to the system, the whole system needs to be redeployed. Likewise, replacing a broken smart meter with a new one is difficult in the other two schemes and also requires redeploying the whole system. The two schemes also require higher time accuracy: even a one-millisecond error prevents the aggregator from recovering the original data. Moreover, the computation cost of the proposed scheme is the lowest of the three under all tested conditions.