EPPRD: An Efficient Privacy-Preserving Power Requirement and Distribution Aggregation Scheme for a Smart Grid

A Smart Grid (SG) facilitates bidirectional demand-response communication between individual users and power providers with high computation and communication performance but also brings about the risk of leaking users’ private information. Therefore, improving the individual power requirement and distribution efficiency to ensure communication reliability while preserving user privacy is a new challenge for SG. Based on this issue, we propose an efficient and privacy-preserving power requirement and distribution aggregation scheme (EPPRD) based on a hierarchical communication architecture. In the proposed scheme, an efficient encryption and authentication mechanism is proposed for better fit to each individual demand-response situation. Through extensive analysis and experiment, we demonstrate how the EPPRD resists various security threats and preserves user privacy while satisfying the individual requirement in a semi-honest model; it involves less communication overhead and computation time than the existing competing schemes.

level, BANs at the building level, and NANs at the substation level. In this configuration, different gateways are responsible for aggregating data, namely, the HAN Gateway (GW), BAN Gateway (GW), and NAN Gateway (GW), which reside in each corresponding layer of the network. Based on this hierarchical architecture, they proposed an authentication scheme based on computational Diffie-Hellman encryption to maintain data integrity. We adopt a lightweight authentication method in combination with our hierarchical network architecture to satisfy the scalability and the real-time and efficient communication requirement while preserving privacy.
We propose an efficient privacy-preserving power requirement and distribution aggregation scheme for a Smart Grid (EPPRD). The scheme focuses on securing the communications required to implement individual power requirements and distribution suited to the current power level, using a lightweight, scalable authentication protocol for bidirectional communication over hierarchical communication networks. The main contributions of this paper are as follows:
• Because power consumption changes dynamically over time, it may be necessary to adjust a user's power distribution in the next time slot, based on the power consumption in the current time slot, to flatten demand peaks. During peak demand, the Control Center (CC) reduces the total distribution to users to shift power consumption from peak time to non-peak time in the next time slot. Therefore, our demand message is divided into two parts: an individual user requirement for the next time slot, encrypted with RSA, and the total user consumption in the current time slot, encrypted with Paillier encryption, which serves as a significant reference for the CC when distributing power in the next time slot.

• To reduce the volume of transmitted traffic, we place a regional concentrator in the BAN for regional storage, aggregation, transmission, and distribution. After the BAN receives the distributed regional power ratio from the CC, it immediately distributes individual power to the users according to the stored requirements and the regional power ratio.

• To ensure message confidentiality and integrity, we employ public-key encryption, Paillier homomorphic encryption, and Hash-based Message Authentication Code (HMAC) authentication in the HAN Smart Meter (HSM), BAN Gateway (BGW), and NAN Gateway (NGW). The scheme resists various attacks, such as replay, man-in-the-middle, and eavesdropping attacks, and thus offers stringent security and reliability guarantees.

The remainder of the paper is organized as follows. In Section 2, we review work related to EPPRD. In Section 3, we introduce the EPPRD communication model and security goals. In Section 4, we introduce basic preliminaries such as the Computational Diffie-Hellman (CDH) problem and the Paillier cryptosystem. In Section 5, we present the EPPRD scheme together with its security analysis and proof. We then present our performance analysis and discussion in Sections 6 and 7, respectively. Finally, we draw conclusions in Section 8.

Related Work
Although multiple studies [8,19,21-27] have already proposed methods to securely aggregate user measurements in SG, they have focused primarily on total user power aggregation rather than on individual information exchange between a user and a power utility, and such total aggregation is not suitable for modern individual demand-response characteristics. Some of the proposed methods are also vulnerable to various attacks because they lack a rigorous authentication process, and some are inefficient due to their high communication overhead.
In [19,22], the authors introduce a simple authentication scheme. Two parties (in the HAN and the BAN or NAN) establish a shared key using the Diffie-Hellman technique; after the initial authentication, they generate HMAC signatures for all subsequent communications. However, these studies did not address the issue of privacy at all. A hierarchical communications architecture was also adopted in [21], which proposed an individual security billing scheme based on that architecture. The user submits an encrypted power requirement to the aggregator. When billing, the user can show the CC the pre-submitted requirement and receives a reward or penalty. Although the scheme adopts a method similar to ours regarding the hierarchical communications architecture, HMAC authentication, and bidirectional communication, there are some differences between [21] and our study:

• Our scheme focuses on preserving the privacy of individual power requirement and distribution rather than on individual power billing. We adopt two different encryption modes for individual power requirement and distribution, while [21] employs only Paillier homomorphic encryption for its power requirement.
• Zhong et al. [21] employ commitments to store an individual power requirement and transmit it upward through nodes to the CC, which generates excessive communication overhead, while we employ a regional concentrator to store and distribute the individual power requirement.
• From a security and data integrity perspective, [21] employs only one authentication key throughout the entire authentication process; however, as is well known, a user's smart meter is more vulnerable to attack than a gateway is. Therefore, if the authentication key is compromised, all the subsequent authentication processes are vulnerable to a man-in-the-middle attack. Our scheme strengthens this aspect by adopting a stringent method of authentication between the HSM and the BAN Concentrator (BC) to reduce the vulnerability of the HSM. In our scheme, a new authentication key is generated randomly based on the Diffie-Hellman key establishment protocol in every communication session. In comparison with [21], we show that when the number of smart meters is very large, our protocol is more efficient and more stringent than competing schemes.
The study in [8] presents a secure privacy preserving aggregation method to protect the electricity consumption of an individual user. It can also resist internal attacks. However, it differs from ours in its encryption scheme, authentication, and trustable nodes. TTP is employed in [8], while we employ regional concentrators in the BAN layer.
Numerous authentication schemes have been proposed thus far [23-25]; however, all these schemes suffer from too many authentication steps, which cause high communication overhead and long delays. In this paper, we ameliorate these challenging issues [8] by proposing a robust, efficient, and lightweight message authentication scheme to ensure secure communications between the GWs. Our authentication scheme provides mutual authentication among smart meters located in different area networks in a hierarchical communication network. The proposed authentication scheme is based on the Diffie-Hellman key establishment protocol and the keyed Hash-based Message Authentication Code ($HMAC_K$) in [19].
Quite a few studies involve regional concentrators. Of these, [26,27] are closest to ours. The study in [26] provides a comprehensive performance analysis of the Split and Aggregated TCP (SA-TCP) scheme. It studies the impact of varying several parameters on the scheme, including network link capacity and the buffering capacity of Regional Collectors (RCs), and it uses RCs as the SA-TCP aggregators. It is noted in [26] that RCs are trustable gateways that are installed at preselected locations in every region to route the meters' data packets through a wide area network to the utility server. The study in [27] compares the performance of four different WSN architectures in terms of energy consumption, in which the CN (Concentrator Node) in the third presented architecture is similar to the regional concentrator in our scheme. A trustable regional concentrator should have some storage and processing capability to allow it to aggregate the periodically generated regional metering information and store those values in its memory. Then, for example, at the end of the day, the concentrator could aggregate the information and send a summary message to the CC [27]. Using this approach, the traffic in the low-level network can be greatly reduced.
Based on this idea, we also install a trustable regional concentrator in the BAN layer, called a BAN Concentrator (BC). Each set of n meters establishes n TCP connections with a BC, which is a gateway that

System Communication Model
The system communication model shown in Figure 1 is based on the hierarchical communication architecture. In EPPRD, the communication network framework includes the Neighborhood Area Network (NAN), Building Area Network (BAN), and Home Area Network (HAN). The HAN, BAN, and NAN communicate through WiMAX, and the NAN connects to the CC over optical fiber.
We use HSM to denote the HAN Gateway Smart Meter, BC to denote the BAN Concentrator, and NGW to denote the NAN Gateway; GWs refers to the BC and NGW below.
• CC: we assume the CC is a highly trusted and powerful entity in charge of managing the whole system. Its duty is to initialize the system, to collect, process, and analyze the real-time data, and to provide power distribution according to the power level and real-time data.
• BC: we assume the BC is a highly trusted gateway in charge of collecting, storing, aggregating, and distributing real-time data. The BC also stores regional individual power requirements, aggregates regional power consumption, transmits it together with the regional requirement summation through the NAN to the CC, and distributes individual power to every user according to the power ratio from the CC. The BC needs sufficient secure storage to hold the long-term keys and protect users' private readings; this can be achieved, for example, with TPM chips that store the specific power requirements of each HSM.
• NGW: the NGW is a power gateway that connects the BC and the CC. The duty of the NGW is to relay and aggregate real-time data: it aggregates the regional consumption data from the BC and relays the regional requirement data from the BC in a secure way.
• HSM: we refer to an HSM as a user with a smart meter, and the HAN is made up of various smart applications. The real-time data of the HSM is collected and processed by the BC and transmitted to the CC via the NGW. Although the HSM is tamper-resistant and interfering with measurements is not trivial, it is not as powerful as the gateways (e.g., BC, NGW), so it may be vulnerable to attackers.
For the sake of simplicity, we assume every set of m HSMs establishes m TCP connections with a BC, every set of n BCs establishes n TCP connections with an NGW, and every set of p NGWs establishes p TCP connections with the CC.

Security Goals
We have the following three security goals:

• Confidentiality. Restricting data access to authorized entities and encrypting data are critical to protecting personal privacy and information; in other words, only an authorized entity can receive individual user data or access the databases of the GWs, i.e., an attacker cannot decrypt the communication flows between the GWs and the CC.
• Data integrity, authentication, and access control. Authentication and access control verify that communicating entities are authorized and govern access to the power information, which prevents an unauthorized attacker from modifying the power data or undermining its integrity and availability.
• Forward secrecy. Forward secrecy is a property of secure communication protocols in which compromise of long-term keys does not compromise past session keys.
To satisfy these security goals, not only should the data at every node be encrypted with cryptographic primitives, but communication flows should also be verified with an efficient, bidirectional authentication method.

Attack Model
We assume smart meters and gateways (e.g., HSM, NGW) are semi-honest (also known as "honest but curious"): they faithfully follow all prescribed protocols and provide real measurements, but they attempt to learn as much data as possible. Although the HSM is assumed to be tamper-resistant, we do not rule out the possibility of a data pollution (or DoS) attack. A data pollution attack is a kind of malicious participant attack in which the attacker lies about their values, resulting in incorrect measurement results. It is not within the scope of this paper, but we note that one possible solution is interactive or non-interactive zero-knowledge proofs.

We consider the following possible attack types in EPPRD.
• External Attack: The external attacker tries to infer individual information by eavesdropping on the communication and data flow from the HSM to the BC, from the BC to the NGW, and from the NGW to the CC.
• Internal Attack: Internal attackers are usually participants of the protocol (e.g., the NGW) who may collude with as many compromised HSMs as possible to learn about an individual user's private data, or a curious HSM who attempts to infer the private data of another HSM.
• Man-in-the-Middle Attack: The attacker forges or alters the communication data once he is authenticated by any communicating party. The authentication key between the HSM and BC should therefore differ from that between the BC and NGW, to prevent an authenticated attacker from altering the communication data between the BC and NGW.
• Replay Attack: The attacker tries to repeat or delay a valid data transmission while misleading the honest sender into thinking it has successfully finished the data transmission.

Preliminaries
In this section, we briefly provide some preliminaries for the security and authentication scheme used in EPPRD.

Computational Diffie-Hellman (CDH) Problem
The CDH problem is stated as follows: Let $G = \langle g \rangle$ be a group of large prime order $q$. Given the elements $g^a$ and $g^b$ for unknown $a, b \in \mathbb{Z}_q^*$, it is hard to compute $g^{ab} \in G$. Based on the CDH assumption, the lightweight message authentication scheme is described in detail in [19] and is not repeated here.

Paillier Cryptosystem
The Paillier Cryptosystem was proposed in 1999 by Pascal Paillier and is one common homomorphic encryption that is widely used in privacy-preserving applications [28]. Concretely, the Paillier Cryptosystem is comprised of three algorithms: key generation, encryption, and decryption.
Key Generation: Given the security parameter $\kappa$, two large prime numbers $p$ and $q$ are first chosen, and $N = pq$ is computed. The public key is then $PK = (N, g)$, and the corresponding private key is $SK = (\lambda, \alpha)$.
Encryption: Given a message $m \in \mathbb{Z}_N$ and a random $r \in \mathbb{Z}_N^*$, the corresponding ciphertext is calculated as $c = E(m, r) = g^m \cdot r^N \bmod N^2$.
Semantic Security: Because Paillier encryption is randomized, an attacker cannot distinguish between ciphertexts even if the underlying plaintexts are the same. The semantic security is proved under the decisional composite residuosity assumption: Given $N = pq$, it is hard to decide whether an element in $\mathbb{Z}_{N^2}$ is an $N$-th power of an element in $\mathbb{Z}_{N^2}^*$ [18].
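To make the additive homomorphism concrete, the following is a minimal Python sketch of the Paillier operations with toy parameters. Several choices here are assumptions that are not stated in the text above: $g = N + 1$, $\lambda = \mathrm{lcm}(p - 1, q - 1)$, and the second private-key component (written $\alpha$ above, named `mu` in the code) taken as the modular inverse of $L(g^{\lambda} \bmod N^2)$ with $L(x) = (x - 1)/N$. The primes 1009 and 1013 are illustrative only and far too small for real use.

```python
import math
import secrets


def keygen(p, q):
    """Toy Paillier key generation from two distinct primes p and q."""
    n = p * q
    g = n + 1  # assumption: a common, simple choice of g
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    # With L(x) = (x - 1) // n, mu is the inverse of L(g^lam mod n^2) mod n.
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pk, m):
    """c = g^m * r^n mod n^2, with fresh random r coprime to n."""
    n, g = pk
    while True:
        r = secrets.randbelow(n - 1) + 1  # r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)


def decrypt(pk, sk, c):
    """m = L(c^lam mod n^2) * mu mod n."""
    n, _ = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


def add(pk, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pk
    return c1 * c2 % (n * n)


pk, sk = keygen(1009, 1013)
readings = [3, 5, 11]          # hypothetical consumption values
total = encrypt(pk, 0)
for u in readings:
    total = add(pk, total, encrypt(pk, u))
```

Only the holder of the private key can decrypt `total`; an intermediate aggregator that multiplies ciphertexts never sees an individual reading.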

System Initialization
For the given hierarchical communication system model in Figure 1, the CC can bootstrap the whole system. We randomly select one HSM node, one BC node, and one NGW node and denote them as $HSM_i$, $BC_j$, and $NGW_k$, respectively. We assume that $BC_j$ has m HSM nodes, $NGW_k$ has n BC nodes, and the CC has p NGW nodes. The specific notations in our scheme are listed in Table 1.
The specific initialization process is as follows:
• Given the security parameter $\kappa$, the CC first generates $(p, q)$ by running $Gen(\kappa)$ and calculates the Paillier cryptosystem's public key $PK_{CC} = (N = pq, g)$ and the corresponding private key $SK_{CC} = (\lambda, \alpha)$, where $p$ and $q$ are two large prime numbers for which $|p| = |q| = \kappa$. The pair $\langle CC, PK_{CC} \rangle$ is distributed to each node in the network model, and $SK_{CC}$ is kept private.
• Each user's smart meter $HSM_i$ generates a pair of public and private keys, $PK_{HSM_i}$ and $SK_{HSM_i}$, respectively. Then, $\langle HSMID_i, PK_{HSM_i} \rangle$ is stored at the control center and distributed to each user after initialization, while $SK_{HSM_i}$ is preloaded into $HSM_i$ and kept private.
• Each $BC_j$ generates a pair of public and private keys, $PK_{BC_j}$ and $SK_{BC_j}$, respectively. Then, $\langle BCID_j, PK_{BC_j} \rangle$ is stored at the control center and distributed to each user after initialization, while $SK_{BC_j}$ is preloaded into $BC_j$ and kept private.
• Each $NGW_k$ generates a pair of public and private keys, $PK_{NGW_k}$ and $SK_{NGW_k}$, respectively. Then, $\langle NGWID_k, PK_{NGW_k} \rangle$ is stored at the control center and distributed to each NGW after initialization, while $SK_{NGW_k}$ is preloaded into $NGW_k$ and kept private.
• The CC generates an authentication key, s, encrypts it with the BC's and the NGW's public keys, and transmits it to the BC and NGW, respectively.

Table 1. Notations.
$PK_{CC}$: Public key of the control center
$SK_{CC}$: Private key of the control center
$HSM_i$: The ith HSM
$BC_j$: The jth BC
$NGW_k$: The kth NGW
$PK_{HSM_i}$: Public key of HSM
$SK_{HSM_i}$: Private key of HSM
$PK_{BC_j}$: Public key of BC
$SK_{BC_j}$: Private key of BC
$PK_{NGW_k}$: Public key of NGW
$SK_{NGW_k}$: Private key of NGW
$E^{P}_{r}$: Public encryption of the requirement for the next time slot
$E^{H}_{u}$: Homomorphic encryption of a user's power consumption
$ENC_x(M)$: Encryption of plaintext M using key x
$HMAC_x(M)$: HMAC of message M using key x

Upward Message Form
In our scheme, the CC collects one power requirement and consumption instruction per collection period, which includes two parts: each user's power requirement for the next time slot and the total power consumption for the last time slot. These are, respectively, the RSA public-key encryption part, denoted $E^{P}_{r}$, and the Paillier homomorphic encryption part, denoted $E^{H}_{u}$, as shown in Figure 2.
We encrypt each individual power requirement with RSA public-key encryption, $E^{P}_{r_i}$, because the BC needs to store the encrypted individual requirement and decrypt it later to distribute power according to the power ratio in the power distribution phase. In addition, individual power consumption must be summed to act as a reference for power distribution during the next time slot. For this, we employ homomorphic encryption, $E^{H}_{u_i}$, which also prevents any intermediate node from leaking individual consumption.
$HSM_i$ computes the individual upward message $msg_i$ as follows: $msg_i = \langle ID_i, Len, E^{P}_{r_i}, E^{H}_{u_i} \rangle$, where $E^{P}_{r_i}$ represents the public encryption value of the requirement plaintext $r_i$ under $PK_{BC_j}$ and $E^{H}_{u_i}$ represents the homomorphic encryption value of the consumption plaintext $u_i$ under $PK_{CC}$. The header includes two parts: $ID_i$ denotes the sender ID and $Len$ denotes the length of the public encryption part, which separates the non-homomorphic part from the homomorphic part.
As seen in Figure 2, we define every BC as both regional aggregator and distributor. They store encrypted individual power requirement, aggregate regional power consumption, and transmit it after regional requirement summation via NGW to the CC. They also distribute individual power to each user according to the power ratio from the CC.
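The role of the header fields in the upward message (a sender ID followed by a length that separates the public-key part from the homomorphic part) can be sketched in Python as follows. The field widths (a 4-byte ID and a 2-byte length, big-endian) and the function names are hypothetical choices made for this illustration; the scheme itself does not fix a wire format.

```python
import struct

# Hypothetical header layout: 4-byte sender ID, 2-byte Len (big-endian).
HEADER = struct.Struct(">IH")


def pack_msg(sender_id: int, enc_requirement: bytes, enc_consumption: bytes) -> bytes:
    """Build <ID, Len, public-key part, homomorphic part> as one byte string."""
    header = HEADER.pack(sender_id, len(enc_requirement))
    return header + enc_requirement + enc_consumption


def unpack_msg(packet: bytes):
    """Split a packet back into its parts, using Len as the boundary."""
    sender_id, length = HEADER.unpack_from(packet)
    body = packet[HEADER.size:]
    return sender_id, body[:length], body[length:]


# Placeholder byte strings stand in for the two ciphertexts.
packet = pack_msg(7, b"requirement-ciphertext", b"consumption-ciphertext")
parts = unpack_msg(packet)
```

Because the receiver reads Len from the header, it can forward or aggregate the homomorphic part without parsing the public-key part.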

Authentication Part
In the Related Work (Section 2), we mentioned that the authentication scheme in [21] is not sufficiently stringent because its single authentication key may be leaked. Therefore, between $HSM_i$ and $BC_j$ we adopt the authentication protocol based on the Diffie-Hellman key-establishment protocol proposed in [19]. The specific process is depicted in Figure 3.
• $HSM_i$: Let $G = \langle g \rangle$ be a group of prime order $q$. $HSM_i$ selects a random number $a \in \mathbb{Z}_q^*$ and computes $g^a$. It then computes $ENC_{PK_{BC_j}}(i \| j \| t_i \| g^a)$ (where $t_i$ is the current time slot) and transmits it to $BC_j$.
• $BC_j$: After receiving $ENC_{PK_{BC_j}}(i \| j \| t_i \| g^a)$, $BC_j$ first decrypts it with its private key $SK_{BC_j}$ and verifies the freshness of $t_i$. It then selects a random number $b \in \mathbb{Z}_q^*$ and sends an encrypted response containing $g^b$, $ENC_{PK_{HSM_i}}(i \| j \| t_j \| g^a \| g^b)$, to $HSM_i$.
• $HSM_i$: After receiving $ENC_{PK_{HSM_i}}(i \| j \| t_j \| g^a \| g^b)$ from $BC_j$, $HSM_i$ decrypts it with its private key $SK_{HSM_i}$, verifies the freshness of $t_j$, and recovers $g^a$ and $g^b$. If the recovered $g^a$ is correct, $BC_j$ is authenticated by $HSM_i$. Then, with $a$ and $g^b$, $HSM_i$ computes the shared session key $K_{ij} = H(i \| j \| (g^b)^a)$, where $H: \{0, 1\}^* \rightarrow \mathbb{Z}_q^*$ is a secure cryptographic hash function, and computes the HMAC signature using $K_{ij}$ as the key on $i$, $j$, $t_i$, and $msg_i$ to form the Hash-based Message Authentication Code $HMAC_{K_{ij}}(i \| j \| t_i \| msg_i)$. Finally, $HSM_i$ sends $(g^b, i)$ to $BC_j$ to authenticate itself.
• $BC_j$: After receiving $(g^b, i)$, $BC_j$ authenticates $HSM_i$ and then computes $K_{ij} = H(i \| j \| (g^a)^b)$ with the known $g^a$ and $b$.
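The key-agreement core of this exchange can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the protocol of [19] itself: the public-key encryption that wraps each message is omitted, the group is a toy subgroup of order 1019 modulo the safe prime 2039 generated by 4, and SHA-256 stands in for the hash function $H$ (its digest is used directly as the HMAC key rather than being mapped into $\mathbb{Z}_q^*$). The function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Toy parameters (assumptions): P = 2*Q + 1 is a safe prime and G = 4
# generates the subgroup of prime order Q. Far too small for real use.
P, Q, G = 2039, 1019, 4


def dh_keypair():
    """Pick a secret exponent in [1, Q-1] and return (secret, G^secret mod P)."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def session_key(i: int, j: int, peer_public: int, own_secret: int) -> bytes:
    """K_ij = H(i || j || g^(ab)), with SHA-256 standing in for H."""
    shared = pow(peer_public, own_secret, P)
    return hashlib.sha256(f"{i}|{j}|{shared}".encode()).digest()


def make_tag(key: bytes, i: int, j: int, t: int, msg: bytes) -> bytes:
    """HMAC over i || j || t || msg under the session key."""
    return hmac.new(key, f"{i}|{j}|{t}|".encode() + msg, hashlib.sha256).digest()


a, g_a = dh_keypair()              # chosen by the smart meter (HSM side)
b, g_b = dh_keypair()              # chosen by the concentrator (BC side)
k_hsm = session_key(1, 2, g_b, a)  # HSM computes H(i || j || (g^b)^a)
k_bc = session_key(1, 2, g_a, b)   # BC computes  H(i || j || (g^a)^b)
tag = make_tag(k_hsm, 1, 2, 1000, b"msg")
```

Both sides derive the same key without ever transmitting it, and a fresh pair of exponents per session yields a fresh key per session.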


Upward Transmission
After the authentication process between $HSM_i$ and $BC_j$ is complete, $HSM_i$ transmits the message packet upward to $BC_j$. The specific transmission process is depicted in Figures 3 and 4.
• $HSM_i$: $HSM_i$ sends $ENC_{PK_{BC_j}}(i \| j \| t_i \| msg_i \| HMAC_{K_{ij}})$ to $BC_j$.
• $BC_j$: $BC_j$ decrypts $ENC_{PK_{BC_j}}(i \| j \| t_i \| msg_i \| HMAC_{K_{ij}})$ with $SK_{BC_j}$, verifies the freshness of $t_i$, and recomputes $K_{ij}$ and $HMAC_{K_{ij}}$ based on $i$, $j$, $t_i$, and $msg_i$ to verify the sender and the integrity of $msg_i$. If the recomputed HMAC differs from the attached one, $BC_j$ requests retransmission. After receiving all the messages from its child nodes, $BC_j$ aggregates all m ciphertexts $E^{1H}_{u_i}$ into $E^{2H}_{u_j}$ and decrypts all $E^{1p}_{r_i}$ with $SK_{BC_j}$. Finally, it sums the plaintexts and encrypts the sum with the CC's public key $PK_{CC}$ into $E^{2p}_{r_j}$ to form the regional requirement. Therefore, the message packet transmitted from $BC_j$ to $NGW_k$ can be represented as $msg_j = \langle ID_j, Len, E^{2p}_{r_j}, E^{2H}_{u_j} \rangle$. $BC_j$ reserves the individual power requirement ciphertext <E
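The receiver-side check described in this step (verify freshness first, then recompute the HMAC and compare) can be sketched in Python as follows. The 30-second freshness window, the byte encoding of the authenticated fields, and the function names are assumptions made for this illustration; decryption of the outer ciphertext is omitted.

```python
import hashlib
import hmac

MAX_AGE = 30  # hypothetical freshness window, in seconds


def make_tag(key: bytes, i: int, j: int, t: int, msg: bytes) -> bytes:
    """HMAC over i || j || t || msg under the session key."""
    return hmac.new(key, f"{i}|{j}|{t}|".encode() + msg, hashlib.sha256).digest()


def accept(key, i, j, t, msg, tag, now, max_age=MAX_AGE) -> bool:
    """Return True only for a fresh packet whose HMAC matches."""
    if not 0 <= now - t <= max_age:
        return False  # stale or future-dated timestamp: possible replay
    expected = make_tag(key, i, j, t, msg)
    return hmac.compare_digest(expected, tag)  # constant-time comparison


key = b"k" * 32                       # placeholder for the session key
tag = make_tag(key, 1, 2, 1000, b"payload")
fresh_ok = accept(key, 1, 2, 1000, b"payload", tag, now=1010)
replayed = accept(key, 1, 2, 1000, b"payload", tag, now=1100)
tampered = accept(key, 1, 2, 1000, b"tampered", tag, now=1010)
```

A rejected packet corresponds to the case above in which the receiver asks for the transmission to be resent.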

Authentication and Communication in BC, NGW, and CC
The CC pre-sends the parameter s as the shared key for the BC, NGW, and CC during the initialization stage.
• $BC_j$: $BC_j$ computes the HMAC signature $HMAC_s(j \| k \| t_j \| msg_j)$ using the system master secret s as the key on $j$, $k$, $t_j$, and $msg_j$, and encrypts the message with the public key $PK_{NGW_k}$. It then transmits the message to the corresponding $NGW_k$.
• $NGW_k$: Upon receiving $ENC_{PK_{NGW_k}}(j \| k \| t_j \| HMAC_s(j \| k \| t_j \| msg_j))$, $NGW_k$ decrypts it with $SK_{NGW_k}$, verifies the freshness of $t_j$, and recomputes $HMAC_s(j \| k \| t_j \| msg_j)$. When the recomputed value equals the received one, it accepts $msg_j$. Here, $E^{2p}_{r_j}$ denotes the total regional power requirement for $BC_j$. Then, $NGW_k$ computes the HMAC signature $HMAC_s(k \| CC \| t_k \| msg_k)$ using the system master secret s and encrypts it with the public key $PK_{CC}$. Finally, it transmits the aggregate message to the CC (see Figure 4).

Power Distribution Generation
The CC decrypts the p groups of $\langle E^{2p}_{r_1}, \ldots, E^{2p}_{r_n} \rangle$ into p groups of $\langle S_1, S_2, \ldots, S_n \rangle$ (where $S_i$ is the requirement sum of the ith region). The CC then combines these with the aggregated consumption $E^{H}_{u}$ to generate p groups of $\langle R_1, R_2, \ldots, R_n \rangle$ (where $R_i$ is the ith regional power distribution ratio). Next, it encrypts the p groups of $\langle R_1, R_2, \ldots, R_n \rangle$ with $PK_{BC}$ and sends them to the p NGWs, respectively. Each NGW relays the ratios to its BCs. $BC_j$ decrypts the ratio ciphertext with $SK_{BC_j}$ and retrieves the previously stored individual requirement ciphertexts of its m users from its database. $BC_j$ decrypts these values, computes the m users' power distributions $\langle D_1, D_2, \ldots, D_m \rangle$ with $D_i = r_i \cdot R_j$ (where $r_i$ is the individual requirement plaintext), and encrypts them into $\langle E_1, E_2, \ldots, E_m \rangle$ (where $E_i$ is the ciphertext of $D_i$ under $PK_{HSM_i}$). It then transmits each ciphertext to the corresponding HSM. $HSM_i$ decrypts the power distribution message using its private key and obtains its power distribution for the next time slot.
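The arithmetic of the distribution step, $D_i = r_i \cdot R_j$, can be illustrated with a short Python sketch on plaintext values. How the CC derives the ratio $R_j$ is not specified in the text; the rule used in `regional_ratio` below (available power divided by the regional requirement sum, capped at 1) is purely a hypothetical stand-in, as are the function names and numbers.

```python
from fractions import Fraction


def regional_ratio(available: int, requirement_sum: int) -> Fraction:
    """Hypothetical rule for R_j: serve demand fully if possible, else scale down."""
    if requirement_sum == 0:
        return Fraction(0)
    return min(Fraction(1), Fraction(available, requirement_sum))


def distribute(requirements, ratio):
    """D_i = r_i * R_j for every user i in region j."""
    return [r * ratio for r in requirements]


requirements = [4, 6, 10]              # decrypted individual requirements r_i
S_j = sum(requirements)                # regional requirement sum sent upward
R_j = regional_ratio(available=10, requirement_sum=S_j)
D = distribute(requirements, R_j)      # per-user distributions
```

With these example numbers the region requests 20 units but only 10 are available, so every user receives half of the requested amount.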

Security Analysis
In this section, through a security analysis, we show that the proposed EPPRD achieves all the security goals defined in Section 3.2 and finally we prove EPPRD's security using the plaintext indistinguishability game.

Mutual Authentication and Data Integrity
In EPPRD, $HSM_i$ encrypts $g^a$ with $BC_j$'s public key, which ensures that only $BC_j$ can recover $g^a$ if the employed public key system is secure. By the same reasoning, $g^b$ is received only by the real $HSM_i$ if the public key encryption technique is secure. After $HSM_i$ receives $g^a$, $BC_j$ is authenticated by $HSM_i$ because only the real $BC_j$ can send $g^a$ to $HSM_i$. Thus, the scheme provides mutual authentication among the GWs and the CC.
The randomly generated shared key $K_{ij}$ ensures data authentication and integrity between the HSM and BC, because an external or internal attacker (of an HSM or NGW) has no authority to access other nodes' databases to transmit invalid data. In [21], if the pre-sent shared key s is compromised by an attacker, that attacker may be authenticated by the BC with s and launch a man-in-the-middle attack. In contrast, in our scheme, even if the shared key $K_{ij}$ is compromised, the attacker still cannot be authenticated by the BC or NGW, and the secrecy of previous keys remains intact because our authentication scheme provides perfect forward secrecy.

Protection against Eavesdropping Attack
The confidentiality of our scheme is based on the RSA and Paillier encryption algorithms. During authentication, $g^a$ and $g^b$ are encrypted with RSA encryption between $HSM_i$ and $BC_j$. In upward transmission, the power consumption message is aggregated using Paillier encryption, and the requirement message is concatenated and encrypted with RSA encryption under $PK_{CC}$.
An attacker located in a HAN can eavesdrop on the communication flow between the HSM and BC. However, even if the attacker eavesdrops on the ciphertext $E^{1p}_{r_i}$ from $HSM_i$ to the BC, he cannot recover the individual requirement of $HSM_i$ without the private key of $BC_j$, and the encrypted individual consumption $E^{1H}_{u_i}$ cannot be decrypted without the private key of the CC, because the semantic security of Paillier encryption resists chosen plaintext attacks.
Similarly, even if an attacker eavesdrops on the communication flow between $BC_j$ and the NGW, he cannot obtain the regional requirement and consumption sums, let alone the individual data, because the regional requirement and consumption sums ($E^{2p}_{r_j}$ and $E^{2H}_{u_j}$) can only be decrypted using the private key of the CC.

Protection against Internal Attack
There are two possible avenues for internal attacks in the semi-honest model in EPPRD. One is the communication flow between a HSM and a BC and the other is the communication flow between a BC, NGW, and the CC. In the first, messages are intentionally eavesdropped and stored by curious internal participants such as the NGW or another HSM. However, they cannot obtain the individual measurements because they lack the private keys of the BC and CC. The second communication flow may be intentionally eavesdropped on and stored by curious internal participants such as an HSM. However, using this approach, the attacker can only obtain the regional requirement sum and aggregated consumption. Even if he were to have access to the private key of CC, he would not be able to decipher the individual requirement and consumption values. Therefore, the proposed scheme provides not only confidentiality but also integrity.

Protection against Replay and Man-in-the-Middle Attack
Both the ciphertext ENC BC j (i j t i g a ) used during authentication and the ciphertext ENC K ij (i j t i HMAC K ij (i j msg i )) used in each transmission contain freshly generated time stamps. Therefore, the parties to the communication first verify the freshness of the time stamp and then verify that it is the same time stamp present in the encrypted message. In this way, EPPRD can resist replay attacks.
Consider the communication flow between HSM i and BC j . After receiving the g a sent by the BC j , the HSM i can authenticate the BC j . Even if an attacker were to impersonate the BC j or HSM i , he cannot be authenticated because of the RSA encryption and HMAC signature. Therefore, EPPRD can resist a man-in-the-middle attack.
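The freshness and integrity checks described above can be sketched as follows. This is an illustration under stated assumptions rather than the protocol's exact message format: the encryption layer is omitted, HMAC-SHA256 from the Python standard library stands in for the 16-byte MAC used in the paper, and the field layout, the `MAX_AGE` window, and the function names are hypothetical.

```python
import hashlib
import hmac

MAX_AGE = 10.0  # seconds; hypothetical freshness window (not specified in the paper)

def make_packet(key, sender, receiver, timestamp, msg):
    """Build a packet whose HMAC binds the identities, the time stamp, and the payload."""
    header = f"{sender}|{receiver}|{timestamp}".encode()
    tag = hmac.new(key, header + b"|" + msg, hashlib.sha256).digest()
    return {"sender": sender, "receiver": receiver,
            "timestamp": timestamp, "msg": msg, "tag": tag}

def accept(key, packet, now):
    """Accept a packet only if its time stamp is fresh and its HMAC verifies."""
    # Step 1: reject stale (replayed) or future-dated packets.
    age = now - packet["timestamp"]
    if age < 0 or age > MAX_AGE:
        return False
    # Step 2: recompute the HMAC over the received fields, including the time stamp,
    # so that a replayed packet with a rewritten time stamp fails verification.
    header = f"{packet['sender']}|{packet['receiver']}|{packet['timestamp']}".encode()
    expected = hmac.new(key, header + b"|" + packet["msg"], hashlib.sha256).digest()
    return hmac.compare_digest(expected, packet["tag"])

session_key = b"k" * 16                        # stands in for the session key K_ij
packet = make_packet(session_key, "HSM_1", "BC_1", 100.0, b"requirement")
print(accept(session_key, packet, now=103.0))  # fresh and authentic
print(accept(session_key, packet, now=200.0))  # replayed later: rejected as stale
forged = dict(packet, timestamp=199.0)         # attacker rewrites the time stamp
print(accept(session_key, forged, now=200.0))  # rejected: HMAC no longer matches
```

Because the time stamp is covered by the HMAC, an attacker without the session key cannot refresh an old packet, which is the property the replay argument relies on.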

Security Proof
Since the BC is highly trusted, the security notion of EPPRD focuses mainly on the semi-honest aggregator NGW and the HSMs. In what follows, we further analyze whether collusion between the NGW and compromised HSMs leaks other users' private data, especially the requirement and consumption plaintext. The security of EPPRD is based on the cryptosystem and security notion of Paillier.
Theorem 1. Assume a semi-honest adversary ADV corrupts the aggregator NGW and at most n − 1 nodes (n is the total number of HSMs in a local region). Then ADV cannot infer any private information of the other, uncompromised users; that is, EPPRD achieves security.
To demonstrate that EPPRD protects the plaintext of the requirement and consumption, we use the plaintext indistinguishability game described below.

• Setup: The challenger initializes the set of smart meters that participate in the aggregation process. The challenger generates their keys, including public and private keys, during the secret key generation phase in Section 5.1 and gives the public keys to the adversary.
• Queries: ADV can make "compromise" queries for the private keys or plaintext of users. It can compromise at most n − 1 meters. The challenger returns the private keys and plaintext of the compromised smart meters. ADV may also compromise the aggregator NGW and receive the aggregation from the challenger.
The advantage of ADV in attacking the scheme is defined as follows: ADV ADV denotes the indistinguishability advantage of ADV. In what follows, we prove the advantage is zero.
Proof: Let us assume the n − 1 nodes are all compromised except for HSM i ; if the extreme case satisfies the security then it also holds for other cases. We prove that ADV cannot infer the requirement and consumption plaintext of HSM i , even if ADV compromises the aggregator NGW and n − 1 HSMs.
ADV can compromise the NGW and n − 1 HSMs in the query phase, and the challenger gives access to the measurements of the compromised users or the aggregated measurement in the NGW as described in Section 5.2: E 2p r j = E 2p (r 1 + . . . In Equation (3), r i refers to the requirement plaintext of HSM i , and in Equation (4), u i refers to the consumption plaintext of HSM i . Assume that HSM i is the only smart meter not compromised by ADV, so the requirement and consumption of the other nodes do not contribute to the security. Equation (3) can then be rewritten as Equation (5), which is encrypted with PK CC ; because ADV does not know SK cc , it cannot learn r i . Similarly, Equation (4) can be written as Equation (6) according to the additive homomorphic property of Paillier; however, ADV still cannot learn u i even if E 2H (u i ) can be inferred, because u i remains protected by the encryption.
From Equations (5) and (6), we can conclude that the ADV cannot correctly infer the requirement and consumption plaintext even if it compromises the aggregator NGW and at most n − 1 HSMs. So the security of HSM i can be guaranteed.

Performance Analysis
An SG communication system has resource constraints and stringent security requirements that make it difficult to perform computation-intensive operations such as asymmetric (public-key) cryptographic operations. Furthermore, limited communication bandwidth may lead to delays or latency. Therefore, we analyze our scheme in terms of the communication volume, computational overhead, and delay time.
We fix the number of users at 1 million. The number of NGWs is 50, there are 100 BCs, and we vary the number of HSMs per BC from 1 to 200 with a step size of 20 to study the impact of the number of HSMs on communication, computational overhead, and memory consumption. To accommodate the need for highly frequent DS communications in SGs, we first adopt a HAN message transmission interval of 10 s, denoted by ∆, for validating the above performance analysis. Furthermore, we investigate the impact of different ∆ values on communication. Considering the same cryptography and similar authentication platforms, we compare the performance of the following two schemes with ours.

• The no-consumption aggregation scheme. In this scheme, the BC receives consumption messages E P u that are encrypted with public-key encryption rather than homomorphic encryption from all the HSMs and transmits them to the CC via the NGW. The CC decrypts the encrypted messages one by one with its private key rather than decrypting a single aggregated message as in our scheme. Consequently, the no-consumption aggregation scheme incurs excessive communication overhead, and its security is less rigorous because it lacks the protection of homomorphic encryption.

• The no-regional-requirement aggregation scheme in [21]. In this scheme, the homomorphically encrypted power requirements estimated for the future time period are transmitted upward together with commitments. In these messages, the commitment is the evidence of the user's power consumption plan in each billing period. Thus, it pursues the same requirement objective for individual users as our scheme. However, as described in the Related Works (Section 2), we propose some improvements from various perspectives.

Communication Volume
In the hierarchical architecture, we evaluate the communication volume performance from encryption and authentication overheads by considering the handshake step and the traffic payload through every GW during transmission.
We assume that the time slot size and the GW identities each occupy 128 bits (16 bytes), that RSA encryption uses a 1024-bit (128-byte) public/private key pair, that the size of the Hash MAC is set to 16 bytes based on MD5, and that a Paillier ciphertext is 4096 bits (512 bytes). Therefore, the encryption overhead for the consumption and requirement messages of HSM i is 512 and 128 bytes, respectively, and the encryption can be completed during the preprocessing phase.
Encrypting ENC pk BCj (i j t g a ) requires 176 bytes and ENC pk HSMi (i j t j g a g b ) requires 304 bytes. Transmitting (g b , i) requires 144 bytes, and ENC kij (i j t i msg i HMAC(i j t i msg i )) requires 1392 bytes. Therefore, the total size of transmissions during communication between one HSM i and BC j is 2016 bytes in our scheme. In contrast, ENC PK BC (E i H i C i HMAC S (E i H i C i )) between one HSM i and BC j in the scheme in [21] requires 1424 bytes when m = 1 (m is the time period in [21]). Thus, our communication overhead between HSM i and BC j is larger than that of the scheme in [21], as shown in Figure 3. Figure 5 plots a comparison of the communication required by our scheme and [21] between any BC and all HSMs. The regional overhead at a BC in our scheme slightly exceeds that of the scheme in [21] because of our more rigorous authentication process during the handshake period and the additional aggregated consumption report.
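The byte counts above can be checked with a short calculation. The sketch below assumes that g a and g b each occupy 128 bytes, which reproduces the reported 176, 304, and 144 bytes; the 1392-byte data packet and the 1424-byte packet of [21] are taken as reported rather than derived, and the linear scaling in `regional_bytes` is a simplifying assumption.

```python
# Field sizes in bytes, as stated in the text.
ID_BYTES = 16          # identity of an HSM or gateway
TIME_BYTES = 16        # time slot / time stamp
DH_BYTES = 128         # assumed size of g^a and g^b (matches the reported totals)

# Handshake and data messages between one HSM_i and BC_j.
auth_request = ID_BYTES + ID_BYTES + TIME_BYTES + DH_BYTES              # ENC(i, j, t, g^a)
auth_response = ID_BYTES + ID_BYTES + TIME_BYTES + DH_BYTES + DH_BYTES  # ENC(i, j, t_j, g^a, g^b)
key_confirm = DH_BYTES + ID_BYTES                                       # (g^b, i)
data_packet = 1392     # reported size of the encrypted data message, not derived here

per_hsm_ours = auth_request + auth_response + key_confirm + data_packet
per_hsm_ref21 = 1424   # reported size for the scheme in [21] with m = 1

def regional_bytes(per_hsm, hsms_per_bc):
    """Traffic between one BC and all of its HSMs, assuming it scales linearly."""
    return per_hsm * hsms_per_bc

print(auth_request, auth_response, key_confirm)   # 176 304 144
print(per_hsm_ours)                               # 2016
print(regional_bytes(per_hsm_ours, 200))          # 403200
print(regional_bytes(per_hsm_ref21, 200))         # 284800
```

Under these assumptions, the HSM-to-BC traffic of EPPRD is about 1.4 times that of [21] per meter, which is consistent with the statement that the regional overhead is larger at this level.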
However, as shown in Figure 6, this additional overhead has little effect on the overall communications compared with [21]. In fact, Figure 6 shows that our scheme outperforms [21] in terms of overall communication overhead. Figure 6a shows how the communication overhead of [21] changes as the number of HSMs increases. The total system communication overhead increases significantly and approaches 30 GB when the number of HSMs per BC nears 200 and the number of BCs nears 100. In contrast, as shown in Figure 6b, the growth for our proposed scheme is small, and the total communication never exceeds 11 MB. This result occurs because in [21] every message transmitted upward includes a requirement message, an individual commitment packet, and a hash packet, whereas our scheme stores these in the BC and transmits upward only one regional requirement and one encrypted consumption message. Moreover, our scheme uses symmetric encryption, while the scheme in [21] adopts asymmetric encryption among the GWs and the CC, which requires more bytes. The results show that the regional requirement storage/aggregation at the BC and the power consumption aggregation play an important role in reducing the total communication cost and memory consumption.
Figure 6. (a) Scheme in [21]; (b) our scheme.

Computation Overhead
In this evaluation, we ignore the computation overhead involved in the preparation phase because it can be performed offline. The following performance evaluation and analysis combine the authentication and privacy preservation processes.
We performed the experiments based on the FriendlyARM [29] library and the library from [21] using a computer with a processor running at 2.5 GHz, 4 MB of RAM, and 1 MB of flash memory. The results consider not only message authentication but also privacy preservation, although our requirements may be higher than those of conventional smart meters.
To emulate the 160 MHz processor of the BC, we scaled the experimental values, including the encryption and decryption times, by a factor of 16. We adopted the Paillier cryptosystem with a 512-bit modulus and at least 1 − 2 −64 certainty of prime generation for homomorphic encryption and decryption [28], and for RSA we used a 1024-bit key for asymmetric encryption and decryption [30]. For AES we used a 128-bit key for symmetric encryption and decryption, and the MAC is based on the RIPEMD-128 algorithm, which provides greater resilience against collision and pre-image attacks than does MD5 [31]. The time cost of all primitive operations is listed in Table 2. Based on the test results, we compare the computation cost.
• For HSM i : Encrypting (g a ) with PK BC j for transmission to BC j requires RSA encryption and Diffie-Hellman encryption successively, namely, 2 × T aenc , and decrypting the encrypted messages from BC j requires one T adec . Computing K ij and HMAC K ij requires one T hash and one T hmac . Therefore, one complete authentication process requires 2 × T aenc + T adec + T hash + T hmac . Encrypting (i j t i HMAC K ij (i j msg i )) requires one T aenc . In addition, encrypting the consumption and requirement message packets, denoted as E H and E p , respectively, requires T henc + T aenc , which can be done during the preprocessing stage and is therefore not counted. Therefore, the total time required is 3T aenc + T adec + T hash + T hmac .

• For BC j : The authentication process between HSM i and BC j costs BC j 2 × T adec + T aenc + T hash + T hmac . Decrypting one message requires one T adec , and decrypting the remaining E 1p r i of the m messages for summation requires (m − 1) × T adec . Encrypting the summation into E 2p r j requires one T senc , and aggregating all the E 1H u i messages into E 2H u j takes (m − 1) × T mul . Then, BC j takes one T hmac to generate the HMAC signature and one T senc to encrypt (j k t j HMAC s (j k t j msg j )) with the shared key s. Therefore, the total time is T aenc + T senc + (m + 2)T adec + T hash + 2T hmac + (m − 1)T mul .

• For NGW k : Re-computing the HMAC signature HMAC s (j k t j msg j ) requires one T hmac , and decrypting ENC s (j k t j HMAC s (j k t j msg j )) requires one T sdec . Upon receiving the E 2H u j messages, aggregating them into E 3H u k takes (n − 1) × T mul . Then, forming HMAC s (k CC msg k ) takes one T hmac , and encrypting it with the shared key s for the CC takes one T senc . Therefore, the total time is T senc + T sdec + 2T hmac + (n − 1)T mul .

• For the CC: Upon receiving ENC s (K CC t K HMAC S (k CC msg k )), re-computing the HMAC signature takes one T hmac and decrypting it takes one T sdec . Then, aggregating p groups of E 3H u k takes p × T mul , and it takes one T hdec to recover the total aggregation. Therefore, the total time is T sdec + T hmac + p × T mul + T hdec .
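The per-role cost expressions above can be evaluated for any set of primitive timings. The following is a minimal sketch: the formulas are the totals stated in the text, while the `PrimitiveTimes` container, the function names, and the numeric timings in the example are hypothetical placeholders and are not the measured values of Table 2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PrimitiveTimes:
    """Time cost of each primitive operation (any consistent unit)."""
    t_aenc: float   # asymmetric (RSA) encryption
    t_adec: float   # asymmetric (RSA) decryption
    t_senc: float   # symmetric (AES) encryption
    t_sdec: float   # symmetric (AES) decryption
    t_hdec: float   # homomorphic (Paillier) decryption
    t_hash: float   # hash computation
    t_hmac: float   # HMAC computation
    t_mul: float    # ciphertext multiplication (homomorphic aggregation)

def hsm_cost(t):
    """3*T_aenc + T_adec + T_hash + T_hmac (preprocessing excluded)."""
    return 3 * t.t_aenc + t.t_adec + t.t_hash + t.t_hmac

def bc_cost(t, m):
    """T_aenc + T_senc + (m+2)*T_adec + T_hash + 2*T_hmac + (m-1)*T_mul."""
    return (t.t_aenc + t.t_senc + (m + 2) * t.t_adec
            + t.t_hash + 2 * t.t_hmac + (m - 1) * t.t_mul)

def ngw_cost(t, n):
    """T_senc + T_sdec + 2*T_hmac + (n-1)*T_mul."""
    return t.t_senc + t.t_sdec + 2 * t.t_hmac + (n - 1) * t.t_mul

def cc_cost(t, p):
    """T_sdec + T_hmac + p*T_mul + T_hdec."""
    return t.t_sdec + t.t_hmac + p * t.t_mul + t.t_hdec

# Placeholder timings in milliseconds (hypothetical, for illustration only).
example = PrimitiveTimes(t_aenc=1.0, t_adec=20.0, t_senc=0.1, t_sdec=0.1,
                         t_hdec=30.0, t_hash=0.05, t_hmac=0.05, t_mul=0.01)
for m in (20, 100, 200):
    print(m, round(bc_cost(example, m), 2))
```

The (m + 2) × T adec term dominates the cost at the BC when asymmetric decryption is expensive, which is consistent with the observation that the regional delay of EPPRD grows faster with the number of HSMs than that of the compared schemes.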
According to the above time analysis, combined with the other two schemes, Figure 7 shows the communication time delay of the three schemes in the power requirement stage. Figure 7a shows the change in regional time delay for the three schemes as the number of HSMs increases. When the number of users is 20, the method employed in [21] costs 2.17 s, the no-consumption aggregation scheme costs 2.78 s, and our scheme costs 3.18 s. The growth rates of the three schemes are 12.1%, 13.4%, and 18.65%, respectively. As we can see, the computation overhead of our scheme is always slightly higher than that of the other two, and its growth rate is slightly larger, because bidirectional authentication costs more time during the handshake period than in the other two schemes. Moreover, the other two schemes do not require regional decryption at the BC for each individual requirement. The no-consumption aggregation scheme adopts the same authentication process as ours, but it does not require the decryption process; therefore, its communication time delay is less than ours. The time delay of the scheme in [21] is the smallest because it uses only one session key throughout the authentication process and requires neither a bidirectional session key generation process between an HSM and a BC nor the decryption process at the BC. Therefore, [21] has the least communication time delay at a BC of the three schemes.
Figure 7. The computation time delay for our scheme, the scheme in [21] and the no-consumption aggregation scheme in the power requirement stage. (a) Regional delay time at a BC; (b) total delay time.

However, as shown in Figure 7b, the regional delay time has little effect on the overall time delay of our scheme compared with the other two schemes. On the contrary, our scheme outperforms the other two schemes in terms of the overall time delay. Figure 7b shows a comparison of the total delay time. As shown, the total delay increases as the number of users increases; however, the growth differs markedly among the schemes. When the number of users is 20, [21] costs 6.2 s, the no-consumption aggregation scheme costs 8.4 s, and our scheme costs 5.1 s. However, when the number of users is 200, the delay time of the other two increases significantly: the delay time of the no-consumption aggregation scheme approaches 34.8 s and that of [21] is 25.1 s, while our scheme costs only 16.2 s. This indicates that the effect of the regional time delay is insignificant compared with the time delay during the overall communications between the BC, NGW, and CC. The time delay in the latter communication arises mainly from decryption. In the no-consumption aggregation scheme, the individual consumption data are not decrypted at the BC; instead, they are transmitted upward to the CC via the NGW. Consequently, the CC must decrypt all the individual consumption data, which costs much time. The scheme in [21] does not aggregate regional requirement data; therefore, these data must be decrypted by the CC, which is costlier than in our scheme. Assume that m, n, and p stand for the number of HSMs per BC, BCs per NGW, and NGWs per CC, respectively. Then, the decryption time complexity is O(2·m·n·p) in the no-consumption aggregation scheme, O(m·n·p) in [21], and O(m·n + n·p) in our scheme during communication between the BCs, NGWs, and the CC. Moreover, from Figure 7b, we can conclude that the regional decryption and aggregation approach involves less total time delay than the numbers of decryptions required in the other two schemes.
Therefore, we can conclude that regional requirement storage and homomorphic aggregation play important roles in reducing the total communication and computation overhead.
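To give a sense of scale for the decryption counts stated above, the three expressions can be evaluated at the largest configuration of the experiment (m = 200 HSMs per BC, n = 100 BCs per NGW, p = 50 NGWs, which together give the 1 million users). The sketch below simply evaluates the stated expressions; treating them as exact operation counts, and the function names, are illustrative assumptions.

```python
def no_consumption_aggregation(m, n, p):
    """Stated count for the no-consumption aggregation scheme: 2 * m * n * p."""
    return 2 * m * n * p

def scheme_ref21(m, n, p):
    """Stated count for the scheme in [21]: m * n * p."""
    return m * n * p

def epprd(m, n, p):
    """Stated count for EPPRD: m * n + n * p."""
    return m * n + n * p

m, n, p = 200, 100, 50
print(m * n * p)                             # 1000000 users in total
print(no_consumption_aggregation(m, n, p))   # 2000000
print(scheme_ref21(m, n, p))                 # 1000000
print(epprd(m, n, p))                        # 25000
```

At this configuration the stated count for EPPRD is 40 times smaller than that of [21] and 80 times smaller than that of the no-consumption aggregation scheme.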

Memory Occupancy Rate for Different Transmission Intervals ∆
The memory required by our scheme and the scheme in [21] with different numbers of users at varying transmission intervals is shown in Figure 8. When ∆ is 15 s and 10 s, our scheme's memory usage is relatively small. It increases slightly (but by no more than 0.16) as the number of users increases. When ∆ is 15 s, the scheme in [21] requires relatively little memory. When ∆ is 5 s, it is initially similar to our scheme but shows a clear rising trend: eventually, its memory requirement becomes overwhelming and uses up all the available memory. This result demonstrates our scheme's good performance, which arises because the BCs take on much of the processing queue, leaving a shorter queue to be processed at the CC.
Figure 8. Comparison of memory use between our scheme and [21] with different ∆ values (curves: our scheme and the scheme in [21] with ∆ = 15 s, 10 s, and 5 s).

Affected Householders with Different Numbers of Attackers
Finally, we show the strictness of our bidirectional authentication by performing attacks in an SG network. We assume that householders are affected if the message they transmit upward is not the same as the one received by the BC, NGW, and CC. We evaluate our authentication by varying the number of SG attackers. We assume the number of households can be up to 3 million, while the number of attackers reaches 5000 at most. We also introduce man-in-the-middle attacks into the SG network and study the number of affected householders with a randomly generated authentication key and a fixed authentication key at the BCs. We distribute 10 attackers into 10 different BCs. As shown in Figure 9, the number of affected householders continues to increase as the number of attackers increases in both the scheme from [21] and our scheme; however, the number of affected householders in our scheme is always lower than the number affected in [21], which does not use a randomly generated authentication key. This result demonstrates that using a randomly generated authentication key would strengthen the privacy preservation of the scheme [21] and help prevent man-in-the-middle attacks. It also shows that our scheme reduces the impact of man-in-the-middle attacks.
Figure 9. Affected householders in our scheme and the scheme in [21] with different numbers of attackers.

Conclusions
In this paper, we proposed an efficient privacy-preserving power requirement and distribution aggregation scheme for a Smart Grid (EPPRD). It is a novel individual power requirement and distribution scheme that preserves user privacy with a lightweight bidirectional authentication and encryption technique. The existing schemes mostly focus on the total preserving authentication technique or do not consider the overall communication and computation overhead. We place the BC as a regional aggregation station in the BAN to aggregate and transmit the regional power total and to store the individual requirements. In addition, the power consumption in the last time slot is the power distribution reference for the next time slot; its homomorphic encryption scheme, together with the authentication scheme, ensures rigorous privacy protection and data integrity. Experiments demonstrate that this design plays an important role in reducing computation and communication overhead. In future work, we will further explore low-cost cryptographic algorithms against various attacks and study lightweight cryptographic and authentication algorithms for distributed communication networks in which there is no trusted model.