Privacy-Preserving Data Aggregation Scheme Based on Federated Learning for IIoT

Hongbin, Fan; Zhi, Zhou

doi:10.3390/math11010214


by Fan Hongbin ¹,² and Zhou Zhi ³,*

¹ College of Computer and Artificial Intelligence, Xiangnan University, Chenzhou 423000, China
² Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
³ Institute of Intelligent Structural Systems, Dalian University of Technology, Dalian 116024, China
* Author to whom correspondence should be addressed.

Mathematics 2023, 11(1), 214; https://doi.org/10.3390/math11010214

Submission received: 1 October 2022 / Revised: 7 December 2022 / Accepted: 27 December 2022 / Published: 1 January 2023

(This article belongs to the Special Issue Advanced Deep Learning and Mathematical Modeling for Reliability, Security and Privacy Problems in Engineering)


Abstract

The extensive application of the Internet of Things in the industrial field has formed the industrial Internet of Things (IIoT). By analyzing and training data from the industrial Internet of Things, intelligent manufacturing can be realized. Due to privacy concerns, the industrial data of various institutions cannot be shared, which forms data islands. To address this challenge, we propose a privacy-preserving data aggregation federated learning (PPDAFL) scheme for the IIoT. In federated learning, data aggregation is adopted to protect model changes and provide data security for industrial devices. By utilizing a practical Byzantine fault tolerance (PBFT) algorithm, each round selects an IIoT device from each aggregation area as the data aggregation and initialization node, and uses data aggregation to protect the model changes of a single user while resisting reverse analysis attacks from the industrial management center. The Paillier cryptosystem and secret sharing are combined to realize data security, fault tolerance, and data sharing. A security analysis and performance evaluation show that the scheme reduces computation and communication overheads while guaranteeing data privacy, message authenticity, and integrity.

Keywords:

federated learning; IIoT; privacy-preserving; PBFT

MSC:

68T07

1. Introduction

The extensive application of the Internet of Things in the industrial field has formed the industrial Internet of Things (IIoT), which greatly improves industrial productivity and efficiency [1]. Smart devices in the IIoT generate vast amounts of perceptual data that contain sensitive information, such as how devices are performing. Because these data play a vital role in troubleshooting and early security warnings, it is crucial to protect data privacy in the IIoT [2]. For example, on March 7, 2019, an attack on the computer system of Venezuela’s Guri hydroelectric power plant caused power outages in 21 states, affecting 30 million people. Thus, the privacy protection and security issues of the IIoT have attracted the attention of many researchers [3,4].

The industrial data generated by the IIoT are used to develop artificial intelligence (AI) through machine learning techniques, which can be applied to various fields such as smart communities, smart healthcare, smart grids, smart cities, and autonomous driving [5]. Analyzing and training on data from the IIoT to realize intelligent manufacturing [6] promotes the transformation of traditional industries into smart industries. Machine learning algorithms support real-time learning and model training for the intelligent devices in the IIoT: smart devices constantly collect data, which keeps the learning models up to date. Therefore, machine learning is used for real-time processing and decision making after IIoT data collection. However, traditional machine learning algorithms rely on central servers or third parties and must upload the data to a central server to train the learning model, which can lead to serious privacy issues, such as the leakage of data containing sensitive user information. Although machine learning can improve prediction accuracy by training models, it is prone to privacy leaks because the models are trained directly on the raw data of the smart devices. Therefore, there is an urgent need for distributed machine learning in the IIoT that can train learning models while protecting data privacy.

Inspired by this issue, Google proposed federated learning (FL), a cryptographic distributed machine learning technique for data privacy protection and collaborative model training [7]. FL trains high-quality AI models by averaging local model updates aggregated from many IIoT devices. It eliminates the need for direct access to local data, reducing the risk of privacy breaches. Because FL draws on the data of many IIoT devices to train the AI model, the accuracy of the trained model is significantly improved, which is not possible when centralized traditional machine learning algorithms are trained with fewer data [8]. Motivated by these attractive features, several studies have combined FL and the IIoT [9,10,11]. However, these works focus only on specific IIoT applications and ignore IIoT privacy protection. In addition, private information may be disclosed while the federated learning model is being transmitted [12]. Therefore, it is challenging to design federated learning schemes that meet the privacy protection and information security requirements of the IIoT.

To address the above challenges, we propose a privacy-preserving data aggregation federated learning (PPDAFL) scheme for the IIoT, which combines secret sharing with federated learning to aggregate IIoT data.

The main contributions of this paper are as follows:

(1): A privacy-preserving, multidimensional data aggregation scheme based on federated learning is proposed. This scheme enables multidimensional data aggregation for the IIoT. For federated learning, data aggregation is adopted to protect local model changes and resist reverse analysis attacks.
(2): Through the PBFT algorithm, initialization and aggregation nodes are selected to achieve independence from any trusted entity. Secret sharing is used to achieve fault tolerance. Even if $K - L$ IIoT devices collude with each other or fail, the aggregation information from the IIoT can be obtained.
(3): Compared with existing schemes, PPDAFL has lower communication and computational overheads, faster execution, and higher efficiency. It is well suited for data aggregation in the IIoT.

The rest of this paper is organized as follows: The related work is discussed in Section 2. Section 3 describes the preliminary concepts, and Section 4 presents the system model. The PPDAFL scheme is described in Section 5, a security analysis is conducted in Section 6, and the model’s performance is evaluated in Section 7. Section 8 is a summary of the paper.

2. Related Work

In recent years, the study of federated learning and the industrial Internet of Things has attracted much attention.

Initially proposed by McMahan et al. [13], FL is a decentralized, private data training method. The robustness of FL was verified and FL was further explored. Subsequently, some researchers combined federated learning with homomorphic encryption to protect user privacy. Zhang et al. [14] proposed a non-IID FL optimization target alignment and method of unified feature learning to improve the FL model performance. The feature representation difference between clients is reduced by the opponent module, and the two consensus losses are applied from two perspectives to reduce the inconsistency of the optimization goal. In [15], a multi-secret sharing, secure aggregation scheme based on the fast Fourier transform was proposed, which has robustness, efficient communication, and a low computational cost. The authors in [16] presented an efficient secret-sharing, privacy-preserving data aggregation scheme based on FL. The scheme achieves fault tolerance and attack resistance, and the model of user training is aggregated without disclosing individual user information. These schemes [14,15,16] can resist most attacks, but the first two schemes fail to defend against reverse attacks and the latter cannot resist replay attacks. Zhou et al. [17] proposed a federated learning scheme combining Shamir secret sharing and homomorphic cryptography for multiparty entity matching. A logistic regression algorithm is used to realize the user’s exit in vertical federated learning and privacy protection model training. However, this scheme assumes that the key generation center is the trusted entity that distributes the secret key.

In recent years, some researchers have investigated federated learning and applied it to the IIoT. Liu et al. [18] designed a deep anomaly detection framework based on federated learning and a convolutional neural network/long short-term memory (LSTM) model. The scheme protects data privacy through FL, and the LSTM learning model detects device failures in the IIoT. However, the scheme assumes that the cloud aggregator is a trusted authority that initializes the system and distributes the secret keys. Zhang et al. [19] proposed a deep reinforcement learning-assisted FL algorithm for IIoT device selection to improve model accuracy, and verified the effectiveness of the proposed algorithm. The scheme sets the central server as a trusted authority, which selects a device in the IIoT as the initialization and aggregation node. In [20], the authors proposed a multiple data aggregation scheme combining differential privacy and homomorphic encryption, which was designed as a blockchain-based federated learning application model for the IIoT and achieved model sharing and data sharing while protecting the privacy of the data. However, the scheme does not achieve a secure exchange of industrial data. Qu et al. [21] designed a decentralized framework of big-data-driven cognitive computing by combining federated learning and blockchain for Industry 4.0 networks. An optimization model was developed using the improved Markov decision-making process to conduct poisoning attacks. Fu et al. [22] proposed a privacy-protected, verifiable federated learning (VFL) scheme for the industrial Internet of Things. Lagrange interpolation was used to set interpolation points to verify the correctness of the aggregation gradient. The scheme assumes that the public key generator is a trusted entity, which is used to initialize and generate keys and parameters. Qu et al. [23] proposed a federated blockchain-based learning scheme, which allows local learning updates to be sent to terminal devices via a blockchain-based global learning model; the updates are verified by miners. The PoW consensus mechanism of blockchain is adopted to realize decentralization and autonomous machine learning.

At present, although there have been some achievements in the research on the IIoT frameworks and schemes based on federated learning, these ideas are still in the initial stage and have many deficiencies.

3. Preliminary Concepts

3.1. Federated Learning

FL is a distributed machine learning model that trains local models on the user side and aggregates them with a central manager [24]. The large-scale distributed training of local data iteratively aggregates the data into a global model, which helps protect the privacy of the data [25]. FL keeps private data local and only shares the model parameters. FL allows devices to collaboratively train shared models without having to exchange their local private data [26].

FL model training is divided into three phases [27]. First, data are locally collected and trained. Second, the local model is uploaded and aggregated. Finally, the aggregation forms a global model, which is then distributed to local devices.
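The three phases can be illustrated with a minimal sketch of federated averaging. This is not the scheme's implementation: the one-parameter linear model, the learning rate, the sample data, and all function names are assumptions chosen for illustration, and the aggregation here is done in plaintext, without the privacy protection introduced later.

```python
def local_train(global_w, samples, lr=0.01, epochs=5):
    """Phase 1: fit y ~ w * x on the device's own samples, starting from the global weight."""
    w = global_w
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad
    return w


def aggregate(local_ws, sizes):
    """Phase 2: average the uploaded local models, weighted by local sample counts."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(local_ws, sizes)) / total


def federated_round(global_w, device_data):
    """Phase 3: the returned value is the new global model sent back to every device."""
    local_ws = [local_train(global_w, samples) for samples in device_data]
    return aggregate(local_ws, [len(samples) for samples in device_data])


# Two hypothetical devices whose private data roughly follow y = 2x.
devices = [
    [(1.0, 2.0), (2.0, 4.1)],
    [(3.0, 5.9), (4.0, 8.2), (5.0, 9.8)],
]
w = 0.0
for _ in range(50):
    w = federated_round(w, devices)
print(round(w, 2))
```

Only the model parameter `w` leaves a device in this sketch; the raw samples stay local, which is the property the rest of the scheme builds on.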

3.2. Secret Sharing

Shamir proposed a secret-sharing scheme based on the Lagrange interpolation formula [1], which requires a distributor to divide the secret into n pieces and secretly send one piece to each user. At least L (n ≥ L) secret holders must participate to reconstruct the secret; L − 1 participants cannot reconstruct it. In this way, the problem of sharing secrets among n users is solved [28].

The following polynomial is chosen to split a secret:

$$F(y) = α + p_{1} y + p_{2} y^{2} + \dots + p_{L - 1} y^{L - 1} \tag{1}$$

where $α$ is the secret and $L$ is the threshold value.

The following formula can be obtained via the Lagrange interpolation polynomial:

$$F(y) = \sum_{k = 1}^{L} \left( \prod_{j = 1, j \neq k}^{L} \frac{y_{j} - y}{y_{j} - y_{k}} \right) F(y_{k}) \tag{2}$$

$$γ_{y_{j}} = \prod_{k = 1, k \neq j}^{L} \frac{y_{k}}{y_{k} - y_{j}} \tag{3}$$

Then, $α$ is calculated as follows:

$$\begin{aligned} \sum_{j = 1}^{L} F(y_{j}) γ_{y_{j}} &= \sum_{k = 1}^{L} F(y_{k}) \cdot γ_{y_{k}} \\ &= \sum_{k = 1}^{L} F(y_{k}) \cdot \left( \prod_{j = 1, j \neq k}^{L} \frac{y_{j}}{y_{j} - y_{k}} \right) \\ &= \sum_{k = 1}^{L} \left( \prod_{j = 1, j \neq k}^{L} \frac{y_{j}}{y_{j} - y_{k}} \right) F(y_{k}) \end{aligned} \tag{4}$$

Let $y = 0$ in Equation (2):

$$\begin{aligned} F(0) &= \sum_{k = 1}^{L} \left( \prod_{j = 1, j \neq k}^{L} \frac{y_{j} - 0}{y_{j} - y_{k}} \right) F(y_{k}) \\ &= \sum_{k = 1}^{L} \left( \prod_{j = 1, j \neq k}^{L} \frac{y_{j}}{y_{j} - y_{k}} \right) F(y_{k}) \end{aligned} \tag{5}$$

Let $y = 0$ in Equation (1):

$$F(0) = α$$

Equation (4) equals Equation (5); thus, the following equation holds:

$$α = F(0) = \sum_{j = 1}^{L} F(y_{j}) γ_{y_{j}} \tag{6}$$
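Equations (1), (3), and (6) can be checked numerically with a short sketch. The prime modulus, the evaluation points $y_j = 1, \dots, n$, and the function names are assumptions made for this illustration; the text above does not fix them.

```python
import random

PRIME = 2**61 - 1  # assumed field modulus (a Mersenne prime); all arithmetic is mod PRIME


def make_shares(secret, threshold, num_shares, prime=PRIME, rng=random):
    """Return points (y_j, F(y_j)) of a random polynomial of degree threshold - 1 with F(0) = secret."""
    coeffs = [secret % prime] + [rng.randrange(prime) for _ in range(threshold - 1)]
    shares = []
    for y in range(1, num_shares + 1):
        value = 0
        for c in reversed(coeffs):  # Horner evaluation of Equation (1)
            value = (value * y + c) % prime
        shares.append((y, value))
    return shares


def lagrange_weight(j, ys, prime=PRIME):
    """gamma_{y_j} = prod_{k != j} y_k / (y_k - y_j), as in Equation (3)."""
    num, den = 1, 1
    for k, y_k in enumerate(ys):
        if k != j:
            num = (num * y_k) % prime
            den = (den * (y_k - ys[j])) % prime
    return (num * pow(den, -1, prime)) % prime  # modular inverse replaces division


def reconstruct(shares, prime=PRIME):
    """alpha = F(0) = sum_j F(y_j) * gamma_{y_j}, as in Equation (6)."""
    ys = [y for y, _ in shares]
    return sum(f * lagrange_weight(j, ys, prime) for j, (_, f) in enumerate(shares)) % prime


shares = make_shares(secret=123456789, threshold=3, num_shares=5)
print(reconstruct(shares[:3]))  # any 3 of the 5 shares are enough
print(reconstruct(shares[2:]))  # a different subset gives the same secret
```

The weights depend on which subset of shares is used, so they are recomputed for every reconstruction.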

3.3. Paillier Cryptosystem

The Paillier cryptosystem [29] is an additive homomorphic, asymmetric encryption algorithm with the following steps:

(1): Key generation. Randomly select two primes $p$ and $q$, and calculate $N = p q$ and $λ = \mathrm{lcm}(p - 1, q - 1)$. Define $L(x) = \frac{x - 1}{N}$. Choose a generator $g \in Z_{N^{2}}^{*}$, and calculate $μ = (L(g^{λ} \bmod N^{2}))^{-1} \bmod N$. The public and private keys are $(N, g)$ and $(λ, μ)$, respectively.
(2): Encryption. Given a message $X \in Z_{N}$, choose a random $r \in Z_{N}^{*}$, i.e., $\gcd(r, N) = 1$. The ciphertext is calculated as $C = g^{X} \cdot r^{N} \bmod N^{2}$.
(3): Decryption. Decrypt with the private key using $Dec(C) = L(C^{λ} \bmod N^{2}) \cdot μ \bmod N$.
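The three steps can be sketched in a few lines of Python. The tiny primes and the generator choice $g = N + 1$ are assumptions made to keep the example readable; they are not parameters from the paper, and such small keys offer no security.

```python
import math
import random


def l_func(x, n):
    """L(x) = (x - 1) / N, evaluated with exact integer division."""
    return (x - 1) // n


def keygen(p, q):
    """Return the public key (N, g) and the private key (lambda, mu)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                                          # assumed generator choice
    mu = pow(l_func(pow(g, lam, n * n), n), -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, x, rng=random):
    """C = g^X * r^N mod N^2 with a random r coprime to N."""
    n, g = pub
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(g, x, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c):
    """Dec(C) = L(C^lambda mod N^2) * mu mod N."""
    n, _ = pub
    lam, mu = priv
    return (l_func(pow(c, lam, n * n), n) * mu) % n


pub, priv = keygen(1009, 1013)
c1, c2 = encrypt(pub, 1234), encrypt(pub, 4321)
# Multiplying ciphertexts adds the plaintexts (additive homomorphism).
print(decrypt(pub, priv, (c1 * c2) % (pub[0] ** 2)))
```

The last line is the property the aggregation relies on: a party that multiplies ciphertexts obtains an encryption of the sum without seeing any individual plaintext.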

4. System Model

4.1. Communication Model

As shown in Figure 1, the system consists of an industrial control center (ICC) and many IIoT devices (IIDs).

(1): IID: IIDs collect IIoT data using P2P communication. An IID is selected from the aggregation area as the data aggregation and initialization node (DN) by using the PBFT algorithm. IIDs may stop reporting data because of attacks or failures. They are assumed to be honest but curious.
(2): ICC: The ICC reads the aggregated IIoT data. The ICC is assumed to be untrusted.

4.2. Adversary Model

In our model, the ICC is untrusted, and the IID is honest but curious. Data collection and utilization can bring economic value to IIDs and the ICC; therefore, the IIDs in each aggregation area need to comply with the agreement. We have the following assumptions:

(1): The ICC wants to infer the information of a single user from the collected data.
(2): The IID does not tamper with its industrial data. Still, it is curious about the private information of others, and tries to collude with other IIDs in the aggregated area to infer the information of others.

4.3. Design Goals

(1): Privacy protection. The scheme protects against internal and external attacks. No entity can access industrial data from a single IID.
(2): Data security. Industrial data can be securely aggregated. Even if the ciphertext of the IIoT data aggregation collected by IIDs is intercepted, the IIoT data of a single IID cannot be recovered.
(3): Fault tolerance. If an IID is maliciously attacked or fails to collect IIoT data, the utility of the system could be significantly compromised. The scheme should therefore ensure that, even if some IIDs do not work properly, the system can still aggregate IIoT data from the other IIDs.

5. The Proposed Scheme

In this section, we introduce the PPDAFL scheme. The notations are listed in Table 1.

5.1. Initialization

Suppose each aggregation area has $K$ IIDs, recorded as a set $S_{g} = \{IID_{1}, IID_{2}, \dots, IID_{K}\}$. Some IIDs have been attacked or are malfunctioning, and thus, they do not participate in IIoT data aggregation. Suppose there are at least $L$ IIDs online participating in aggregation. These IIDs constitute $S_{on} \subseteq S_{g}$. Each round of IIoT data aggregation uses the PBFT algorithm to select an IID from $S_{g}$ as the data aggregation and system initialization node (DN).

DN runs the Paillier cryptosystem to generate $(q, g_{0}, G_{1}, G_{2}, e)$, $e : G_{1} \times G_{1} \to G_{2}$, and calculates the key pairs $\{(N, g), (λ, μ)\}$, $g_{0} \in G_{1}$, $g_{1} \in Z_{N^{2}}^{*}$.

DN chooses a sequence $\vec{b} = (b_{1}, b_{2}, \dots, b_{M})$, where $b_{i} \in Z^{+}$ for $i = 1, 2, \dots, M$. Then, DN calculates $(g_{1}, g_{2}, \dots, g_{M})$, where $g_{i} = g^{b_{i}}$ for $i = 1, 2, \dots, M$.

DN chooses three hash functions $H_{0}$, $H_{1}$, and $H_{2}$, where $H_{0}, H_{1} : \{0, 1\}^{*} \to G_{1}$ and $H_{2} : \{0, 1\}^{*} \to Z_{N}^{*}$.

DN publishes the parameters $\{q, g, g_{0}, G_{1}, G_{2}, e, N, (g_{1}, g_{2}, \dots, g_{M}), H_{0}, H_{1}, H_{2}\}$.

5.2. Advertise Keys (Round 0)

IID:

(1): Request the ICC to update data.
(2): $IID_{ij}$ selects $s_{ij} \in Z_{q}^{*}$ to compute the public key $P_{ij} = s_{ij} \cdot g_{0}$; then, it sends $P_{ij}$ to the ICC.

ICC:

(1): The ICC collects at least $L$ messages from the $IID_{ij}$ of the i-th aggregation area.
(2): Set the number of IIDs to $K$ and the threshold to $L$ when dividing each aggregation area.
(3): Broadcast the list of received public keys to the IIDs in $S_{on}$.

5.3. Share Generation (Round 1)

IID:

(1): Receive global parameters from the ICC. Verify that $|S_{on}| \geq L$.
(2): $IID_{ij}$ generates its polynomial $F(y_{j}) = α + p_{1} y_{j} + p_{2} y_{j}^{2} + \dots + p_{L - 1} y_{j}^{L - 1}$ and $γ_{y_{j}} = \prod_{k = 1, k \neq j}^{L} \frac{y_{k}}{y_{k} - y_{j}}$, then sends $F(y_{j}) γ_{y_{j}} ∥ Ts$ to the ICC.

ICC:

(1): Forward the received shares to the IIDs in $S_{on}$.

5.4. Generate Ciphertext and Signature Verification (Round 2)

IID:

(1): $IID_{ij}$ generates $d_{ij}$ at time $Ts$ and computes $H_{2}(Ts)$, where $H_{2} : \{0, 1\}^{*} \to Z_{N}^{*}$; next, it selects $r \in Z_{N}^{*}$ to generate the ciphertext:

$$C_{ij} = g_{i}^{d_{ij}} \cdot r^{N} \cdot H_{2}(Ts)^{F(y_{j}) γ_{y_{j}}} \bmod N^{2} \tag{7}$$

(2): $IID_{ij}$ generates the signature $σ_{ij} = s_{ij} \cdot H_{1}(C_{ij} ∥ P_{ij} ∥ H_{0}(F(y_{j}) γ_{y_{j}} ∥ Ts))$.
(3): $IID_{ij}$ sends $C_{ij} ∥ P_{ij} ∥ H_{0}(F(y_{j}) γ_{y_{j}} ∥ Ts) ∥ σ_{ij}$ to the ICC and DN.

ICC:

(1): The ICC verifies the $K$ signatures after receiving $C_{ij} ∥ P_{ij} ∥ H_{0}(F(y_{j}) γ_{y_{j}} ∥ Ts) ∥ σ_{ij}$. If $e(σ_{ij}, g_{0}) = e(H_{1}(C_{ij} ∥ P_{ij} ∥ H_{0}(F(y_{j}) γ_{y_{j}} ∥ Ts)), P_{ij})$ holds, the signature is valid; otherwise, the verification fails. The ICC sends the signature verification results to DN, and DN accepts the ciphertext of $IID_{ij}$ only if its signature is valid.
(2): To make the verification more efficient, after receiving $C_{ij} ∥ P_{ij} ∥ H_{0}(F(y_{j}) γ_{y_{j}} ∥ Ts) ∥ σ_{ij}$, the ICC verifies the signatures in a batch:

$$\begin{aligned} e\left(\sum_{i = 1}^{M} \sum_{j = 1}^{K} σ_{ij}, g_{0}\right) &= e\left(\sum_{i = 1}^{M} \sum_{j = 1}^{K} s_{ij} \cdot H_{1}(C_{ij} ∥ P_{ij} ∥ H_{0}(F(y_{j}) γ_{y_{j}} ∥ Ts)), g_{0}\right) \\ &= \prod_{i = 1}^{M} \prod_{j = 1}^{K} e(s_{ij} \cdot H_{1}(C_{ij} ∥ P_{ij} ∥ H_{0}(F(y_{j}) γ_{y_{j}} ∥ Ts)), g_{0}) \\ &= \prod_{i = 1}^{M} \prod_{j = 1}^{K} e(H_{1}(C_{ij} ∥ P_{ij} ∥ H_{0}(F(y_{j}) γ_{y_{j}} ∥ Ts)), s_{ij} \cdot g_{0}) \\ &= \prod_{i = 1}^{M} \prod_{j = 1}^{K} e(H_{1}(C_{ij} ∥ P_{ij} ∥ H_{0}(F(y_{j}) γ_{y_{j}} ∥ Ts)), P_{ij}) \end{aligned} \tag{8}$$

(3): Forward the batch signature verification results to DN.
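The single-signature check in step (1) of the ICC follows from the bilinearity of the pairing. As a short worked derivation, write $h_{ij}$ (a shorthand introduced here, not part of the original notation) for the element of $G_{1}$ that $IID_{ij}$ signs, so that $σ_{ij} = s_{ij} \cdot h_{ij}$ and $P_{ij} = s_{ij} \cdot g_{0}$:

$$e(σ_{ij}, g_{0}) = e(s_{ij} \cdot h_{ij}, g_{0}) = e(h_{ij}, g_{0})^{s_{ij}} = e(h_{ij}, s_{ij} \cdot g_{0}) = e(h_{ij}, P_{ij})$$

The batch check applies the same identity to the sum of the signatures, because $e(a + b, g_{0}) = e(a, g_{0}) \cdot e(b, g_{0})$ for $a, b \in G_{1}$. One pairing on the left-hand side then covers all of the signatures at once.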

A high-level view of the interaction between the IIDs and the ICC in our scheme is shown in Figure 2.

5.5. Aggregate Ciphertext and Reconstruct Secret (Round 3)

IID:

(1): DN aggregates the ciphertexts of the IIDs:

$$\begin{aligned} C &= \prod_{i = 1}^{M} \prod_{j = 1}^{K} g_{i}^{d_{ij}} \cdot r^{N} \cdot H_{2}(Ts)^{F(y_{j}) γ_{y_{j}}} \bmod N^{2} \\ &= \prod_{i = 1}^{M} g_{i}^{\sum_{j = 1}^{K} d_{ij}} \cdot r^{N} \cdot H_{2}(Ts)^{\sum_{j = 1}^{K} F(y_{j}) γ_{y_{j}}} \bmod N^{2} \\ &= g_{1}^{\sum_{j = 1}^{K} d_{1j}} \cdot g_{2}^{\sum_{j = 1}^{K} d_{2j}} \cdot \dots \cdot g_{M}^{\sum_{j = 1}^{K} d_{Mj}} \cdot r^{N} \cdot H_{2}(Ts)^{0} \bmod N^{2} \\ &= g^{b_{1} \sum_{j = 1}^{K} d_{1j} + b_{2} \sum_{j = 1}^{K} d_{2j} + \dots + b_{M} \sum_{j = 1}^{K} d_{Mj}} \cdot r^{N} \bmod N^{2} \end{aligned} \tag{9}$$

After the first line, $r^{N}$ denotes the product of the random factors of the individual ciphertexts, which is again an $N$-th power.
(2): DN sends $C$ to the ICC.

ICC:

(1): After the signatures are verified, the ICC chooses $L$ shares of $F(y_{j}) γ_{y_{j}}$ from the received $K$ shares of $F(y_{j}) γ_{y_{j}}$ to reconstruct the secret:

$β = F(0) = \sum_{j = 1}^{L} F(y_{j}) γ_{y_{j}}$. Let $β = 0$; thus, $\sum_{j = 1}^{L} F(y_{j}) γ_{y_{j}} = 0$.
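The cancellation of the masking term in the aggregate can be demonstrated with a toy sketch. Several simplifications are assumptions of this illustration rather than parts of the scheme: the masks are plain integers that sum to zero (standing in for the shares $F(y_{j}) γ_{y_{j}}$ with $β = 0$), `h2` is a hypothetical SHA-256-based stand-in for $H_{2}$, the key uses tiny primes with $g = N + 1$, and only one aggregation area is shown.

```python
import hashlib
import math
import random

# Toy Paillier key (assumed parameters; far too small for real use).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
G = N + 1  # assumed generator choice
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow((pow(G, LAM, N2) - 1) // N, -1, N)


def h2(ts):
    """Hypothetical stand-in for H_2: hash a timestamp to a value coprime to N."""
    counter = 0
    while True:
        digest = hashlib.sha256(f"{ts}|{counter}".encode()).digest()
        h = int.from_bytes(digest, "big") % N
        if h > 1 and math.gcd(h, N) == 1:
            return h
        counter += 1


def zero_sum_masks(count, bound, rng):
    """Integer masks summing to zero; they play the role of the shares with beta = 0."""
    masks = [rng.randrange(1, bound) for _ in range(count - 1)]
    masks.append(-sum(masks))
    return masks


def encrypt_masked(g_i, d, mask, ts, rng):
    """C_ij = g_i^d * r^N * H_2(Ts)^mask mod N^2, following Equation (7)."""
    r = rng.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = rng.randrange(1, N)
    # A negative mask is handled by pow() through the modular inverse of h2(ts).
    return (pow(g_i, d, N2) * pow(r, N, N2) * pow(h2(ts), mask, N2)) % N2


def decrypt(c):
    """Dec(C) = L(C^lambda mod N^2) * mu mod N."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


rng = random.Random(7)
readings = [12, 30, 7, 21]  # d_ij of K = 4 devices in one area
masks = zero_sum_masks(len(readings), 10**6, rng)
ciphertexts = [encrypt_masked(G, d, m, "round-1", rng) for d, m in zip(readings, masks)]

aggregate = 1
for c in ciphertexts:
    aggregate = (aggregate * c) % N2  # product of ciphertexts, as in Equation (9)
print(decrypt(aggregate), sum(readings))
```

In the product the exponents of `h2("round-1")` add up to zero, so the mask disappears and the aggregate decrypts to the sum of the readings. A single masked ciphertext still carries its mask, so decrypting it alone does not in general return that device's reading.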

5.6. Ciphertext Decryption (Round 4)

First, the ICC uses $(λ, μ)$ to decrypt the aggregated data from the IIoT:

$$\begin{aligned} X = Dec(C) &= \frac{L(g^{(b_{1} \sum_{j = 1}^{K} d_{1j} + b_{2} \sum_{j = 1}^{K} d_{2j} + \dots + b_{M} \sum_{j = 1}^{K} d_{Mj}) λ} \bmod N^{2})}{L(g^{λ} \bmod N^{2})} \bmod N \\ &= b_{1} \sum_{j = 1}^{K} d_{1j} + b_{2} \sum_{j = 1}^{K} d_{2j} + \dots + b_{M} \sum_{j = 1}^{K} d_{Mj} \end{aligned} \tag{10}$$

Finally, by using Algorithm 1, DN can recover the aggregated IIoT data of each aggregation area $(X_{1}, X_{2}, \dots, X_{M})$, where $X_{i} = \sum_{j = 1}^{K} d_{ij}$ represents the data aggregation value for the i-th aggregation area.

Algorithm 1. Recover aggregated reports for each aggregation area.

1. procedure
Input: $\vec{b} = (b_{1}, b_{2}, \dots, b_{M})$ and $X$
Output: $(X_{1}, X_{2}, \dots, X_{M})$
2. Set $D_{M} = X$
3. for $w = M$ to $2$ do
4. $D_{w - 1} = D_{w} \bmod b_{w}$
5. $X_{w} = \frac{D_{w} - D_{w - 1}}{b_{w}} = \sum_{j = 1}^{K} d_{wj}$
6. end for
7. $X_{1} = D_{1} = \sum_{j = 1}^{K} d_{1j}$
8. return $(X_{1}, X_{2}, \dots, X_{M})$
9. end procedure
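Algorithm 1 can be sketched as follows. The text only requires $b_{i} \in Z^{+}$; for the modular peeling to return the right values, this sketch additionally assumes $b_{1} = 1$ and that each $b_{w}$ exceeds the largest possible value of $\sum_{i < w} b_{i} X_{i}$. The construction in `make_weights` and both function names are illustrative assumptions.

```python
def make_weights(num_areas, max_area_sum):
    """Assumed construction: b_w = (max_area_sum + 1)^(w - 1), so that b_1 = 1 and
    each b_w is larger than any possible sum of the lower-order terms."""
    return [(max_area_sum + 1) ** w for w in range(num_areas)]


def recover(weights, x):
    """Algorithm 1: peel off X_M, ..., X_2 by reduction modulo b_w, then X_1 = D_1."""
    sums = [0] * len(weights)
    d = x  # D_M = X
    for w in range(len(weights) - 1, 0, -1):
        d_prev = d % weights[w]              # D_{w-1} = D_w mod b_w
        sums[w] = (d - d_prev) // weights[w]  # X_w = (D_w - D_{w-1}) / b_w
        d = d_prev
    sums[0] = d  # X_1 = D_1, which relies on b_1 = 1
    return sums


area_sums = [70, 15, 42]                # X_1, X_2, X_3 for M = 3 areas
weights = make_weights(3, 100)          # every area sum is assumed to be at most 100
packed = sum(b * x for b, x in zip(weights, area_sums))  # the decrypted value X
print(recover(weights, packed))
```

The packed value must also stay below $N$, since Paillier decryption returns the plaintext modulo $N$.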

6. Security Analysis

The security of our scheme is compared with that of the schemes of [14,15,16,17], as shown in Table 2.

6.1. Privacy Preservation

Attackers are divided into internal attackers and external attackers. An internal attacker is an IID or the ICC of the aggregation area, and an external attacker is an entity outside the aggregation area.

Resist internal attacks. $IID_{ik}$ ($k \neq j$) cannot extract $d_{ij}$ from $C_{ij}$, because it does not know $H_{2}(Ts)^{F(y_{j}) γ_{y_{j}}}$. Even if malicious IIoT users obtained $H_{2}(Ts)^{F(y_{j}) γ_{y_{j}}}$ of $IID_{ij}$, they still could not obtain the plaintext of $IID_{ij}$ because $λ$ is not known to them. The ICC does not know $H_{2}(Ts)^{F(y_{j}) γ_{y_{j}}}$ of $IID_{ij}$ and cannot derive the data of $IID_{ij}$; it can obtain only the aggregated data of the IIDs. Therefore, the PPDAFL scheme can defend against internal attacks.

Defend against external attacks. When an external attacker invades $IID_{ij}$, the ciphertext $C_{ij} = g_{i}^{d_{ij}} \cdot r^{N} \cdot H_{2}(Ts)^{F(y_{j}) γ_{y_{j}}} \bmod N^{2}$ of $IID_{ij}$ can be obtained. Since the attacker does not know the shared keys of $L - 1$ IIDs and $λ$, the attacker cannot obtain the plaintext of $IID_{ij}$. Therefore, the PPDAFL scheme can resist external attacks.

6.2. Data Integrity

The PPDAFL scheme adopts the BLS signature to sign the aggregate data and private data of the IIDs.

For the message $C_{ij} ∥ P_{ij} ∥ H_{0}(F(y_{j}) γ_{y_{j}} ∥ Ts) ∥ σ_{ij}$ sent by $IID_{ij}$, the ICC first checks $P_{ij}$ and $H_{0}(F(y_{j}) γ_{y_{j}} ∥ Ts)$, and then verifies the integrity of the message by checking whether

$$e\left(\sum_{i = 1}^{M} \sum_{j = 1}^{K} σ_{ij}, g_{0}\right) = \prod_{i = 1}^{M} \prod_{j = 1}^{K} e(H_{1}(C_{ij} ∥ P_{ij} ∥ H_{0}(F(y_{j}) γ_{y_{j}} ∥ Ts)), s_{ij} \cdot g_{0})$$

holds. Every element of the message sent by $IID_{ij}$ enters the batch verification, so any modification makes the equality fail. Therefore, the scheme ensures the integrity of the data of $IID_{ij}$.

6.3. Fault Tolerance

If $IID_{ij'}$ fails or is under attack, it cannot send IIoT data to DN. Because DN only knows which group an IID belongs to in the aggregation region, the ICC uses $H_{0}(F(y_{j}) γ_{y_{j}} ∥ Ts)$ to find the failed IID while masking its identity. The ICC compares the hash table constituted by $H_{0}(F(y_{j}) γ_{y_{j}} ∥ Ts)$ with those of the other complete groups to find $IID_{ij'}$. Then, it selects an IID from $H_{0}(F(y_{j}) γ_{y_{j}} ∥ Ts)$ to replace $IID_{ij'}$. Therefore, DN does not consider the data of $IID_{ij'}$. The ICC chooses $L$ shares of $F(y_{j}) γ_{y_{j}}$ from the received $K - 1$ shares of $F(y_{j}) γ_{y_{j}}$ to reconstruct the secret.

Let us assume $IID_{ij'}$ ($1 \leq j' \leq K$) has failed to transmit $d_{ij'}$ to DN; then, DN aggregates the ciphertext:

$$\begin{aligned} C &= \prod_{i = 1}^{M} \prod_{j = 1, j \neq j'}^{K} g_{i}^{d_{ij}} \cdot r^{N} \cdot H_{2}(Ts)^{F(y_{j}) γ_{y_{j}}} \bmod N^{2} \\ &= \prod_{i = 1}^{M} g_{i}^{\sum_{j = 1, j \neq j'}^{K} d_{ij}} \cdot r^{N} \cdot H_{2}(Ts)^{\sum_{j = 1, j \neq j'}^{K} F(y_{j}) γ_{y_{j}}} \bmod N^{2} \\ &= g^{b_{1} \sum_{j = 1}^{K} d_{1j} + \dots + b_{i} \sum_{j = 1, j \neq j'}^{K} d_{ij} + \dots + b_{M} \sum_{j = 1}^{K} d_{Mj}} \cdot r^{N} \bmod N^{2} \end{aligned} \tag{11}$$

The ICC uses the private key $(λ, μ)$ to decrypt $C$:

$$Dec(C) = b_{1} \sum_{j = 1}^{K} d_{1j} + \dots + b_{i} \sum_{j = 1, j \neq j'}^{K} d_{ij} + \dots + b_{M} \sum_{j = 1}^{K} d_{Mj} \tag{12}$$

The ICC obtains the aggregated data $X$ of all IIDs except $IID_{ij'}$. As a result, the ICC can still obtain the correct aggregation results for the IIDs that reported. Therefore, the PPDAFL scheme implements fault tolerance.

7. Performance Evaluation

7.1. Computation Complexity

Table 3 compares the execution time of the schemes from [16,17] with that of the PPDAFL scheme.

According to Table 3, the PPDAFL scheme takes less time to compute than the schemes of [16,17]. In the PPDAFL scheme, due to the use of batch verification, the signature verification cost is only $1/K$ of that of the scheme from [16] or [17]. Since the schemes from [16,17] generate N pairs of keys, the key generation cost of PPDAFL is half of that of the scheme from [16] or [17]. The number of key agreements for PPDAFL is 0, whereas that of the scheme from [16] or [17] is $2KM$. The number of secret-sharing operations for the PPDAFL scheme is $KM$, and that of the scheme from [16] is $3KM$.

As shown in Figure 3, the calculation overhead of the scheme from [17] is the highest, whereas that of the PPDAFL scheme is the lowest. As the number of smart devices in each aggregation area and the number of aggregation areas increase, the advantages of the PPDAFL scheme become more obvious.

7.2. Communication Overhead

Since the data received by the ICC are the same as those sent by the IIDs, and the data obtained by the IIDs are the same as those sent by the ICC, only the transmission cost of the IIDs and the ICC is considered. Without loss of generality, the comparisons in Table 4 and Figure 4, Figure 5 and Figure 6 consider only one of the $M$ aggregation regions.

Table 4 lists the communication costs for the PPDAFL scheme, the scheme of [16], and the scheme of [17]. The cost of each round of communication for the PPDAFL scheme is compared with that of the EPPDA scheme in Table 4.

In Table 4, the communication cost of PPDAFL is less than that of the schemes from [16,17], especially in Round 1. Compared with the schemes from [16,17], the PPDAFL scheme has a lower latency and higher practicability.

$K$ represents the number of IIDs in each aggregation area, and $R$ indicates the data length. As shown in Figure 4, the communication overhead of the scheme from [17] is the highest, whereas that of the PPDAFL scheme is the lowest.

Figure 5 shows the communication comparison between PPDAFL and the schemes from [16,17] at $R = 256$ bits. Figure 6 shows the comparison of the communication costs between PPDAFL and the schemes of [16,17] when $K = 200$. As shown in Figure 5 and Figure 6, PPDAFL has a higher communication efficiency than the schemes of [16,17]. As the number of IIDs or the length of the data increases, PPDAFL saves more communication overhead.

8. Conclusions

A multidimensional data aggregation scheme for privacy protection in the IIoT based on federated learning is proposed. In each round of data aggregation, the PBFT consensus algorithm selects an IID as the aggregation and initialization node. In federated learning, data aggregation is used to protect the model changes of a single user and resist reverse analysis attacks from the industrial management center. Our design does not require any trusted entity. The security analysis and performance simulation indicate that the scheme meets the objectives of privacy protection, data integrity, and fault tolerance with low computation and communication overheads. In the future, we will focus on the combination of FL, blockchain, and the Stackelberg game to explore balancing privacy and efficiency.

Author Contributions

Conceptualization, F.H.; methodology, Z.Z.; formal analysis, F.H.; writing—original draft, F.H.; writing—review and editing, F.H. and Z.Z.; investigation, F.H.; validation, Z.Z.; funding acquisition, F.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guangxi Key Laboratory of Trusted Software, grant number KX202048; the Scientific Research Project of the Education Department of Hunan Province, “Research on privacy protection data aggregation and power theft detection for 5G smart grid”, grant number 22C0551; the 2021 Chenzhou Science and Technology Bureau Science and Technology Innovation Platform Special Project (Chen Caijiaozhi (2022), No. 172), Research on Intelligent Energy Efficiency Operation and Maintenance of Electric power System; and the General Scientific Research Project of Hunan Provincial Department of Education, “Application research on automatic identification and classification of antler mushrooms based on deep learning”, grant number 20C1193.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors wish to thank the reviewers for their valuable comments and suggestions concerning this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Sisinni, E.; Saifullah, A.; Han, S.; Jennehag, U.; Gidlund, M. Industrial Internet of Things: Challenges, Opportunities, and Directions. IEEE Trans. Ind. Inform. 2018, 14, 4724–4734.
2. Zhao, Q.; Qi, X.; Hua, M.; Liu, J.; Tian, H. Review of the Recent Blackouts and the Enlightenment; IET: London, UK, 2020; Volume 2020, pp. 312–314.
3. Tange, K.; De Donno, M.; Fafoutis, X.; Dragoni, N. A systematic survey of industrial Internet of things security: Requirements and fog computing opportunities. IEEE Commun. Surv. Tutor. 2020, 22, 2489–2520.
4. Choo, K.-K.R.; Gritzalis, S.; Park, J.H. Cryptographic solutions for industrial Internet-of-things: Research challenges and opportunities. IEEE Trans. Ind. Inform. 2018, 14, 3567–3569.
5. Xu, L.D.; He, W.; Li, S. Internet of things in industries: A survey. IEEE Trans. Ind. Inform. 2014, 10, 2233–2243.
6. Zhang, X.; Fang, F.; Wang, J. Probabilistic solar irradiation forecasting based on variational Bayesian inference with secure federated learning. IEEE Trans. Ind. Inform. 2020, 17, 7849–7859.
7. Bahrami, S.; Chen, Y.C.; Wong, V.W.S. Deep reinforcement learning for demand response in distribution networks. IEEE Trans. Smart Grid 2021, 12, 1496–1506.
8. Zhao, B.; Liu, X.; Chen, W.N.; Liang, W.; Zhang, X.; Deng, R.H. PRICE: Privacy and reliability-aware real-time incentive system for crowdsensing. IEEE Internet Things J. 2021, 8, 17584–17595.
9. Ur Rehman, M.H.; Dirir, A.M.; Salah, K.; Damiani, E.; Svetinovic, D. TrustFed: A framework for fair and trustworthy cross-device federated learning in IIoT. IEEE Trans. Ind. Inform. 2021, 17, 8485–8494.
10. Yin, X.; Zhu, Y.; Hu, J. A comprehensive survey of privacy-preserving federated learning: A taxonomy, review, and future directions. ACM Comput. Surv. (CSUR) 2021, 54, 1–36.
11. Sun, W.; Lei, S.; Wang, L.; Liu, Z.; Zhang, Y. Adaptive federated learning and digital twin for industrial internet of things. IEEE Trans. Ind. Inform. 2020, 17, 5605–5614.
12. Hao, M.; Li, H.; Luo, X.; Xu, G.; Yang, H.; Liu, S. Efficient and privacy-enhanced federated learning for industrial artificial intelligence. IEEE Trans. Ind. Inform. 2019, 16, 6532–6542.
13. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Agüera y Arcas, B. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282.
14. Zhang, L.; Luo, Y.; Bai, Y.; Du, B.; Duan, L.Y. Federated learning for non-IID data via unified feature learning and optimization objective alignment. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 4420–4428.
15. Kadhe, S.; Rajaraman, N.; Koyluoglu, O.O.; Ramchandran, K. FastSecAgg: Scalable secure aggregation for privacy-preserving federated learning. arXiv 2020, arXiv:2009.11248.
16. Song, J.; Wang, W.; Gadekallu, T.R.; Cao, J.; Liu, Y. EPPDA: An efficient privacy-preserving data aggregation federated learning scheme. IEEE Trans. Netw. Sci. Eng. 2022, in press.
17. Zhou, Z.; Tian, Y.; Peng, C. Privacy-preserving federated learning framework with general aggregation and multiparty entity matching. Wirel. Commun. Mob. Comput. 2021, 2021, 6692061.
18. Liu, Y.; Garg, S.; Nie, J.; Zhang, Y.; Xiong, Z.; Kang, J.; Hossain, M.S. Deep anomaly detection for time-series data in industrial IoT: A communication-efficient on-device federated learning approach. IEEE Internet Things J. 2020, 8, 6348–6358.
19. Zhang, P.; Wang, C.; Jiang, C.; Han, Z. Deep reinforcement learning assisted federated learning algorithm for data management of IIoT. IEEE Trans. Ind. Inform. 2021, 17, 8475–8484.
20. Jia, B.; Zhang, X.; Liu, J.; Zhang, Y.; Huang, K.; Liang, Y. Blockchain-enabled federated learning data protection aggregation scheme with differential privacy and homomorphic encryption in IIoT. IEEE Trans. Ind. Inform. 2021, 18, 4049–4058.
21. Qu, Y.; Pokhrel, S.R.; Garg, S.; Gao, L.; Xiang, Y. A blockchained federated learning framework for cognitive computing in Industry 4.0 networks. IEEE Trans. Ind. Inform. 2020, 17, 2964–2973.
22. Fu, A.; Zhang, X.; Xiong, N.; Gao, Y.; Wang, H.; Zhang, J. VFL: A verifiable federated learning with privacy-preserving for big data in industrial IoT. IEEE Trans. Ind. Inform. 2022, 18, 3316–3326.
23. Qu, Y.; Gao, L.; Luan, T.H.; Xiang, Y.; Yu, S.; Li, B.; Zheng, G. Decentralized privacy using blockchain-enabled federated learning in fog computing. IEEE Internet Things J. 2020, 7, 5171–5183.
24. Islam, T.U.; Ghasemi, R.; Mohammed, N. Privacy-Preserving Federated Learning Model for Healthcare Data. In Proceedings of the 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 26–29 January 2022; IEEE: New York, NY, USA, 2022; pp. 281–287.
25. Zhao, L.; Jiang, J.; Feng, B.; Wang, Q.; Shen, C.; Li, Q. SEAR: Secure and efficient aggregation for Byzantine-robust federated learning. IEEE Trans. Dependable Secur. Comput. 2021, 19, 3329–3342.
26. Pillutla, K.; Kakade, S.M.; Harchaoui, Z. Robust aggregation for federated learning. IEEE Trans. Signal Process. 2022, 70, 1142–1154.
27. Zhao, B.; Fan, K.; Yang, K.; Wang, Z.; Li, H.; Yang, Y. Anonymous and privacy-preserving federated learning with industrial big data. IEEE Trans. Ind. Inform. 2021, 17, 6314–6323.
28. Yu, K.; Tan, L.; Yang, C.; Choo, K.K.R.; Bashir, A.K.; Rodrigues, J.J.; Sato, T. A blockchain-based Shamir’s threshold cryptography scheme for data protection in industrial internet of things settings. IEEE Internet Things J. 2022, 9, 8154–8167.
29. Boneh, D.; Gentry, C.; Lynn, B.; Shacham, H. Aggregate and verifiably encrypted signatures from bilinear maps. In Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques, Warsaw, Poland, 4–8 May 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 416–432.

Figure 1. System model.

Figure 2. High-level view between IID and ICC in our scheme.

Figure 3. Comparison of computational cost [16,17].

Figure 4. Comparison of communication overhead [16,17].

Figure 5. Comparison of communication overhead when $R$ = 256 bits [16,17].

Figure 6. Comparison of communication overhead when $K$ = 200 [16,17].

Table 1. Notations.

Symbol	Quantity
$g_{0}, g_{1}$	Generators of $G$
$ICC$	Industrial control center
$IID_{ij}$	The $j$-th IIoT device in the $i$-th aggregation area
$DN$	Data aggregation and system initialization node
$d_{ij}$	Industrial data of $IID_{ij}$
$X$	Aggregated industrial data
$X_{i}$	The $i$-th aggregated industrial data
$H_{0}, H_{1}$	Hash functions: $H_{0}, H_{1} : \{0, 1\}^{*} \to G_{1}$
$H_{2}$	Hash function: $H_{2} : \{0, 1\}^{*} \to Z_{N}^{*}$
$K$	Number of IIoT devices in each aggregation area
$M$	Number of aggregation areas
$R$	Data length (in bits)
‖	Concatenation operation

Table 2. Comparison of features between PPDAFL and other related schemes.

Features	[14]	[15]	[16]	[17]	PPDAFL
Privacy preservation	√	√	√	√	√
Fault tolerance	×	√	√	√	√
Dropout	√	√	√	√	√
Round efficiency	√	×	√	√	√
No expensive operations	×	√	√	√	√
No trusted entity	×	×	√	×	√
Resist reverse attacks	×	×	√	√	√
Resilience against replay attacks	×	×	×	×	√
Defense against collusion attacks	×	×	√	√	√

Table 3. Computational expense comparison.

Algorithm	[16]	[17]	PPDAFL
Key generation	$2 K M$	$2 K M$	$K M$
Secret sharing	$K M$	$3 K M$	$K M$
Encryption	$K M$	$K M$	$K M$
Signature generation	$K M$	$K M$	$K M$
Signature verification	$K M$	$K M$	$M$
Key agreement	$2 K M$	$0$	$0$
Secret reconstruction	$M$	$K M$	$M$
Decryption	$M$	$M$	$M$
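
The symbolic counts in Table 3 can be evaluated for concrete system sizes. The following is a minimal sketch, not part of the original evaluation: the `(a, b)` encoding of each cell as $aKM + bM$, the function name `operation_counts`, and the sample sizes are all assumptions made for illustration. The totals are unweighted operation counts; they say nothing about running time, because Table 3 does not give the cost of each operation.

```python
SCHEMES = ("[16]", "[17]", "PPDAFL")

# Each cell of Table 3 is encoded as (a, b), meaning a*K*M + b*M operations.
# Columns follow SCHEMES: [16], [17], PPDAFL.
TABLE3 = {
    "Key generation":         ((2, 0), (2, 0), (1, 0)),
    "Secret sharing":         ((1, 0), (3, 0), (1, 0)),
    "Encryption":             ((1, 0), (1, 0), (1, 0)),
    "Signature generation":   ((1, 0), (1, 0), (1, 0)),
    "Signature verification": ((1, 0), (1, 0), (0, 1)),
    "Key agreement":          ((2, 0), (0, 0), (0, 0)),
    "Secret reconstruction":  ((0, 1), (1, 0), (0, 1)),
    "Decryption":             ((0, 1), (0, 1), (0, 1)),
}


def operation_counts(K, M):
    """Return {scheme: {algorithm: count}} for K devices per area and M areas."""
    counts = {scheme: {} for scheme in SCHEMES}
    for algorithm, cells in TABLE3.items():
        for scheme, (a, b) in zip(SCHEMES, cells):
            counts[scheme][algorithm] = a * K * M + b * M
    return counts


if __name__ == "__main__":
    K, M = 200, 10  # hypothetical sizes, chosen only for illustration
    for scheme, per_algorithm in operation_counts(K, M).items():
        print(scheme, "total operations:", sum(per_algorithm.values()))
```

With this encoding, the PPDAFL signature-verification entry grows with $M$ only, while the corresponding entries for [16] and [17] grow with $KM$.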

Table 4. Communication overhead comparison.

Round	[16] IID (User)	[16] ICC (Server)	[17] IID (User)	[17] ICC (Server)	PPDAFL IID (User)	PPDAFL ICC (Server)
0	$2 R$	$0$	$2 R$	$0$	$R$	$0$
1	$(K - 1) R$	$3 K R$	$K R$	$3 K R$	$R$	$K R$
2	$R$	$(K - 1) R$	$3 R$	$K R$	$K R$	$R$
3	$R$	$0$	$K R$	$R$	$K R$	$R$
4	$R$	$(K - 1) R$	$K R$	$R$	$R$	$0$
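
The per-round entries of Table 4 can likewise be evaluated numerically. The sketch below is an illustration under stated assumptions: each cell is encoded as `(a, b)` for $aKR + bR$, the function name `total_overhead` is hypothetical, and summing over rounds 0 to 4 is an assumption about how a per-party total might be formed, not a statement of how the figures in the paper were produced. The sample values $K$ = 200 and $R$ = 256 bits are the ones named in the captions of Figures 5 and 6.

```python
# Each cell of Table 4 is encoded as (a, b), meaning a*K*R + b*R bits.
# Lists are ordered by round: 0, 1, 2, 3, 4.
TABLE4 = {
    ("[16]", "IID"):   [(0, 2), (1, -1), (0, 1), (0, 1), (0, 1)],
    ("[16]", "ICC"):   [(0, 0), (3, 0), (1, -1), (0, 0), (1, -1)],
    ("[17]", "IID"):   [(0, 2), (1, 0), (0, 3), (1, 0), (1, 0)],
    ("[17]", "ICC"):   [(0, 0), (3, 0), (1, 0), (0, 1), (0, 1)],
    ("PPDAFL", "IID"): [(0, 1), (0, 1), (1, 0), (1, 0), (0, 1)],
    ("PPDAFL", "ICC"): [(0, 0), (1, 0), (0, 1), (0, 1), (0, 0)],
}


def total_overhead(K, R):
    """Sum the Table 4 entries over all rounds for each (scheme, party) pair.

    K is the number of devices per aggregation area and R the data length
    in bits. Summing over rounds is an assumption made for this sketch.
    """
    return {
        key: sum(a * K * R + b * R for a, b in rounds)
        for key, rounds in TABLE4.items()
    }


if __name__ == "__main__":
    for (scheme, party), bits in total_overhead(K=200, R=256).items():
        print(f"{scheme:7s} {party}: {bits} bits")
```

Keeping the IID and ICC columns separate matters here, because a scheme can lower the load on one party while raising it on the other.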

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

