Article

EDI-C: Reputation-Model-Based Collaborative Audit Scheme for Edge Data Integrity

1 School of Cryptographic Engineering, Information Engineering University, Zhengzhou 450001, China
2 State Key Laboratory of Cryptology, Beijing 100094, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(1), 75; https://doi.org/10.3390/electronics13010075
Submission received: 16 November 2023 / Revised: 6 December 2023 / Accepted: 14 December 2023 / Published: 23 December 2023

Abstract

The emergence of mobile edge computing (MEC) has facilitated the development of data caching technology, which enables application vendors to cache frequently used data on edge servers close to users, thereby providing low-latency data access services. However, in an unstable MEC environment, the multi-replica data cached by different edge servers is prone to corruption, making it crucial to verify the consistency of multi-replica data across different edge servers. Although existing research realizes data integrity verification through the cooperation of multiple edge servers, the integrity proofs generated from the multiple copies of the data are identical, which results in low verification efficiency and leaves the schemes vulnerable to attacks such as replay and replace. To address these issues, this paper proposes an efficient and lightweight multi-replica integrity verification algorithm based on homomorphic hash and sampling algorithms, which incurs significantly lower storage and computational costs and can resist forgery, replay, and replace attacks. Based on this verification algorithm, this paper further proposes EDI-C, a reputation-model-based collaborative audit scheme for multi-replica edge data integrity. EDI-C realizes efficient collaborative auditing among multiple edge servers in a distributed, discrete environment through an incentive mechanism, thereby avoiding the trust problem between the two parties caused by centralized auditing. It also supports batch auditing of multiple copies of original data files at the same time through parallel processing and data block auditing, which not only significantly improves verification efficiency but also enables accurate localization and repair of corrupted data at the data block level. Finally, security analyses and a performance evaluation show the security and practicality of EDI-C. Compared with representative schemes, the results show that EDI-C can ensure the integrity verification of cached data more efficiently in an MEC environment.

1. Introduction

With the rapid development of 5G networks and the wide application of mobile Internet-of-Things (IoT) devices, the traditional cloud computing architecture can no longer satisfy the demand of mobile IoT services, and the unpredictable data access latency between geographically distributed users and remote clouds has become the bottleneck of network services [1]. To overcome this challenge, a new computing architecture, Edge Computing (EC) [2], has emerged and matured gradually, and it has become one of the key technologies for 5G. Derived from cloud computing, EC is an information transmission and processing technology on the user side. The emergence of EC extends cloud-based data caching services to the edge side of the network, and an edge caching data service framework is shown in Figure 1. The app vendor (AV) can cache commonly used data of latency-sensitive applications to edge servers (ESs) close to the user’s geographic location, thereby reducing the service response time and providing more convenient and low-latency services to nearby users [3].
However, this emerging edge storage brings many unique security issues and challenges. Compared with traditional large-scale cloud computing environments, EC environments are more dynamic, discrete, and unstable. Since ESs are deployed in a geographically distributed manner and cannot always be maintained internally like traditional cloud servers, they are not completely reliable: software anomalies or hardware failures can lead to the corruption of cached data on ESs [4], and an attacker can tamper with cached data on ESs by injecting Trojan horses and viruses, which poses a security threat to the users accessing the corrupted data. Thus, it is urgent to ensure the integrity and consistency of multi-copy data on different ESs in a dynamic, discrete, and unstable environment.
Most existing research [4,5,6,7,8] on edge data integrity verification mainly extends traditional schemes to the EC environment. Nevertheless, traditional centralized cloud data integrity verification schemes are not applicable to EC environments with large-scale distributed deployments, which makes the edge data integrity verification problem unique: edge caching systems are composed of highly distributed ESs that lack the centralized control of cloud servers hosted in data centers; EC environments usually need to access massive amounts of data, so both the ES and the AV need more efficient and lightweight verification schemes [9]; and in real-world scenarios, ESs and mobile devices have limited storage and computational resources, and because resource levels vary across edge nodes, a single complex verification scheme is not suitable for all edge nodes.
Recently, many studies have turned to collaborative verification between ESs to verify the integrity of edge data without the participation of the AV, which can solve the trust problem between the two parties in centralized verification. However, all ESs need to generate hash values over the entire replica data as integrity proofs, which results in excessive computational overhead. Meanwhile, since the correct integrity proof is globally identical, a malicious ES can cheat the verifier by replaying a proof that has already been verified.
To verify the integrity of multiple data copies on different edge servers in edge caching systems, in highly discrete, distributed EC environments with varying resource levels, this paper proposes EDI-C, an efficient reputation-model-based collaborative auditing scheme for edge data integrity, which dynamically coordinates interactive auditing among the ESs in the system so that different edge nodes can flexibly perform different auditing tasks according to their available resources. The contributions of this paper are summarized as follows:
  • This paper proposes an efficient lightweight multi-copy data integrity verification algorithm based on homomorphic hashing and sampling algorithms. The algorithm only involves basic addition, multiplication, and modulo operations, and it does not need to compute the labels for each block of data, thereby effectively reducing the computation and communication cost; also, it can resist replay and replace attacks.
  • Based on the verification algorithm, this paper proposes a multi-replica edge data integrity collaborative audit scheme EDI-C based on the reputation model. EDI-C realizes efficient collaborative audit of multiple edge servers in a distributed discrete environment through an incentive mechanism, and it avoids the trust problem of both sides caused by centralized audit. Moreover, by using parallel processing and data block auditing technology, EDI-C can support the batch auditing of multiple copies of original data at the same time, which not only improves the verification efficiency but also realizes the accurate location and repair of corrupted data at the data block level.

2. Related Work

2.1. Traditional Cloud Storage Multiple-Replica Integrity Verification

To verify the integrity of outsourced data on remote untrusted servers, Ateniese et al. [10] first proposed the Provable Data Possession (PDP) model at the Computer and Communications Security conference in 2007. The PDP model uses the RSA signature algorithm to construct Homomorphic Verifiable Tags (HVTs), enabling users to verify the integrity of data on remote untrusted servers without retrieving the data; meanwhile, it reduces computation and communication overheads through a random sampling strategy and aggregation verification, making it applicable to verifying data possession for large datasets. Meanwhile, Juels and Kaliski [11] developed the Proof of Retrievability (POR) model. The POR model achieves probabilistic proof of file possession by embedding "sentinels" in the file, and it provides data recovery by using redundant encoding of the data. Both types of verification models support probabilistic verification.
Storing multiple copies is a common strategy for improving reliability and availability in cloud storage. For important data files, users sometimes want to store multiple copies so that corrupted copies can still be recovered from the remaining ones. Meanwhile, a cloud service provider may sell the same storage space multiple times to gain more profit, actually storing only one copy while claiming to have stored multiple copies on multiple servers. Curtmola et al. [12] provided the first implementation of multiple-replica provable data possession (MR-PDP) based on the PDP model, where HVTs are generated from the original data blocks during preprocessing and stored on the server. Nevertheless, this scheme is only suitable for static scenarios where each data replica is checked one by one, so it incurs considerable additional computation. In addition, many schemes [13,14,15,16,17] are based on authenticated data structures such as the Merkle Hash Tree (MHT) and the Index-Hash Table (IHT), and they utilize homomorphic signatures for the simultaneous verification of multi-copy outsourced data and dynamic update verification. To further enhance efficiency and reduce the burden of certificate management, researchers have also proposed identity-based schemes for multi-copy data integrity verification by adopting identity signature technology [18,19].

2.2. Edge Cache Data Integrity Verification

The emergence of the new cloud service architecture, EC, has facilitated the development of network caching systems for deploying application data at the edge [1,20], thereby providing low-latency cached data services to users near edge devices. However, in EC environments with highly unstable networks and ESs with limited computing power and storage space, cached data are susceptible to intentional or unintentional corruption, and application vendors urgently need a method to efficiently check the integrity of the data cached on ESs.
Tong et al. [21] were the first to investigate the problem of edge data integrity verification, and they proposed two integrity-checking protocols for mobile edge storage with the participation of third-party trusted centers, namely, ICE-basic and ICE-batch. Specifically, the latter is a batch verification protocol based on the idea of aggregation verification, and it enables users to check data integrity on single and multiple edge cloud nodes. Recently, they proposed an efficient tag caching strategy to reduce the authentication communication cost [22]. Cui et al. [7] proposed a homomorphic tag-based framework called ICL-EDI for edge data integrity verification. Additionally, based on bilinear pairings and certificateless signatures, Liu et al. [23] designed an edge data integrity verification scheme that can resist key disclosure attacks while achieving privacy preservation. However, none of these studies [7,21,22,23] considered dynamic data verification or recovery of corrupted data. To this end, Qiao et al. [6] proposed a lightweight auditing scheme called EDI-SA, which reduces the verification computation overhead by using algebraic signatures [24]; they also developed an improved sampling strategy that supports batch auditing and dynamic update verification. Subsequently, Ding et al. [8] proposed an improved edge data integrity verification scheme called EDI-DA, which designs a new data structure, the indexed unidirectional linked table, to support dynamic data operations, thereby improving updating efficiency and practicality. However, the scheme cannot perform data recovery and still has security issues such as vulnerability to forgery attacks.
The schemes [6,7,8,21,22,23] are all realized by directly migrating traditional outsourced cloud data integrity verification schemes to the EC environment. However, edge data integrity verification is fundamentally different from traditional cloud data integrity verification: in edge cloud cached data verification, both sides (AV and ES) hold the original data, whereas in traditional cloud data integrity verification, users no longer keep the original data locally after outsourcing it to the cloud server, and PDP-based verification of outsourced data requires generating labels before the data is outsourced, with the user retaining only the labels locally as the basis for verification. Therefore, PDP-based verification schemes are too complex to be applied to the integrity verification of edge data. Additionally, EC environments often need to access massive amounts of data, and both the ES and the AV need more efficient and lightweight verification solutions.
To verify the integrity of edge data in a targeted way, Li et al. [4] developed a sampling-based probabilistic detection method, EDI-V, to verify replica data integrity on ESs with limited computational resources, where the edge cloud only needs to generate specific sampling trees called variable Merkle hash trees (VMHTs) for data integrity verification. However, in practice, the total number of edge data replicas is large, and it is not feasible to use a single hash tree to manage all replicas. Therefore, based on the idea of aggregate verification, another scheme called EDI-S was proposed in [5], which adopts elliptic curve signatures to generate integrity proofs for each replica. However, the scheme suffers from a large communication overhead for delivering verification messages and cannot locate and repair corrupted copies at the data block level.
Recently, many studies have turned to collaborative verification among edge nodes to verify the integrity of edge data without the involvement of AVs, which can solve the trust problem between the two parties in centralized verification while ensuring the fairness of verification without requiring a trusted third party. Li et al. [25] designed CooperEDI, a data integrity verification scheme based on a distributed consensus mechanism that forms a decentralized, self-managed edge caching system, where ESs autonomously interact with each other to verify the integrity of cached copies without remote control from AVs. They then implemented EdgeWatch [26], a blockchain-based collaborative verification framework for edge data integrity. Moreover, based on the principles of game theory, prospect theory, and reinforcement learning, Mitsis et al. [27] proposed a behavior- and price-aware edge computing framework to solve the data offloading decision problem in a multi-user, multi-server, multi-access edge computing competitive environment, which is crucial for evaluating the importance and potential impact of contributions. However, in the schemes [25,26], all edge nodes need to generate hash values over the entire replica data as integrity proofs, which leads to excessive computational overhead. Moreover, the correct integrity proofs are globally identical, so malicious edge nodes can collude to replay already verified proofs to deceive the verifier.
The complex EC environment has become a bottleneck for providing secure and trustworthy edge caching data services. To this end, this paper aimed to design a collaborative auditing scheme for edge data integrity that is efficient and applicable to discrete distributed environments and can resist attacks by malicious edge nodes.

3. Preliminaries

3.1. System Model

As shown in Figure 2, a collaborative audit system model of edge data integrity in a distributed discrete environment was constructed in this paper. To adapt to the characteristics of a distributed EC environment, the management area is planned according to geographical location. Through collaborative auditing and mutual verification between ESs in the same area, the roles of auditor and auditee are organically unified, thereby realizing efficient integrity verification of the data of the whole edge caching system. This paper also designed a reputation-based incentive mechanism, which, adhering to the principles of game theory, effectively eliminates the profit space for multi-party collusion so that ESs audit each other honestly. The model mainly involves two entities: the ES and the AV. Specifically, an ES is deployed at the edge of the network (base stations and access points) close to the user's geographic location and stores copies of commonly used and relevant data. The AV deploys the service data on the ESs and is responsible for supervising the audit process and managing the system; it ensures service quality by auditing the integrity of the data copies. Meanwhile, to enable the ESs in the system to perform collaborative auditing under the incentive mechanism, the AV also needs to maintain the Audit Ledger and the Credit Ledger. Specifically, the Audit Ledger contains the VerResults, all verification results collected by the AV during the audit phase, and the EDIList, a list of copy integrity verification results for globally participating ESs counted by the AV. The Credit Ledger contains the CreditList, a list of Credits updated by the AV after credit settlement based on the audit results, and the SettlementRecord, a record of the credit settlement process that ESs can use as evidence when they challenge the results.

3.2. Fault Model

It is assumed that the AV is honest and trustworthy and performs audit tasks honestly. There may be dishonest nodes among the ESs: when some copies of data are corrupted or lost, dishonest ESs may replace or replay integrity proofs to pass the audit, while honest ESs always process messages correctly and in a timely manner. These ESs are geographically dispersed and managed by different infrastructure providers, and it is difficult for an attacker to compromise a large number of data copies on these ESs simultaneously without being detected. Based on the reputation model [28] and game theory [29], consensus among the participants in unreliable networks can be reached. An underlying assumption is that no more than half of the participants fail at the same time. Under this assumption, most participants in a distributed system can eventually reach a consensus even when some participants temporarily fail. Therefore, this study makes the following assumption: in the edge caching system, no more than half of the copies of any data are corrupted at the same time.
Specifically, data copies stored on ESs (also known as edge data) are exposed to storage failures and three main attacks as described below.
  • Accidental failures. Failures such as hardware failures, software anomalies, and network attacks can lead to the corruption of edge data.
  • Forgery attacks. When some edge data are corrupted or lost, a dishonest ES may forge, in polynomial time and with non-negligible probability, an integrity proof that passes the verification of an honest verifier.
  • Replay attacks. A dishonest ES may be able to use a previously generated proof of correct data integrity to pass a new integrity audit.
  • Replace attacks. A dishonest ES may pass the integrity audit of an AV by replacing a corrupted block with another intact block stored by itself, or by intercepting an integrity proof generated by another ES as its own.

3.3. Design Goals

The design objectives of the scheme are listed below:
  • Correctness. The scheme should ensure that the AV can correctly utilize the verification equations to audit the integrity of the edge data.
  • Lightweight. Given the resource constraints in the edge computing environment, the computation and communication overhead of ES and AV should be as small as possible during the audit process.
  • Security. The scheme should prevent dishonest ESs from performing replay attacks, substitution attacks, and forgery attacks.

3.4. Homomorphic Hash

Homomorphic hash is a hash algorithm with homomorphic properties, and it has the following properties:
  • Homomorphism: For any two messages $m_1, m_2$ and real numbers $w_1, w_2$, $H(w_1 m_1 + w_2 m_2) = H(m_1)^{w_1} H(m_2)^{w_2}$.
  • Collision resistance: there is no probabilistic polynomial-time algorithm by which an attacker can forge $(m_1, m_2, m_3, w_1, w_2)$ with $m_3 \ne w_1 m_1 + w_2 m_2$ such that $H(m_3) = H(m_1)^{w_1} H(m_2)^{w_2}$.
Informally, a collision-resistant homomorphic hash function [30] can be defined as follows:
Given security parameters $\lambda_p$ and $\lambda_q$, construct a set of hash parameters $G = (p, q, g)$, where $p$ and $q$ are two large random primes satisfying $|p| = \lambda_p$, $|q| = \lambda_q$, and $q \mid (p-1)$, and $g$ is a $1 \times m$ row vector consisting of random elements of $\mathbb{Z}_p$ of order $q$.
Let $\beta$ be the block size and let $m = \lceil \beta / (\lambda_q - 1) \rceil$. Consider a file $F$ as an $m \times n$ matrix whose elements all belong to $\mathbb{Z}_p$. This selection of $m$ guarantees that each element is smaller than $2^{\lambda_q - 1}$ and is thus less than the prime $q$. The $j$-th column of $F$ then simply corresponds to the $j$-th block of the file $F$, represented as $b_j = (b_{1,j}, \ldots, b_{m,j})$. Thus,
$$F = (b_1, b_2, \ldots, b_n) = \begin{pmatrix} b_{1,1} & \cdots & b_{1,n} \\ \vdots & \ddots & \vdots \\ b_{m,1} & \cdots & b_{m,n} \end{pmatrix}$$
The addition of two data blocks is the addition of the corresponding column vectors; e.g., the addition of the $i$-th block and the $j$-th block can be expressed as follows:
$$b_i + b_j = (b_{1,i} + b_{1,j}, b_{2,i} + b_{2,j}, \ldots, b_{m,i} + b_{m,j}) \bmod q$$
For an arbitrary message block $b_j$, define its hash as $H_G(b_j) = \prod_{t=1}^{m} g_t^{b_{t,j}} \bmod p$.
The homomorphism of the data block can be obtained from its hash computation process:
$$H_G(b_i + b_j) = \prod_{t=1}^{m} g_t^{b_{t,i} + b_{t,j}} \bmod p = \prod_{t=1}^{m} g_t^{b_{t,i}} g_t^{b_{t,j}} \bmod p = \left( \prod_{t=1}^{m} g_t^{b_{t,i}} \bmod p \right) \times \left( \prod_{t=1}^{m} g_t^{b_{t,j}} \bmod p \right) = H_G(b_i) \times H_G(b_j)$$
Under standard cryptographic assumptions, the algorithm is secure, and the detailed proof is given in reference [30].
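As an illustration, the following minimal Python sketch instantiates $H_G$ and checks its homomorphism; the toy parameter sizes, the prime-search loop, and the use of the sympy library for primality testing are assumptions made for this example rather than requirements of the construction.

```python
# A minimal Python sketch of the homomorphic hash H_G and a check of its homomorphism.
# Toy parameter sizes and the use of sympy for primality testing are illustrative
# assumptions; deployments would use full-size lambda_p and lambda_q.
import random
from sympy import isprime, nextprime

def gen_params(lambda_p=128, lambda_q=64, m=4):
    """Generate G = (p, q, g): primes with q | (p - 1) and m random elements of order q mod p."""
    q = nextprime(random.getrandbits(lambda_q))
    while True:                                   # search for a prime p of the form k*q + 1
        k = random.getrandbits(lambda_p - lambda_q)
        p = k * q + 1
        if p > q and isprime(p):
            break
    g = []
    while len(g) < m:
        h = random.randrange(2, p - 1)
        cand = pow(h, (p - 1) // q, p)            # cand^q = 1 mod p, so cand has order q (or 1)
        if cand != 1:
            g.append(cand)
    return p, q, g

def H_G(block, p, q, g):
    """H_G(b_j) = prod_t g_t^{b_{t,j}} mod p for a block b_j = (b_1, ..., b_m) with b_t < q."""
    h = 1
    for g_t, b_t in zip(g, block):
        h = (h * pow(g_t, b_t % q, p)) % p
    return h

if __name__ == "__main__":
    p, q, g = gen_params()
    b_i = [random.randrange(q) for _ in range(4)]
    b_j = [random.randrange(q) for _ in range(4)]
    b_sum = [(x + y) % q for x, y in zip(b_i, b_j)]
    # Homomorphism: H_G(b_i + b_j) == H_G(b_i) * H_G(b_j) mod p
    assert H_G(b_sum, p, q, g) == (H_G(b_i, p, q, g) * H_G(b_j, p, q, g)) % p
    print("homomorphism holds")
```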

3.5. Reputation Models and Incentive Mechanism

The reputation model [28] and the incentive mechanism form a complementary organic whole: the former is the basis for realizing incentives, and the latter provides a guarantee for enhancing reputation and the overall trustworthiness of the system. The reputation model is the key to realizing behavioral incentives, and a user's reputation is determined by past behaviors. Game theory [29] is mainly used to examine how nodes in a system interact with each other under formulated incentives; it provides a mathematical approach to studying competition and mathematically portrays the behavior of users in strategic scenarios, where the success of each user's choices relies on the behavioral choices of others. Game theory focuses on the behavior of specific entities in a given system and studies the optimization strategies involved. It views the interoperation between entities as a game in which each participant chooses a strategy, operates according to the rules defined in advance by the system designer, and obtains a certain payoff at the end of the game.
Reputation models based on game theory indicate that users’ behavioral choices not only depend on their individual judgments but also are closely related to the choices of other users who are online at the same time. Thus, the system can be modeled as a continuous game process in which users act rationally, i.e., they only consider maximizing their interests. Based on this assumption, the problem of reputation evaluation can be transformed into a strategy selection problem in game theory: if the model can generate a Nash equilibrium [31], the users are informed about the behavioral selection strategy that is most beneficial to them.
The incentive mechanism constructed based on the reputation model aims to improve the overall output of the system by guiding and constraining user behaviors. The incentives directly affect the users’ reputation values and their distribution. In many large-scale web applications, the possibility of keeping interaction history between users is usually low, so reputation-based incentives take the reputation value portrayed by the user’s historical behavior as the basis for incentives, provide differentiated services to users based on the reputation value to guide them to adopt the good behaviors expected by the system, and improve the overall output of the system by updating the user’s reputation evaluation. This approach is universal in practical applications and has strong engineering feasibility.

4. The Proposed Scheme EDI-C

4.1. Verification Algorithm

In the edge caching system, since both sides of data integrity verification own the original data, there is no need to pre-compute data block labels as the basis for verification. Therefore, this paper designed a secure and efficient data integrity verification algorithm based on homomorphic hashing [30], which is described in detail below (an illustrative Python sketch is given after the list):
  • ParaGen$(\lambda_p, \lambda_q) \rightarrow (pp)$: Given the security parameters $\lambda_p$ and $\lambda_q$, the system parameters $pp = \{H_G, \mathcal{P}, \mathcal{F}\}$ are generated. Specifically, $H_G$ is a homomorphic hash function with parameter $G = (p, q, g)$, where $p$ and $q$ are two large random primes satisfying $|p| = \lambda_p$, $|q| = \lambda_q$, and $q \mid (p-1)$, and $g$ is a $1 \times m$ row vector consisting of random elements of $\mathbb{Z}_p$ of order $q$; $\mathcal{P}: \mathbb{Z}_p^* \times \{1, \ldots, n\} \rightarrow \{1, \ldots, n\}$ is a pseudo-random permutation (PRP) used to determine the location of each randomly drawn data block; $\mathcal{F}: \mathbb{Z}_p^* \times \{1, \ldots, n\} \rightarrow \mathbb{Z}_p^*$ is a pseudo-random function (PRF) used as a random number generator.
  • Challenge$(pp) \rightarrow (chal)$: The verifier first chooses $r$ data blocks to challenge, then randomly generates the keys $K_{\mathcal{P}}$ and $K_{\mathcal{F}}$ for the PRP $\mathcal{P}$ and the PRF $\mathcal{F}$, and finally forms the challenge $chal = \{r, K_{\mathcal{P}}, K_{\mathcal{F}}\}$.
  • ProofGen$(chal, pp, F) \rightarrow (proof)$: The challenged party first generates the set of challenge indexes $C = \{c_1, c_2, \ldots, c_r\}$, where $c_i = \mathcal{P}_{K_{\mathcal{P}}}(i)$, $1 \le i \le r$, based on the key $K_{\mathcal{P}}$, and then generates the corresponding set of random numbers $S = \{s_1, s_2, \ldots, s_r\}$, where $s_i = \mathcal{F}_{K_{\mathcal{F}}}(i)$, $1 \le i \le r$, based on the key $K_{\mathcal{F}}$. Subsequently, it computes the hash values $H_G(b_j) = \prod_{t=1}^{m} g_t^{b_{t,j}} \bmod p$, $j \in C$, of the challenged data blocks according to the set of challenge indexes $C$ and computes the integrity proof $proof = \prod_{i=1}^{r} H_G(b_{c_i})^{s_i}$ based on the set of random numbers $S$. Finally, it returns $proof$ to the verifier.
  • ProofVer$(proof, pp, F) \rightarrow (\mathrm{True}/\mathrm{False})$: The verifier checks whether the equation $H_G\left(\sum_{i=1}^{r} s_i b_{c_i}\right) = proof$ holds according to the locally saved data file $F$; the verification passes if the equation holds.
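The following Python sketch illustrates the four algorithms, assuming the group parameters $(p, q, g)$ are produced as in the Section 3.4 sketch. Instantiating the PRP as a key-seeded shuffle and the PRF as HMAC-SHA256, as well as the key and message formats, are assumptions made for illustration and are not prescribed by the scheme.

```python
# Sketch of ParaGen/Challenge/ProofGen/ProofVer from Section 4.1. PRP = key-seeded shuffle,
# PRF = HMAC-SHA256; these instantiations are illustrative assumptions.
import hashlib, hmac, os, random

def H_G(block, p, q, g):
    """H_G(b_j) = prod_t g_t^{b_{t,j}} mod p."""
    h = 1
    for g_t, b_t in zip(g, block):
        h = (h * pow(g_t, b_t % q, p)) % p
    return h

def prp_indices(key, n, r):
    """PRP P: the first r positions of a key-seeded permutation of {0, ..., n-1}."""
    rng = random.Random(int.from_bytes(hashlib.sha256(key).digest(), "big"))
    idx = list(range(n))
    rng.shuffle(idx)
    return idx[:r]

def prf_coeffs(key, q, r):
    """PRF F: s_i = HMAC(key, i) mod q for i = 1, ..., r."""
    return [int.from_bytes(hmac.new(key, i.to_bytes(4, "big"), hashlib.sha256).digest(),
                           "big") % q for i in range(1, r + 1)]

def challenge(r):
    """Challenge(pp) -> chal = {r, K_P, K_F}."""
    return {"r": r, "K_P": os.urandom(16), "K_F": os.urandom(16)}

def proof_gen(chal, F_blocks, p, q, g):
    """ProofGen: proof = prod_i H_G(b_{c_i})^{s_i} mod p over the challenged blocks."""
    C = prp_indices(chal["K_P"], len(F_blocks), chal["r"])
    S = prf_coeffs(chal["K_F"], q, chal["r"])
    proof = 1
    for c_i, s_i in zip(C, S):
        proof = (proof * pow(H_G(F_blocks[c_i], p, q, g), s_i, p)) % p
    return proof

def proof_ver(proof, chal, F_blocks, p, q, g):
    """ProofVer: check H_G(sum_i s_i * b_{c_i}) == proof against the locally stored file."""
    C = prp_indices(chal["K_P"], len(F_blocks), chal["r"])
    S = prf_coeffs(chal["K_F"], q, chal["r"])
    agg = [sum(s_i * F_blocks[c_i][t] for c_i, s_i in zip(C, S)) % q for t in range(len(g))]
    return H_G(agg, p, q, g) == proof
```

For instance, a verifier holding the same file blocks can run proof_ver(proof_gen(chal, blocks, p, q, g), chal, blocks, p, q, g) and obtain True, whereas corruption of any challenged block makes the check fail with overwhelming probability.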

4.2. Incentive Mechanism

Following game theory, this paper designed a reputation-value-based incentive mechanism for the auditing scheme, which motivates the edge nodes involved in the verification to maintain honest behavior out of self-interest while ensuring the virtuous cycle, security, and stability of the system. In the following, some relevant parameters are defined first:
Definition 1
(Credit). The Credit is the core of the incentive mechanism and is equivalent to a credit token circulating in this verification model; an ES's Credit essentially comes from its InitialCredit and the payoff rewards obtained from participating in audits.
Definition 2
(InitialCredit). Each ES obtains a copy of the data from the AV by paying a deposit in exchange for an InitialCredit, which is confiscated when malicious behavior is detected on this ES.
Definition 3
(Reward). The Reward is the payment promised by the representative node that initiates an audit invitation, and it is shared by the ESs that honestly participate in the verification.
Definition 4
(DataValue). The DataValue is the value attributed to the data by the AV and the compensation paid by the ES when the audit fails.
Definition 5
(Penalty). The Penalty is the amount paid by an ES when malicious behavior is detected during the audit.
The Credit is a combination of each ES’s risk solvency, economic strength, integrity, and historical behavior, and it can provide timely feedback about the ES’s situation to the AV and help the AV to manage the service resources stored on the ES. The verification incentive consists of the reward promised by the representative node that initiates the audit invitation and the penalty paid by the ES that fails to participate in the audit honestly and fails the integrity verification. When the audit is completed, ESs that provide correct audit findings in a timely manner share the rewards, while ESs that provide delayed or incorrect findings will not receive rewards. This mechanism encourages ESs to participate in edge data integrity verification quickly and honestly. The flowchart of the incentive mechanism is shown in Figure 3, where some of the key designs are described below:
  • Each ES entering the network needs to pay a certain deposit in exchange for an InitialCredit, and the probability of an ES becoming a representative node in each round of auditing is positively related to its Credit. That is, the larger the Credit, the higher the probability that the node is elected as a representative node to undertake an audit.
  • The representative node that initiates the audit invitation needs to pay the corresponding audit Reward to the ES in the system involved in the verification of its data copy.
  • If an ES cooperates truthfully with the audit of its corrupted data, it only needs to pay the DataValue corresponding to the corrupted data as compensation.
  • If an ES conceals or falsifies its data corruption, it not only pays the corresponding DataValue but also pays a Penalty that is distributed to the other honest ESs.
  • The Credit is managed by the AV, and the credit settlement is completed automatically based on the audit results. If a round of auditing ends normally, each honest ES involved in this round will distribute all the audit payoffs of this round based on the contribution as a short-term incentive. The ES responsible for the audit will lose more Credit from collusion than it earns from collusion. Rational ESs will perform honest audits to maximize benefits and effectively resist collusion attacks during audits.
  • The ES’s performance in accomplishing the task will be recorded in the reputation ledger maintained by the AV to support the evaluation of its long-term Credit. Typically, ESs can only earn a high Credit by completing audit tasks with consistently high performance.

4.3. Basic Scheme

This paper considers an edge caching system consisting of $n$ ESs, represented as $ES = \{ES_i \mid 1 \le i \le n\}$. The AV has one or more copies of data cached on each ES. All the ESs in the system are divided into multiple regions $ESG \subseteq ES$ according to their geographic distribution. Assume that there are $s$ ESs in each region, and each ES in the $ESG$ and the AV maintain a list $ESG = \{ES_i \mid 1 \le i \le s\}$. Let both the original file stored on the AV and its copies cached on the ESs be $F$. Assume that the file $F$ is divided into $l$ data blocks, i.e., $F = (b_1, b_2, \ldots, b_l)$, where each $b_i$ is $l$ bits long and $b_i \in GF(2^l)$, $i \in \{1, 2, \ldots, l\}$. If the length of the last data block is less than $l$ bits, it is padded with an identifier.
This paper first considers the case in which only one copy of the original file $F$ needs to be checked on each ES. The workflow of the scheme is illustrated in Figure 4 and consists of three phases: the Setup Phase, the Data Integrity Audit Phase, and the Result Summary Phase. In the Setup Phase, the AV distributes the file data and scheme-related system parameters to all ESs in the system for preservation; in the Data Integrity Audit Phase, the AV first selects representative nodes in each region based on the InitialCredit, and then the nodes in each region perform a collaborative audit of cached data integrity and upload the results to the AV; in the Result Summary Phase, the AV aggregates the final audit results from all the verification results and performs reputation settlement based on the incentive mechanism.

4.3.1. Setup Phase

The AV generates the system parameters $pp = \{H_G, \mathcal{P}, \mathcal{F}\}$ according to the ParaGen algorithm introduced in Section 4.1. For a given file $F = (b_1, b_2, \ldots, b_l)$, its unique identifier is $I_F \in \{0, 1\}^*$. Finally, the AV sends the cached data file $F$ along with the system parameters $pp$ to the ESs in the system.

4.3.2. Data Integrity Audit Phase

  • Step 1 Selection of representative nodes
First, each ES node in the network pays a certain deposit in exchange for an InitialCredit. The AV selects the most reputable server from each group $ESG$ as the audit representative node $ES_d$ for the data replica $F$ according to the reputation of the ESs retrieved from the Credit Ledger. Based on the idea of the reputation model, a high Credit value indicates that this ES has performed edge data integrity auditing honestly and in a timely manner; if this ES has been performing well in the past, it is unlikely to commit malicious behavior as an audit representative.
  • Step 2 Audit invitation
To check the integrity of the replica data on all the ESs in the shortest possible time, $ES_d$ sends audit requests for the local data to the ESs in its area, and the ESs verify each other's data.
To verify the replica $F$ on each $ES_k \in ESG$ in the area, $ES_d$ constructs a challenge request $req_{dk} = \langle ID_d, I_F, r, K_{\mathcal{P}}^{k}, K_{\mathcal{F}}^{k} \rangle$ for each $ES_k$ according to the Challenge algorithm introduced in Section 4.1, where $ID_d$ represents the identity of $ES_d$, $I_F$ represents the unique identifier of $F$, $r$ represents the number of data blocks for the challenge, $K_{\mathcal{P}}^{k}$ represents a randomly generated key for the PRP $\mathcal{P}$, and $K_{\mathcal{F}}^{k}$ represents a randomly generated key for the PRF $\mathcal{F}$.
Finally, $ES_d$ sends an audit invitation $inv_{dk} = \{req_{dk}, Reward\}$ to $ES_k$, where $Reward$ represents the reward promised by $ES_d$ for this audit.
  • Step 3 Response Audit
Upon receiving the audit invitation $inv_{dk} = \{req_{dk}, Reward\}$, each $ES_k$, $k \in \{1, 2, \ldots, s\}$, first checks whether it stores the data replica $F$ locally. If it does, $ES_k$ accepts the audit invitation and continues with the following steps; otherwise, it does not participate in this round of audit.
$ES_k$ generates the set of challenge indexes $C_{dk} = \{c_1, c_2, \ldots, c_r\}$, where $c_i = \mathcal{P}_{K_{\mathcal{P}}^{k}}(i)$, $1 \le i \le r$, and the corresponding set of random numbers $S_{dk} = \{s_1, s_2, \ldots, s_r\}$, where $s_i = \mathcal{F}_{K_{\mathcal{F}}^{k}}(i)$, $1 \le i \le r$, based on the keys $K_{\mathcal{P}}^{k}$ and $K_{\mathcal{F}}^{k}$; it then computes the hashes $H_G(b_j) = \prod_{t=1}^{m} g_t^{b_{t,j}} \bmod p$, $j \in C_{dk}$, and the integrity proof $E_k = \prod_{i=1}^{r} H_G(b_{c_i})^{s_i}$, and finally constructs the integrity proof $proof_{kd} = \langle ID_k, I_F, E_k \rangle$ of the local replica $F$.
Meanwhile, $ES_k$ also acts as a verifier and sends a challenge request $req_{kd} = \langle ID_k, I_F, r, K_{\mathcal{P}}^{d}, K_{\mathcal{F}}^{d} \rangle$ to $ES_d$, where $ID_k$ denotes the identity of $ES_k$, $I_F$ denotes the unique identifier of $F$, $r$ denotes the number of data blocks for the challenge, $K_{\mathcal{P}}^{d}$ denotes a randomly generated key for the PRP $\mathcal{P}$, and $K_{\mathcal{F}}^{d}$ denotes a randomly generated key for the PRF $\mathcal{F}$.
Finally, $ES_k$ combines $proof_{kd} = \langle ID_k, I_F, E_k \rangle$ and $req_{kd} = \langle ID_k, I_F, r, K_{\mathcal{P}}^{d}, K_{\mathcal{F}}^{d} \rangle$ into a response $res_{kd} = \{proof_{kd}, req_{kd}\}$ and returns it to the representative node $ES_d$.
  • Step 4 Verification
After $ES_d$ receives the response $res_{kd} = \{proof_{kd}, req_{kd}\}$ from $ES_k$, it first checks whether $ES_k$ is a legal data owner in the $ESG$. If it is, $ES_d$ proceeds with the following steps; otherwise, the message is discarded.
$ES_d$ generates the set of challenge indexes $C_{kd} = \{c_1, c_2, \ldots, c_r\}$, where $c_i = \mathcal{P}_{K_{\mathcal{P}}^{d}}(i)$, $1 \le i \le r$, and the corresponding set of random numbers $S_{kd} = \{s_1, s_2, \ldots, s_r\}$, where $s_i = \mathcal{F}_{K_{\mathcal{F}}^{d}}(i)$, $1 \le i \le r$, based on the keys $K_{\mathcal{P}}^{d}$ and $K_{\mathcal{F}}^{d}$; it then computes the hashes $H_G(b_j) = \prod_{t=1}^{m} g_t^{b_{t,j}} \bmod p$, $j \in C_{kd}$, and the integrity proof $E_d = \prod_{i=1}^{r} H_G(b_{c_i})^{s_i}$ of the challenged data blocks, and it finally constructs the integrity proof $proof_{dk} = \langle ID_d, I_F, E_d \rangle$ of its local replica $F$ and sends it to $ES_k$.
Meanwhile, $ES_d$ checks whether the equation $H_G\left(\sum_{i=1}^{r} s_i b_{c_i}\right) = E_k$ holds based on its own locally stored data copy. If the equation holds, the verification passes, i.e., the copy stored on $ES_k$ is intact; otherwise, the verification fails, i.e., the copy on $ES_k$ is corrupted. Finally, $ES_d$ reports all the final verification results and the Reward promised for this audit to the AV.
After $ES_k$ receives the data integrity proof returned by $ES_d$, it checks whether the equation $H_G\left(\sum_{i=1}^{r} s_i b_{c_i}\right) = E_d$ holds based on its own locally stored data copy. If the equation holds, the verification passes; otherwise, the verification fails. Finally, $ES_k$ reports the verification result to the AV.
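The following sketch simulates the mutual audit of Steps 2–4, reusing the challenge, proof_gen, and proof_ver functions from the Section 4.1 sketch; the dictionaries used as messages and the report returned to the AV are assumptions made for the example rather than the scheme's actual message format.

```python
# Illustrative simulation of the mutual audit between ES_d and ES_k (Steps 2-4).
# Builds on challenge/proof_gen/proof_ver from the Section 4.1 sketch.
def mutual_audit(F_d, F_k, p, q, g, r=4):
    """ES_d and ES_k challenge each other over replica F and report both results to the AV."""
    chal_dk = challenge(r)                                    # ES_d -> ES_k: audit invitation
    proof_k = proof_gen(chal_dk, F_k, p, q, g)                # ES_k proves over its local copy
    chal_kd = challenge(r)                                    # ES_k -> ES_d: counter-challenge
    proof_d = proof_gen(chal_kd, F_d, p, q, g)                # ES_d proves over its local copy
    es_k_intact = proof_ver(proof_k, chal_dk, F_d, p, q, g)   # ES_d checks ES_k's proof
    es_d_intact = proof_ver(proof_d, chal_kd, F_k, p, q, g)   # ES_k checks ES_d's proof
    return {"ES_k_intact": es_k_intact, "ES_d_intact": es_d_intact}  # reported to the AV
```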

4.3.3. Result Summary Phase

In unreliable or error-prone processor networks, there is an underlying assumption that no more than half of the participants will fail simultaneously. In an edge caching system, only a few of the $n$ data replicas cached on $n$ ESs are likely to be corrupted at the same time; i.e., at least $(n+1)/2$ data replicas are correct at any given time. This is reasonable in highly distributed mobile EC environments because most edge data copies are unlikely to be corrupted by concurrent hardware failures, software anomalies, or network attacks.
When $\lceil s/2 \rceil$ identical and honest verification results about the data copy $F$ on $ES_d$ are obtained, the AV assumes that $ES_d$ holds a valid data copy $F$ and accepts the verification results uploaded by $ES_d$ about the data copies on the other ESs in the group, using them to update the VerResults and EDIList in the Audit Ledger. Then, the AV performs credit settlement based on the incentive mechanism as follows (a simplified sketch of this settlement logic is given after the list):
  • For ESs that have corrupted data but truthfully participate in the audit, the AV deducts the DataValue corresponding to their corrupted data.
  • For ESs that conceal or falsify data corruption, the AV deducts the DataValue corresponding to their corrupted data and Penalty.
  • The honest ESs participating in this round of auditing share all the Rewards and the Penalties paid by dishonest ESs in this round according to their contributions. The honest representative node $ES_d$ is paid more than the ordinary ESs in the group involved in auditing. The Credit lost by the ESs responsible for auditing due to collusion will be larger than the Credit earned through collusion, so rational ESs will perform honest auditing to maximize their benefit.
  • The AV updates the SettlementRecord and CreditList in the Credit Ledger.
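The following is a simplified Python sketch of this settlement logic, assuming simple data structures for the reports and an even split of the payoff pool; the contribution-based split and the representative node's larger share described above are omitted for brevity.

```python
# Simplified sketch of the AV's result-summary and credit-settlement logic (Section 4.3.3).
# Data structures and the even payoff split are simplifying assumptions for illustration.
from math import ceil

def settle(reports_on_esd, esd_report, credits, data_value, reward, penalty):
    """reports_on_esd: booleans from group members on whether ES_d's replica is intact.
    esd_report: {es_id: (data_intact, cooperated_truthfully)} as uploaded by ES_d.
    credits: {es_id: current Credit}; returns the updated credits."""
    # Accept ES_d's report only if at least ceil(s/2) members confirm its copy is intact.
    if sum(reports_on_esd) < ceil(len(reports_on_esd) / 2):
        return credits
    pool, honest = reward, []
    for es, (intact, truthful) in esd_report.items():
        if not intact:
            credits[es] -= data_value            # compensation for the corrupted data
        if not truthful:
            credits[es] -= penalty               # concealment or falsification
            pool += penalty                      # penalty is redistributed to honest ESs
        else:
            honest.append(es)                    # truthful participants share the payoff
    for es in honest:
        credits[es] += pool / len(honest)
    return credits
```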

4.4. Scheme Expansion

4.4.1. Batch Audit

In practice, data from multiple files may be stored on an ES, and the basic auditing scheme can verify only one file in each round of interaction, which is inefficient and wastes resources. In this section, the basic auditing scheme is extended, based on homomorphism, to enable batch auditing of the integrity of multiple copies of file data on ESs. In the process of collaborative auditing, an ES acts as both the verifier and the verified party, and batch auditing changes only the construction of challenge requests, the generation of proofs, and the verification, which are described in detail below:
Assume that one wants to audit $m$ different files $F_t$ ($t = 1, \ldots, m$) simultaneously. The verifier-side $ES_v$ constructs the challenge request $req = \langle ID_v, \{I_{F_t}\}_{t \in [1,m]}, r, K_{\mathcal{P}}, K_{\mathcal{F}} \rangle$, where $ID_v$ denotes the identity of the ES that sends the challenge request, $\{I_{F_t}\}_{t \in [1,m]}$ denotes the set of identifiers of all the challenged files $F_t$ ($t = 1, \ldots, m$), $r$ denotes the number of challenged data blocks, $K_{\mathcal{P}}$ denotes a randomly generated key for the PRP $\mathcal{P}$, and $K_{\mathcal{F}}$ denotes a randomly generated key for the PRF $\mathcal{F}$.
Upon receiving the challenge request $req$, the verified $ES_p$ first generates the set of challenge indexes $C = \{c_1, c_2, \ldots, c_r\}$, where $c_i = \mathcal{P}_{K_{\mathcal{P}}}(i)$, $1 \le i \le r$, and the set of random numbers $S = \{s_1, s_2, \ldots, s_r\}$, where $s_i = \mathcal{F}_{K_{\mathcal{F}}}(i)$, $1 \le i \le r$, based on the keys $K_{\mathcal{P}}$ and $K_{\mathcal{F}}$. Then, it computes $E = \prod_{t=1}^{m} \prod_{i=1}^{r} H_G(b_{c_i}^{t})^{s_i}$, where $b_{c_i}^{t}$ represents the $c_i$-th data block of the challenged file $F_t$. Finally, the integrity proof $proof = \langle ID_p, \{I_{F_t}\}_{t \in [1,m]}, E \rangle$ of all the challenged files is constructed, where $ID_p$ represents the identity of the verified $ES_p$ and $\{I_{F_t}\}_{t \in [1,m]}$ represents the set of identifiers of all the challenged files $F_t$ ($t = 1, \ldots, m$).
The verifier $ES_v$ receives the $proof$ and checks whether the equation $H_G\left(\sum_{t=1}^{m} \sum_{i=1}^{r} s_i b_{c_i}^{t}\right) = E$ holds based on its locally stored data copies. If the equation holds, the verification passes, i.e., all $m$ copies of the files $F_t$ ($t = 1, \ldots, m$) stored on $ES_p$ are intact; otherwise, the verification fails, i.e., at least one copy of a file on $ES_p$ is corrupted.
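The following sketch extends the Section 4.1 functions to batch auditing by aggregating the proof over all challenged files; the list-of-files representation and the function names are assumptions made for illustration. If any challenged block of any file is corrupted, the single equation fails, after which the corrupted data can be located, e.g., as in Section 4.4.2.

```python
# Sketch of the batch-audit extension (Section 4.4.1), reusing H_G, prp_indices, and
# prf_coeffs from the Section 4.1 sketch; one challenge (r, K_P, K_F) covers all m files.
def batch_proof_gen(chal, files, p, q, g):
    """proof = prod_t prod_i H_G(b^t_{c_i})^{s_i} mod p over all challenged files."""
    proof = 1
    for F_blocks in files:                                   # files: list of m block lists
        C = prp_indices(chal["K_P"], len(F_blocks), chal["r"])
        S = prf_coeffs(chal["K_F"], q, chal["r"])
        for c_i, s_i in zip(C, S):
            proof = (proof * pow(H_G(F_blocks[c_i], p, q, g), s_i, p)) % p
    return proof

def batch_proof_ver(proof, chal, files, p, q, g):
    """Check H_G(sum_t sum_i s_i * b^t_{c_i}) == proof against the local copies."""
    agg = [0] * len(g)
    for F_blocks in files:
        C = prp_indices(chal["K_P"], len(F_blocks), chal["r"])
        S = prf_coeffs(chal["K_F"], q, chal["r"])
        for c_i, s_i in zip(C, S):
            for t in range(len(g)):
                agg[t] = (agg[t] + s_i * F_blocks[c_i][t]) % q
    return H_G(agg, p, q, g) == proof
```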

4.4.2. Localization and Recovery

When a round of edge data integrity auditing is completed, the AV recognizes that E S d has a valid replica F and assigns E S d to obtain the data corruption location and recover the data for the ESs in the domain that fail the audit.
According to the audit results reported by the AV, the edge node $ES_f$ that fails the audit sends its identity information and the hash values of all its data blocks as a proof of data integrity, $proof_{fd} = \langle ID_f, I_F, \{H_G(b_i)\}_{i \in [1,l]} \rangle$, to the representative node $ES_d$ recognized by the AV. $ES_d$ employs the bisection method to quickly locate the corrupted data blocks through the following steps:
  • When $ES_d$ receives $proof_{fd}$, it first divides $\{H_G(b_i)\}_{i \in [1,l]}$ into two parts, $E_{left} = \prod_{i=1}^{\lceil l/2 \rceil} H_G(b_i)$ and $E_{right} = \prod_{i=\lceil l/2 \rceil + 1}^{l} H_G(b_i)$.
  • $ES_d$ checks whether $H_G\left(\sum_{i=1}^{\lceil l/2 \rceil} b_i\right) = E_{left}$ holds based on its locally stored file $F$. If the verification passes, this half contains no corrupted blocks; otherwise, it is divided and checked again. The other part $E_{right}$ is verified in the same way.
  • The first two steps are repeated on the failing halves until all corrupted data blocks are located.
Finally, E S d sends the corrupted data block to the edge node E S f that fails the audit to recover its stored file data.
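The following sketch illustrates the bisection search, assuming $ES_d$ holds a valid copy and receives the per-block hashes reported by $ES_f$; the recursive structure and function names are illustrative assumptions, and it builds on the H_G sketch from Section 4.1.

```python
# Sketch of the corruption-localization step: ES_d compares aggregated hashes reported by
# the failing ES_f against aggregates recomputed from its own valid copy, halving the
# range until single corrupted blocks are isolated.
def locate_corrupted(reported_hashes, local_blocks, p, q, g):
    """Return indexes of blocks whose reported hash disagrees with the valid local copy."""
    m = len(g)

    def check(lo, hi):
        # Aggregate of the reported per-block hashes over [lo, hi) ...
        agg_reported = 1
        for i in range(lo, hi):
            agg_reported = (agg_reported * reported_hashes[i]) % p
        # ... must equal H_G of the block sum of the valid local copy (by homomorphism).
        block_sum = [sum(b[t] for b in local_blocks[lo:hi]) % q for t in range(m)]
        return agg_reported == H_G(block_sum, p, q, g)

    corrupted = []

    def bisect(lo, hi):
        if lo >= hi or check(lo, hi):
            return                      # this range is intact
        if hi - lo == 1:
            corrupted.append(lo)        # isolated a corrupted block
            return
        mid = (lo + hi) // 2
        bisect(lo, mid)
        bisect(mid, hi)

    bisect(0, len(local_blocks))
    return corrupted
```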

5. Security Analyses

Theorem 1
(Correctness). If the ES stores a complete and correct copy of the replica data F , it can generate proof of data integrity that can successfully pass verification.
Proof of Theorem 1.
According to the homomorphism of the homomorphic hash function, $H(w_1 m_1 + w_2 m_2) = H(m_1)^{w_1} H(m_2)^{w_2}$ holds for any two messages $m_1, m_2$ and real numbers $w_1, w_2$. If the $proof = \langle ID_k, I_F, E_k \rangle$ provided by $ES_k$ passes the data integrity verification, then the equation $H_G\left(\sum_{i=1}^{r} s_i b_{c_i}\right) = E_k$ must hold, and its correctness is shown below:
$$H_G\left(\sum_{i=1}^{r} s_i b_{c_i}\right) = \prod_{t=1}^{m} g_t^{\sum_{i=1}^{r} s_i b_{t,c_i}} \bmod p = \prod_{i=1}^{r} \prod_{t=1}^{m} g_t^{s_i b_{t,c_i}} \bmod p = \prod_{i=1}^{r} \left( \prod_{t=1}^{m} g_t^{b_{t,c_i}} \right)^{s_i} \bmod p = \prod_{i=1}^{r} H_G(b_{c_i})^{s_i} = E_k \qquad \square$$
Theorem 2
(Probabilistic verification). Given a replica $F = (b_1, b_2, \ldots, b_l)$ with $l$ data blocks, EDI-C can successfully detect a corrupted data replica with a probability of at least $1 - \left(\frac{l - d_k}{l}\right)^r$, where $d_k$ denotes the number of corrupted data blocks of replica $F$ on the edge node, and $r$ denotes the number of challenged blocks.
Proof of Theorem 2.
The pseudo-random permutation $\mathcal{P}$ is used in EDI-C to determine the location of each randomly selected data block, and the probability that the verifier successfully detects a corrupted replica $F$ on an edge node is equal to the probability that at least one of the corrupted data blocks in the replica $F$ is challenged. According to the rules of probability computation, this probability is expressed as follows:
$$P(\eta \ge 1) = 1 - P(\eta = 0) = 1 - \frac{l - d_k}{l} \cdot \frac{l - d_k - 1}{l - 1} \cdot \frac{l - d_k - 2}{l - 2} \cdots \frac{l - d_k - r + 1}{l - r + 1}$$
where η denotes the number of detected damaged blocks.
Given any integer $i \le l$, we have $\frac{l - d_k - i}{l - i} \ge \frac{l - d_k - i - 1}{l - i - 1}$, so every factor in the product above is at most $\frac{l - d_k}{l}$, and thus the inequality $P(\eta \ge 1) > 1 - \left(\frac{l - d_k}{l}\right)^r$ holds.
Additionally, instead of verifying all data blocks of each replica in the auditing process, only some randomly selected data blocks are needed for integrity verification, making it more efficient and economical. □
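As a worked example under illustrative values ($l = 1000$ blocks per replica, $d_k = 10$ corrupted blocks, and $r = 250$ challenged blocks, which are assumptions for this example rather than the experimental defaults), the following snippet evaluates both the exact detection probability and the lower bound of Theorem 2.

```python
# Worked example for Theorem 2 with illustrative values for l, d_k, and r.
l, d_k, r = 1000, 10, 250

miss = 1.0
for i in range(r):                       # probability that no corrupted block is sampled
    miss *= (l - d_k - i) / (l - i)

exact = 1 - miss                         # exact detection probability (approx. 0.94)
bound = 1 - ((l - d_k) / l) ** r         # Theorem 2 lower bound (approx. 0.92)
print(f"exact = {exact:.4f}, bound = {bound:.4f}")
```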
Theorem 3
(Resistance to Forgery Attacks). In EDI-C, an ES cannot pass verification using a forged data integrity proof.
Proof of Theorem 3.
In each proof generation and verification phase of EDI-C, the ESs need to recalculate the corresponding integrity proofs based on their stored data copies and the new challenge requests. Suppose that $ES_{\mathcal{A}}$ has a corrupted data block. Since each ES generates different challenge indexes and random numbers based on the received keys $K_{\mathcal{P}}$ and $K_{\mathcal{F}}$, $ES_{\mathcal{A}}$ cannot forge its own proof by intercepting the integrity proofs of other ESs. Specifically, assume that the challenged data block $b_i$ of the data copy $F$ is corrupted and that the $proof$ computed by $ES_{\mathcal{A}}$ using a forged data block $b_i^*$ can pass the verification; then $H_G(b_i) = H_G(b_i^*)$ must hold. This implies that $ES_{\mathcal{A}}$ can find a collision of the hash function $H_G$, which contradicts the assumption that $H_G$ is collision-resistant. Therefore, no ES can use a forged data block to generate an integrity proof that passes the verification. That is, EDI-C can resist forgery attacks. □
Theorem 4
(Resistance to Replay Attacks). In EDI-C, no ES can replay the previous audit’s correct proof of integrity (i.e., a correct but outdated proof of integrity) to pass a new round of integrity verification by other ESs.
Proof of Theorem 4.
Define the replay attack game as follows: assume $ES_{\mathcal{A}}$ is a malicious adversary. Upon receiving a challenge $\{ID_k, I_F, C_{\mathcal{A}}', S_{\mathcal{A}}'\}$ sent by another $ES_k$, $ES_{\mathcal{A}}$ returns an expired $proof_{\mathcal{A}} = \langle ID_{\mathcal{A}}, I_F, E_{\mathcal{A}} \rangle$; since $C_{\mathcal{A}}$ and $S_{\mathcal{A}}$ are both expired, $E_{\mathcal{A}}$ is also expired. The expired proof was previously generated by $ES_{\mathcal{A}}$ or intercepted from other ESs. If $proof_{\mathcal{A}}$ passes the integrity verification, $ES_{\mathcal{A}}$ wins.
Let the set of outdated challenge indexes and the corresponding set of random numbers be $C_{\mathcal{A}}$ and $S_{\mathcal{A}}$, respectively, and let the set of new challenge indexes and the corresponding set of random numbers be $C_{\mathcal{A}}'$ and $S_{\mathcal{A}}'$, respectively. Assume that $proof_{\mathcal{A}} = \langle ID_{\mathcal{A}}, I_F, E_{\mathcal{A}} \rangle$ passes the verification, where $E_{\mathcal{A}}$ is computed by $ES_{\mathcal{A}}$ using the correct but outdated $C_{\mathcal{A}}$ and $S_{\mathcal{A}}$; then $E_{\mathcal{A}} = \prod_{i=1}^{r} H_G(b_{c_i})^{s_i} = H_G\left(\sum_{i=1}^{r} s_i' b_{c_i'}\right)$ must hold, which indicates that $ES_{\mathcal{A}}$ can find collisions of the hash function $H_G$. However, this contradicts the collision resistance of the hash function. Additionally, the seeds of the PRP differ between rounds of audit challenges in the sampling algorithm, which leads to different challenge blocks and corresponding random numbers for each round of auditing; i.e., with high probability $C_{\mathcal{A}} \ne C_{\mathcal{A}}'$ and $S_{\mathcal{A}} \ne S_{\mathcal{A}}'$, so $E_{\mathcal{A}} \ne E_{\mathcal{A}}'$, where $E_{\mathcal{A}}'$ denotes the correct proof for the new challenge. Therefore, $proof_{\mathcal{A}}$ cannot pass the verification with a non-negligible advantage.
Thus, $ES_{\mathcal{A}}$ wins with negligible probability, and no ES can replay a correct but outdated integrity proof to pass a new round of integrity verification by other ESs. That is, EDI-C can effectively resist replay attacks. □
Theorem 5
(Resistance to Replace Attacks). In EDI-C, no ESs can generate an integrity proof by replacing the challenged data block with other complete data blocks to pass the verification of the other ESs.
Proof of Theorem 5.
The game defining the replace attack is introduced as follows: Assume that $ES_{\mathcal{A}}$ is a malicious adversary (here, an ES can be either the representative node of the group or any other ordinary node). When it receives a challenge $\{ID_k, I_F, C_{\mathcal{A}}, S_{\mathcal{A}}\}$ sent by another $ES_k$, $ES_{\mathcal{A}}$ returns $proof_{\mathcal{A}} = \langle ID_{\mathcal{A}}, I_F, E_{\mathcal{A}} \rangle$ to the ES that initiated the challenge. There are two scenarios of the replace attack: (1) $ES_{\mathcal{A}}$ generates the integrity proof $proof_{\mathcal{A}}$ by replacing the challenged data block $b_i$ with another unchallenged valid data block $b_j$ ($j \ne i$); (2) $ES_{\mathcal{A}}$ takes the $proof_j$ of another $ES_j$ ($j \ne \mathcal{A}$) as its $proof_{\mathcal{A}}$. If $proof_{\mathcal{A}}$ passes the integrity verification, $ES_{\mathcal{A}}$ wins.
Case 1: $ES_{\mathcal{A}}$ replaces $b_i$ with another complete data block $b_j$ ($j \ne i$) to compute $E_{\mathcal{A}}'$ as the integrity proof. Assume that the correct integrity proof corresponding to the challenge index set $C_{\mathcal{A}} = \{c_1, c_2, \ldots, c_r\}$ is $E_{\mathcal{A}} = \prod_{i=1}^{r} H_G(b_{c_i})^{s_i}$. Since $b_i = b_j$ holds with negligible probability, if the integrity proof obtained in this way can pass the integrity verification of other ESs with a non-negligible probability, then $E_{\mathcal{A}}' = E_{\mathcal{A}}$ must hold. This indicates that $ES_{\mathcal{A}}$ can find collisions of the hash function $H_G$, which contradicts the assumption that $H_G$ is a collision-resistant hash function. Therefore, $ES_{\mathcal{A}}$ cannot generate a valid integrity proof by replacing the challenged data block with another complete data block.
Case 2: The challenge index set $C = \{c_1, c_2, \ldots, c_r\}$ consists of $r$ different indexes generated by the PRP based on the key $K_{\mathcal{P}}$. Since the key $K_{\mathcal{P}}$ in each challenge is randomly generated, the keys $K_{\mathcal{P}}$ used by the ESs to construct their challenge index sets differ, so different ESs obtain different challenge index sets $C$. Even if two different ESs obtain the same challenge index set $C$ with low probability, the probability that the sets of random numbers $S = \{s_1, s_2, \ldots, s_r\}$ corresponding to the challenge index set $C$ generated by different $ES_k$ are identical is negligible. Therefore, $ES_{\mathcal{A}}$ cannot pass the verification by replacing its integrity proof $proof_{\mathcal{A}}$ with the integrity proof $proof_j$ of $ES_j$ ($j \ne \mathcal{A}$).
Thus, $ES_{\mathcal{A}}$ wins the game with negligible probability, and no ES can replace the challenged data block with other complete data blocks to generate an integrity proof that passes the verification of other ESs. That is, EDI-C can resist the replace attack. □

6. Performance Evaluation

6.1. Functionality Comparison

To better demonstrate the superiority of EDI-C, it is compared with the existing schemes on edge data integrity verification mainly in terms of realization function and security, and the results are listed in Table 1. In EDI-V, the AV can only verify the integrity proofs returned by the ES one by one, and the ES needs to regenerate a new Merkle hash tree for the integrity proof in each audit, which results in a high verification overhead and does not support aggregation verification. In contrast, EDI-S, EDI-SA, EDI-DA, and ICL-EDI implement aggregated verification of the integrity proofs returned by the ESs based on elliptic curve signatures, algebraic signatures, and homomorphic labels, respectively. Among them, EDI-SA and EDI-DA also support batch auditing. However, ICL-EDI is not secure and cannot resist replay, replace, and forgery attacks. CooperEDI and EdgeWatch are distributed edge data verification protocols implemented based on blockchain. They are not limited to data integrity verification by the AV but perform collaborative verification among ESs, so they are more suitable for EC environments with large-scale distributed deployments, but the integrity proofs constructed from the hash value of the entire data file cannot resist replay, replace, and forgery attacks. EDI-C combines improved sampling algorithms and homomorphic hashing algorithms to locate and recover from data block-level corruption while resisting forgery, replay, and replace attacks. It is more functional and secure and is more suitable for EC environments.

6.2. Experimental Evaluation

To evaluate the performance of EDI-C, it is compared with the typical schemes EDI-V [4] and ICL-EDI [7]. In this research, only the performance of data integrity verification was compared, and EDI-V, ICL-EDI, and EDI-C were implemented in simulation experiments to compare their computation and communication overheads. To evaluate the performance of EDI-C more comprehensively, three parameters, namely the system scale, data block size, and sampling scale, were varied in this study to simulate different edge data integrity verification scenarios. Each time, the value of one parameter was changed while the other parameters were kept at their default values, as shown in Table 2. Each experiment was repeated 100 times, and the average result was reported.
All experiments were conducted on a computer running Ubuntu 18.04.6 and equipped with an R9-5900HX CPU (4.6 GHz) and 32 GB RAM, and the programs were implemented using Python 3.6.9. To simulate the edge caching system, a total of 10 different geographic regions were specified in the whole system, and 10 virtual machines were randomly mapped to each geographic region to simulate the ESs in that region. In addition, another virtual machine was used to simulate the AV. Since the network latency fluctuated over time and was roughly 50 ms on average, the network latency between virtual machines was ignored. The hash algorithm in the program was implemented using the SHA256 function in the Cryptodome.Hash library, while the random number generator was implemented using the RandomState function.

6.2.1. Computation Overhead

The computational overhead is measured by the computational time required to complete the entire verification process, and the smaller the computational overhead, the higher the performance of the scheme. According to the current RSA security requirements, the public key is required to be at least 4096 bits.
Figure 5 shows the time consumed by each scheme when the system scale (the total number of ESs participating in the audit) increased from 20 to 100. It can be seen from Figure 5 that the time consumed to complete the audit in EDI-V and ICL-EDI increased with the system scale, while the computational overhead of EDI-C remained stable. For example, when the system scale was 20, the times consumed by EDI-V, ICL-EDI, and EDI-C were 1408 s, 2289 s, and 278 s, respectively. When the system scale increased to 100, the times consumed by EDI-V, ICL-EDI, and EDI-C were 1808 s, 2799 s, and 289 s, respectively. Since EDI-V and ICL-EDI are centralized audit schemes, the integrity verification of all edge data copies is completed by the AV, which needs to locate in the original data the same sampled blocks used by each ES for verification. EDI-C is a collaborative audit scheme in which all ESs in the system interact with and audit each other simultaneously. EDI-C is less affected by changes in the system scale and has better scalability.
Figure 6 compares the time consumed by each scheme when the data block size increased from 128 KB to 2 MB. In EDI-V, each ES creates a subtree of VMHT based on the sampled data blocks, and the increase in the data block size affects the total number of nodes in the VMHT on each ES, so the time spent by the ESs increases. In both ICL-EDI and EDI-C, the computation overhead of constructing proofs and verifications is related to the data block size. However, in ICL-EDI, additional overhead is taken to construct homomorphic labels. When the data block size increased from 128 KB to 2048 KB, the time consumption of EDI-V increased from 264.96 s to 4187.74 s, while that of ICL-EDI increased from 148.57 s to 2438.77 s. In contrast, the time consumption of EDI-C increased only from 97.58 s to 1089.14 s, demonstrating that EDI-C can effectively handle large-scale data copies.
Figure 7 compares the time consumption of each scheme when the sampling scale increased from 50 to 250. As shown in Figure 7, the time required to complete the audit by ICL-EDI, EDI-V, and EDI-C increased significantly with the sampling scale. ICL-EDI and EDI-C need to generate the corresponding proof of data integrity based on the sampled data blocks. However, in ICL-EDI, AV also needs more time to check the proof when the sampling scale increases. For EDI-V, when the sampling scale increases, each ES needs to spend more time to generate the VMHT. The EDI-C scheme has the least time consumption because it does not need to generate complex data block labels and does not need to generate VMHT as proof; instead, it only needs to calculate the hash of the sampled data blocks.
The computational complexity of the three schemes was then analyzed theoretically, and the results are presented in Table 3. In this table, l denotes the number of data blocks into which each data replica is divided, s denotes the number of ESs, and r denotes the number of challenged blocks on each ES. To compare against the centralized verification schemes (ICL-EDI and EDI-V), the analysis focuses on the ParaGen, ProofGen, and ProofVer phases. In the ParaGen phase, ICL-EDI needs to compute a label for each data block, so the larger l is, the more labels must be computed; the AV in EDI-V also needs to build the VMHT over all data blocks. EDI-C, in contrast, requires no complicated preparation in the ParaGen phase, and its computational complexity is independent of l. In the ProofGen phase, the computational complexity of EDI-V is related to the number of data blocks l: each ES generates VMHT subtrees as the integrity proof in each audit round, and a larger l leads to a deeper VMHT, which significantly increases the computational cost for both the AV and the ESs. The computational complexity of EDI-C and ICL-EDI in the ProofGen phase depends only on the number of challenged blocks r, which is much smaller than l, so their cost is lower. In the ProofVer phase, the computational complexity of both EDI-V and ICL-EDI is related to the number of challenged blocks r and the number of edge servers s; moreover, EDI-V incurs roughly twice the cost per challenged block (O(2r + s)) because it needs to construct the corresponding VMHT subtrees for verification. The computational complexity of EDI-C depends only on r, so its verification cost is the smallest.
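As a rough, back-of-the-envelope illustration of these asymptotic differences, the short sketch below plugs hypothetical values of l, r, and s into the expressions of Table 3; the numbers are arbitrary and serve only to show how the per-phase costs scale.

```python
# Operation-count estimates based on the asymptotic expressions in Table 3.
# The values of l, r, and s below are hypothetical and chosen only for illustration.
def op_counts(l: int, r: int, s: int) -> dict:
    return {
        "EDI-V":   {"ParaGen": l, "Challenge": l + s, "ProofGen": l, "ProofVer": 2 * r + s},
        "ICL-EDI": {"ParaGen": l, "Challenge": s,     "ProofGen": r, "ProofVer": r + s},
        "EDI-C":   {"ParaGen": 1, "Challenge": s,     "ProofGen": r, "ProofVer": r},
    }

if __name__ == "__main__":
    # e.g., a replica split into 4096 blocks, 100 challenged blocks per ES, 40 edge servers
    for scheme, phases in op_counts(l=4096, r=100, s=40).items():
        print(scheme, phases)
```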
To sum up, the experimental results and the complexity analysis indicate that the scheme proposed in this paper incurs less computational overhead than the related schemes.

6.2.2. Communication Overhead

Figure 8 shows the communication overhead of the three schemes as the system scale (the total number of ESs participating in the audit) increased from 20 to 100. For the centralized verification schemes EDI-V and ICL-EDI, the communication overhead mainly consists of the size (KB) of the integrity proofs of the edge data copies that the ESs return to the AV. For the cooperative auditing scheme EDI-C, it mainly consists of the total size (KB) of the integrity proofs exchanged among the ESs during the interactive audit. Overall, EDI-C has a higher communication overhead than the other schemes: it requires multiple ESs to collaborate in performing integrity verification, and as the system scale grows, more ESs are involved, which leads to higher communication overhead. However, because EDI-C verifies replica integrity through collaboration between ESs, the AV is not involved in the verification, and the communication overhead is incurred only among ESs at the edge of the network, which makes the scheme applicable to real EC systems. In contrast, the communication overhead of centralized edge data integrity verification schemes such as EDI-V and ICL-EDI is incurred on the backhaul network between the ESs and the AV, which runs contrary to the goal of minimizing backhaul traffic in mobile EC environments.

7. Conclusions

Due to the discrete and unstable characteristics of the MEC environment, ensuring the consistency of multiple replicas on different edge servers is key to providing secure and reliable edge caching services. This paper proposes an efficient and lightweight multi-replica integrity verification algorithm based on homomorphic hashing and sampling, which effectively reduces storage and computational costs and can resist forgery, replay, and replace attacks. Building on this verification algorithm, a reputation-model-based multi-replica edge data integrity cooperative audit scheme, EDI-C, is further proposed, which uses an incentive mechanism to coordinate the active and honest participation of ESs in auditing within a distributed, discrete environment. By using parallel processing and data block auditing techniques, EDI-C also supports batch auditing of multiple cached copies of the original data, which not only greatly improves verification efficiency but also enables accurate localization and repair of corrupted data at the data block level. The security analysis proves that EDI-C is resistant to forgery, replay, and replace attacks. Simulation-based comparisons with two representative schemes indicate that EDI-C incurs significantly less computation overhead and confines its verification communication to the network edge rather than the backhaul network.

Author Contributions

The authors confirm their contributions to the paper as follows: conceptualization and methodology: F.Y., Y.S. and X.C.; data curation: Q.G.; formal analysis and investigation of results: F.Y.; writing—original draft preparation: F.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Acknowledgments

We express our gratitude to the relevant personnel at Information Engineering University for their great help with this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, B.; He, Q.; Cui, G.; Xia, X.; Chen, F.; Jin, H.; Yang, Y. READ: Robustness-Oriented Edge Application Deployment in Edge Computing Environment. IEEE Trans. Serv. Comput. 2022, 15, 1746–1759. [Google Scholar] [CrossRef]
  2. Shi, W.; Jie, C.; Quan, Z.; Li, Y.; Xu, L. Edge Computing: Vision and Challenges. IEEE Internet Things J. 2016, 3, 637–646. [Google Scholar] [CrossRef]
  3. Yi, S.; Qin, Z.; Li, Q. Security and Privacy Issues of Fog Computing: A Survey. In Wireless Algorithms, Systems, and Applications; Xu, K., Zhu, H., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2015; Volume 9204, pp. 685–695. [Google Scholar]
  4. Li, B.; He, Q.; Chen, F.; Jin, H.; Xiang, Y.; Yang, Y. Auditing Cache Data Integrity in the Edge Computing Environment. IEEE Trans. Parallel Distrib. Syst. 2021, 32, 1210–1223. [Google Scholar] [CrossRef]
  5. Li, B.; He, Q.; Chen, F.; Jin, H.; Xiang, Y.; Yang, Y. Inspecting Edge Data Integrity with Aggregated Signature in Distributed Edge Computing Environment. IEEE Trans. Cloud Comput. 2021, 10, 2691–2703. [Google Scholar] [CrossRef]
  6. Qiao, L.; Li, Y.; Wang, F.; Yang, B. Lightweight Integrity Auditing of Edge Data for Distributed Edge Computing Scenarios. Ad Hoc Netw. 2022, 133, 102906. [Google Scholar] [CrossRef]
  7. Cui, G.; He, Q.; Li, B.; Xia, X.; Chen, F.; Jin, H.; Xiang, Y.; Yang, Y. Efficient Verification of Edge Data Integrity in Edge Computing Environment. IEEE Trans. Serv. Comput. 2022, 15, 3233–3244. [Google Scholar] [CrossRef]
  8. Ding, Y.; Li, Y.; Yang, W.; Zhang, K. Edge Data Integrity Verification Scheme Supporting Data Dynamics and Batch Auditing. J. Syst. Archit. 2022, 128, 102560. [Google Scholar] [CrossRef]
  9. Sanaei, Z.; Abolfazli, S.; Gani, A.; Buyya, R. Heterogeneity in Mobile Cloud Computing: Taxonomy and Open Challenges. IEEE Commun. Surv. Tutor. 2014, 16, 369–392. [Google Scholar] [CrossRef]
  10. Ateniese, G.; Burns, R.; Curtmola, R.; Herring, J.; Kissner, L.; Peterson, Z.; Song, D. Provable Data Possession at Untrusted Stores. In Proceedings of the 14th ACM Conference on Computer and Communications Security—CCS ’07, Alexandria, VA, USA, 31 October–2 November 2007; ACM Press: New York, NY, USA, 2007; p. 598. [Google Scholar]
  11. Juels, A.; Kaliski, B.S. Pors: Proofs of Retrievability for Large Files. In Proceedings of the 14th ACM Conference on Computer and Communications Security—CCS ’07, Alexandria, VA, USA, 31 October–2 November 2007; ACM Press: New York, NY, USA, 2007; p. 584. [Google Scholar]
  12. Curtmola, R.; Khan, O.; Burns, R.; Ateniese, G. MR-PDP: Multiple-Replica Provable Data Possession. In Proceedings of the 28th International Conference on Distributed Computing Systems, Beijing, China, 17–20 June 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 411–420. [Google Scholar]
  13. Liu, C.; Ranjan, R.; Yang, C.; Zhang, X.; Wang, L.; Chen, J. MuR-DPA: Top-Down Levelled Multi-Replica Merkle Hash Tree Based Secure Public Auditing for Dynamic Big Data Storage on Cloud. IEEE Trans. Comput. 2015, 64, 2609–2622. [Google Scholar] [CrossRef]
  14. Guo, W.; Qin, S.; Gao, F.; Zhang, H.; Li, W.; Jin, Z.; Wen, Q. Dynamic Proof of Data Possession and Replication with Tree Sharing and Batch Verification in the Cloud. IEEE Trans. Serv. Comput. 2020, 15, 1813–1824. [Google Scholar] [CrossRef]
  15. Barsoum, A.F.; Hasan, M.A. Provable Multicopy Dynamic Data Possession in Cloud Computing Systems. IEEE Trans. Inform. Forensic Secur. 2015, 10, 485–497. [Google Scholar] [CrossRef]
  16. Zhang, J.; Li, T.; Jiang, Q.; Ma, J. Enabling Efficient Traceable and Revocable Time-Based Data Sharing in Smart City. EURASIP J. Wirel. Commun. Netw. 2022, 2022, 3. [Google Scholar] [CrossRef] [PubMed]
  17. Yang, C.; Song, B.; Ding, Y.; Ou, J.; Fan, C. Efficient Data Integrity Auditing Supporting Provable Data Update for Secure Cloud Storage. Wirel. Commun. Mob. Comput. 2022, 2022, 5721917. [Google Scholar] [CrossRef]
  18. Li, J.; Yan, H.; Zhang, Y. Efficient Identity-Based Provable Multi-Copy Data Possession in Multi-Cloud Storage. IEEE Trans. Cloud Comput. 2022, 10, 356–365. [Google Scholar] [CrossRef]
  19. Peng, S.; Zhou, F.; Li, J.; Wang, Q.; Xu, Z. Efficient, Dynamic and Identity-Based Remote Data Integrity Checking for Multiple Replicas. J. Netw. Comput. Appl. 2019, 134, 72–88. [Google Scholar] [CrossRef]
  20. Yin, H.; Zhang, X.; Liu, H.H.; Luo, Y.; Tian, C.; Zhao, S.; Li, F. Edge Provisioning with Flexible Server Placement. IEEE Trans. Parallel Distrib. Syst. 2017, 28, 1031–1045. [Google Scholar] [CrossRef]
  21. Tong, W.; Jiang, B.; Xu, F.; Li, Q.; Zhong, S. Privacy-Preserving Data Integrity Verification in Mobile Edge Computing. In Proceedings of the 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), Dallas, TX, USA, 7–10 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1007–1018. [Google Scholar]
  22. Tong, W.; Chen, W.; Jiang, B.; Xu, F.; Li, Q.; Zhong, S. Privacy-Preserving Data Integrity Verification for Secure Mobile Edge Storage. IEEE Trans. Mob. Comput. 2022, 22, 5463–5478. [Google Scholar] [CrossRef]
  23. Liu, D.; Li, Z.; Jia, D. Secure Distributed Data Integrity Auditing with High Efficiency in 5G-Enabled Software-Defined Edge Computing. Cyber Secur. Appl. 2023, 1, 100004. [Google Scholar] [CrossRef]
  24. Hevia, A.; Micciancio, D. The Provable Security of Graph-Based One-Time Signatures and Extensions to Algebraic Signature Schemes. In Advances in Cryptology—ASIACRYPT 2002; Zheng, Y., Ed.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2002; Volume 2501, pp. 379–396. ISBN 978-3-540-00171-3. [Google Scholar]
  25. Li, B.; He, Q.; Chen, F.; Dai, H.; Jin, H.; Xiang, Y.; Yang, Y. Cooperative Assurance of Cache Data Integrity for Mobile Edge Computing. IEEE Trans. Inform. Forensic Secur. 2021, 16, 4648–4662. [Google Scholar] [CrossRef]
  26. Li, B.; He, Q.; Yuan, L.; Chen, F.; Lyu, L.; Yang, Y. EdgeWatch: Collaborative Investigation of Data Integrity at the Edge Based on Blockchain. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; ACM: New York, NY, USA, 2022; pp. 3208–3218. [Google Scholar]
  27. Mitsis, G.; Tsiropoulou, E.E.; Papavassiliou, S. Price and Risk Awareness for Data Offloading Decision-Making in Edge Computing Systems. IEEE Syst. J. 2022, 16, 6546–6557. [Google Scholar] [CrossRef]
  28. Aberer, K.; Despotovic, Z.; Galuba, W.; Kellerer, W. The Complex Facets of Reputation and Trust. In Computational Intelligence, Theory and Applications; Reusch, B., Ed.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 281–294. ISBN 978-3-540-34780-4. [Google Scholar]
  29. Maskin, E.; Sjöström, T. Chapter 5 Implementation Theory. In Handbook of Social Choice and Welfare; Elsevier: Amsterdam, The Netherlands, 2002; Volume 1, pp. 237–288. ISBN 978-0-444-82914-6. [Google Scholar]
  30. Krohn, M.N.; Freedman, M.J.; Mazieres, D. On-the-Fly Verification of Rateless Erasure Codes for Efficient Content Distribution. In Proceedings of the IEEE Symposium on Security and Privacy, Berkeley, CA, USA, 12 May 2004; IEEE: Piscataway, NJ, USA, 2004; pp. 226–240. [Google Scholar]
  31. Nash, J.F. Equilibrium Points in n-Person Games. Proc. Natl. Acad. Sci. USA 1950, 36, 48–49. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Edge caching data service framework.
Figure 2. System model. Cylinders of different colors represent different data. The dark cylinders represent the raw data. The light-colored cylinders represent the cached data copies in the ESs.
Figure 3. Incentive mechanism flowchart.
Figure 4. The workflow of our EDI-C.
Figure 5. The variation of computation overhead with system scale.
Figure 6. The variation of computation overhead with data block size.
Figure 7. The variation of computation overhead with sampling scale.
Figure 8. Comparison of communication overhead.
Table 1. Comparison of edge data integrity verification schemes.
Scheme | Realization Functions (Aggregation Verification, Localization and Recovery) | Security (Resistance to Forgery, Replay, and Replace Attacks)
EDI-V: ××
ICE-batch: ××××
ICL-EDI: ×××××
EDI-SA: ×
EDI-DA: ×
EDI-S: ×
CooperEDI: ××××
EdgeWatch: ×××××
EDI-C
Table 2. Parameter settings.
Parameter | Value Varied | Value Fixed
System scale | 20, 40, 60, 80, 100 | 40
Data block size (KB) | 128, 256, 512, 1024, 2048 | 256
Sampling scale | 50, 100, 150, 200, 250 | 100
Table 3. Comparison of computational complexity.
Scheme | ParaGen | Challenge | ProofGen | ProofVer
EDI-V [4] | O(l) | O(l + s) | O(l) | O(2r + s)
ICL-EDI [7] | O(l) | O(s) | O(r) | O(r + s)
EDI-C | O(1) | O(s) | O(r) | O(r)
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
