Research on Decentralized Storage Based on a Blockchain

Meng, Lu; Sun, Bin

doi:10.3390/su142013060

by Lu Meng * and Bin Sun

College of Information Science and Engineering, Northeastern University, Shenyang 110004, China

* Author to whom correspondence should be addressed.

Sustainability 2022, 14(20), 13060; https://doi.org/10.3390/su142013060

Submission received: 8 September 2022 / Revised: 27 September 2022 / Accepted: 11 October 2022 / Published: 12 October 2022

(This article belongs to the Section Sustainable Management)

Abstract:

Current distributed storage solutions still depend on third-party storage service providers, and the stored data are concentrated in a few cloud servers, which inevitably brings the risk of data loss, leakage, and tampering; it is therefore imperative to study decentralized storage systems. Maintaining data consistency in a distributed environment was a persistent problem in building decentralized applications until the emergence of blockchain technology, whose decentralized, tamper-resistant, and traceable features address this problem well. In this paper, we design a decentralized storage system combining Hyperledger Fabric and the InterPlanetary File System (IPFS). In addition, from the perspective of the security and availability of the decentralized storage system, we study the partitioning of the stored data and the k-r allocation scheme, propose an allocation function for the stored files, derive mathematical formulas for file security and availability based on the allocation function, and discuss the optimal parameter settings of the allocation function based on these formulas to guarantee high security and availability of the stored files. The experimental results show that the k-r allocation policy based on the Minimal Node Number (MNN) performs better than the k-r allocation policy based on the Minimal Slice Number (MSN); however, under the same security and availability guarantees, the MNN policy requires more copies than the MSN policy, which makes it relatively wasteful of space.

Keywords:

blockchain; decentralized storage; IPFS; Hyperledger Fabric

1. Introduction

With the explosive growth of data, the security and efficiency of storage become especially important, and blockchain technology [1,2,3,4,5] provides a new way of thinking for data storage methods. Currently, research related to blockchain applications in storage is in the nascent stage [6], and decentralized storage research on the protection of storage resources usually stops at the level of "key encryption" protection, which leaves resources exposed to threats for a long time. For example, resources remain vulnerable if encryption keys are exposed or if malicious nodes do not remove their fragments in response to the owner’s request. Moreover, there is not much research on the availability of stored files to capture the probability of losing resources after they are uploaded to the network.

Traditional distributed storage systems have numerous drawbacks, such as centralized data and untrustworthy third-party institutions, which lead to the lack of security and privacy of stored data. The current decentralized storage scheme has not explored much about the security and availability of stored data, so this paper aims to apply the characteristics of blockchain technology, combine the peer-to-peer distributed storage network, design a decentralized storage scheme, and conduct an in-depth study on the security and availability of the scheme. The innovation points of this paper mainly include the following two points.

(1) We combine blockchain and the InterPlanetary File System (IPFS) to design a decentralized storage scheme in which file hashes are stored on the blockchain and file contents are stored in the IPFS network, and we design an encryption scheme for files to secure the stored data.

(2) From the perspective of file security and availability, the allocation function of file copies is proposed, and the impact of parameter settings of the allocation function on file security and availability is studied according to different allocation strategies so that the optimal parameters of the file allocation function can be determined.

2. Related Work

With the exploration and practice of scholars [7], the application of blockchain technology to the field of data storage mainly includes two approaches:

(1) Data is written directly to the block: the block header contains the hash of the previous block, a random value, and the data hash, and the block body carries the data to be saved. After the block is verified by consensus, it is synchronized to all nodes on the chain, which ensures the data’s immutability. However, this approach requires saving the same data on all nodes, which is highly redundant, wastes storage resources when the data is large, and slows down data synchronization. It is therefore only applicable to scenarios where the data is small in volume but important, such as information traceability [8,9].

(2) Data is not written directly to the block; instead, the file digest (hash), file location, and other information are written to the block, the actual data is stored in a file system, and the integrity of the file data can be verified by recomputing the hash. When the blockchain is combined with a file storage system, the blockchain manages files through smart contracts that carry out operations such as file uploading and downloading. This approach is general, is not bound by the size or importance of files, and can be applied to most storage scenarios [10,11,12].

For the first approach, the literature [13] proposes limiting the number of users and setting user legitimacy to reduce the rate of data expansion. In the literature [14], the blockchain is processed by slicing and grouping according to certain criteria, and nodes place replicas according to file redundancy strategy to save space on the chain, but no details are given on how to determine the number of replicas and the redundancy strategy, and the process of releasing replicas is not clear. On such a basis, a space optimization model for federated chains has been proposed, which divides the work among nodes to reduce the full load state and thus expand the space. However, the storage space expanded in this way is limited after all, and it can only store transactions or text information that occupy relatively little space at most and cannot be applied to most storage situations.

For the second approach, the authors of the literature [15] propose a session-based data sharing scheme and a summary chain architecture implemented using immutable block chains and variable P2P (Peer to Peer) storage architecture. However, due to the variable P2P storage architecture, the possibility of tampering and manipulating the stored records is high. The literature [16] proposes a blockchain-based security architecture for distributed cloud storage in which a genetic algorithm is customized to solve the problem of file block copy placement among multiple users and multiple data centers in a distributed cloud storage environment, which improves the efficiency of file upload and download. The literature [17] proposes a blockchain-based distributed data storage scheme that applies edge computing, certificate-free encryption, and blockchain technology to large-scale data applications for the Internet of Things for the first time. In terms of secure privacy of stored data, the literature [18] proposed a blockchain-based data sharing model among cloud service providers, which takes advantage of smart contracts and access control mechanisms to effectively track data access behavior and revoke access authorization for access rule violations, solving the problem of data sharing in untrusted environments. The literature [19] designs a smart contract-based data sharing framework that uses smart contracts and blockchain technology to track, manage, and enforce data sharing protocols. In terms of integrity verification of stored data, the literature [20] proposes a blockchain-based scheme for verifying data integrity by using the Merkle tree structure in the blockchain to store metadata of data, but no protection scheme is proposed for the privacy of file data.

The main differences between the proposed method and the existing decentralized storage framework include: (1) the traditional distributed storage system cannot solve the problem of mutual trust between joint nodes, whereas blockchain technology can solve this problem well and is the most successful decentralized system at present; (2) compared with the existing blockchain-based distributed storage system, the proposed method does not directly store the file content but only saves the hash value of the corresponding file, and the specific file content can be retrieved in IPFS according to the hash value, which solves the problem that blockchain cannot save large files.

3. Method

The decentralized storage framework designed in this paper is divided into four parts: the user layer, the data processing layer, the storage network layer, and the blockchain layer. The user layer is mainly for registering users, who are divided into ordinary users and administrators. The system has an access control mechanism: an application to register as an ordinary user takes effect only after an administrator approves it. The data processing layer is mainly responsible for data uploading, downloading, encryption, and slicing. The storage network layer uses the IPFS network protocol, which connects the nodes that join the network through P2P technology and stores the processed data fragments. The blockchain layer uses a federated chain, which mainly interacts with the outside world through smart contracts and is responsible for recording metadata such as the file hash, file name, and owner. Users can look up the metadata of their stored files on the chain at any time and then retrieve the files.

The operation flow of the whole framework is shown in Figure 1. To store data, a user first encrypts the data file using a double encryption algorithm that combines RSA (Rivest–Shamir–Adleman) with AES (Advanced Encryption Standard); the encrypted file is then fragmented according to certain rules, and the file fragments are uploaded to the IPFS network. The IPFS network generates the hash of the uploaded file, and finally the file hash, file name, and owner are stored on the blockchain in JSON form. To download data, a user first needs the storage address of the file. The address of a stored file on the IPFS network is determined by the contents of the file, so only the file hash needs to be found (the file hash is the value obtained by hashing the file content). The user therefore looks up the hash value of the data on the blockchain, fetches the encrypted data from the IPFS network through the hash value, and finally decrypts the encrypted data to obtain the original data.

3.1. Data Processing Layer

The data processing layer mainly encrypts and slices the data to be uploaded and downloaded. For data encryption, we design a double encryption strategy combining the RSA and AES encryption algorithms, as shown in Figure 2.

Data fragmentation splits the encrypted data into fragments of size N (256 KB by default); a final fragment shorter than N is padded with zeros. These fragments are then scattered and stored across the nodes, and the hash values of the fragments form the leaves of a Merkle tree, as shown in Figure 3. Taking an encrypted file divided into four fragments as an example, the hash value of each of the four fragments is calculated to generate the C, D, E, and F nodes; the C and D nodes are then combined and hashed to generate the A node, and, similarly, the B node is generated from E and F. The root hash value of the tree is then generated from the values of the A and B nodes. The Merkle root hash value is saved to the blockchain layer for data integrity verification.
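The fragmentation and Merkle root computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the paper does not name the hash function used for the tree, so SHA-256 is an assumption here, as are the helper names `split_into_fragments` and `merkle_root` and the choice to duplicate the last hash when a level has an odd number of nodes.

```python
import hashlib

FRAGMENT_SIZE = 256 * 1024  # N = 256 KB, the default fragment size in the text


def split_into_fragments(data, size=FRAGMENT_SIZE):
    """Split data into fixed-size fragments, zero-padding the last one."""
    fragments = [data[i:i + size] for i in range(0, len(data), size)]
    if not fragments:
        fragments = [b""]  # an empty file still yields one padded fragment
    fragments[-1] = fragments[-1].ljust(size, b"\x00")
    return fragments


def merkle_root(fragments):
    """Hash each fragment, then hash adjacent pairs until one root remains."""
    # Assumption: SHA-256 is used for both leaves and inner nodes.
    level = [hashlib.sha256(f).digest() for f in fragments]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: duplicate the last hash so every node has a sibling.
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

With four fragments, the first pass produces the leaf hashes C, D, E, and F, the second pass produces A and B, and the third produces the root, matching the four-fragment example in the text. Changing any single fragment changes the root, which is what makes the stored root usable for integrity verification.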

3.2. Storage Network Layer

The storage network layer mainly addresses how data fragments are distributed and located in distributed storage. This solution adopts the underlying network protocol of IPFS and constructs a distributed hash table (DHT) using the Kademlia protocol. When a node joins the network, it obtains a 160-bit identity ID based on the hash algorithm (SHA1), and file fragments are stored in the network based on this identity ID. Moreover, for each file fragment, a 160-bit hash value is also generated by the hash algorithm from the fragment’s contents. We then take the hash value of the file fragment as the “key” and the hash value of the node as the “value”, which lets us build a large index table composed of <key, value> pairs. Each node maintains a routing table that keeps the information of nodes near its own node ID; this routing table is called the K bucket.
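To make the notion of "near" concrete: Kademlia measures the distance between two 160-bit IDs as their bitwise XOR, interpreted as an integer. The following Python sketch illustrates that metric under stated assumptions. The function names and the node labels are hypothetical, and the sketch ignores everything a real DHT needs beyond the metric (bucket capacity, node liveness, iterative lookups).

```python
import hashlib


def sha1_id(data):
    """Return the 160-bit SHA-1 digest of data as an integer."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def xor_distance(a, b):
    """Kademlia distance between two IDs: bitwise XOR read as an integer."""
    return a ^ b


def bucket_index(own_id, other_id):
    """Index of the bucket that would hold other_id in own_id's routing table.

    Bucket i covers distances in [2**i, 2**(i + 1)); the result is -1 when
    the two IDs are identical.
    """
    return xor_distance(own_id, other_id).bit_length() - 1


def closest_nodes(key, node_ids, count=1):
    """Return the count node IDs closest to key under the XOR metric."""
    return sorted(node_ids, key=lambda n: xor_distance(n, key))[:count]


# Hypothetical usage: eight nodes and one fragment key.
node_ids = [sha1_id(f"node-{i}".encode()) for i in range(8)]
fragment_key = sha1_id(b"example fragment contents")
nearest = closest_nodes(fragment_key, node_ids, count=2)
```

Because both node IDs and fragment hashes live in the same 160-bit space, the same distance function serves for locating nodes and for locating the nodes responsible for a fragment.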

3.3. Blockchain Layer

The blockchain layer is designed based on the open-source blockchain platform Hyperledger Fabric, and the Fabric network structure is shown in Figure 4.

The Fabric network has an authentication mechanism, which means that all nodes have an identity certificate and, to some extent, trust each other; therefore, we use Etcdraft as the consensus mechanism of the blockchain layer, which is based on the RAFT [21] algorithm and ensures strong consistency of data. The RAFT algorithm has distinctive properties, such as a strong leader, leader election, and membership changes. In the RAFT algorithm, orderer nodes are directly connected to each other without relying on other clusters, and each organization’s orderer nodes have the opportunity to participate in the consensus ordering service, which makes it close to a decentralized algorithm. Compared with other consensus algorithms (e.g., Proof of Work (PoW), Proof of Stake (PoS), Proof of Solution (PoSo) [22], and Proof of Search (PoSe) [23]), Etcdraft has advantages. With PoW, the blockchain produces blocks inefficiently, and the mining process consumes a large amount of computing resources. The advantage of PoS is that it addresses the wasted resources and inefficiency of PoW. However, the biggest problem of PoS is that whoever has more tokens tends to obtain more tokens, and, theoretically, whoever holds 51% of the tokens can control the whole network, so it is weaker in decentralization. PoSo and PoSe mimic PoW by replacing the meaningless mathematical puzzle in PoW with a meaningful optimization problem; however, PoSo and PoSe still suffer from low block generation speed.

The proposed method uses Hyperledger Fabric as the blockchain layer, which is a federated chain that sacrifices its decentralized features to obtain higher response speed and lower latency; moreover, we use Etcdraft as the consensus algorithm to further reduce the latency of the distributed storage system.

4. File Allocation Functions and Policies

In this paper, files are divided into multiple slices of the same size before being uploaded to the IPFS network, and the slices of the files are kept scattered on various nodes in the IPFS network. However, too many copies of slices cause a large waste of storage resources, and this affects the efficiency of file upload and download. Therefore, it is worth considering how to develop a file allocation policy so that the scheme can meet the requirements of security and availability while improving the efficiency of file upload and download as much as possible. Before formulating the file allocation policy, a functional model on file allocation is defined in this paper in order to facilitate a quantitative study on security and availability.

4.1. File Allocation Functions

Suppose the file F is sliced into a collection of slices S = {s₁, s₂, …, s_s}, and let N = {n₁, n₂, …, n_n} be a group of nodes. For every slice s_{i} \in S, we define the allocation function φ, shown as Formula (1),

φ (s_{i}) = {n_{i}}, {n_{i}} \subseteq N, {n_{i}} \neq \emptyset    (1)

The allocation function specifies how slices are assigned to nodes in the network, and by default each slice is assigned to at least one node. Figure 5 shows an example of an allocation function that divides a file into ten slices S = {s₁, s₂, …, s₁₀} with five nodes N = {n₁, n₂, …, n₅}. There is one row for each node and one column for each slice, with a gray rectangle to indicate that the slice is assigned to that node and a dashed box to indicate that it is not. For example, φ (s_{1}) = {n_{1}, n_{2}} indicates that copies of the slice s₁ are saved at the nodes n₁ and n₂.

There are two main attributes in the file allocation process: the first is the number of copies of each file slice, which provides file availability, and the other is the diversity of the allocated nodes, which provides security against sets of malicious nodes. The number of copies may differ from slice to slice, but for the purposes of this study, it is assumed that all slices have the same number of copies. Therefore, the number of copies of a slice can be captured by the r-replication function, shown as Formula (2).

\forall s_{i} \in S, | φ (s_{i}) | \geq r    (2)

In Figure 5, each slice is saved on two nodes, which is an example of a 2-replication function.

Similarly, a k-protection function is defined based on the number of nodes that can reorganize the file, shown as Formula (3).

\forall {n_{i}} \subset N with | {n_{i}} | \leq k, \exists s_{j} \in S s.t. φ (s_{j}) \cap {n_{i}} = \emptyset    (3)

The k-protection function guards against a set of k malicious nodes, since it takes at least k + 1 nodes to collect all the slices that reorganize a file. In other words, the process of file allocation that follows the principle of the k-protection function implies that for a set of nodes N, for any set of nodes whose number does not exceed k, there exists at least one file slice that is not stored on the nodes of the set, so that such a set of nodes can neither obtain the original file nor prevent the deletion of the file.

The allocation in Figure 5 is a 2-protection function because the combination of any 2 nodes misses at least one slice. For example, the combination of nodes n₁ and n₂ misses slices s₆, s₇, s₈, s₉, and s₁₀; the combination of nodes n₁ and n₄ misses slices s₅ and s₇. However, it is not a 3-protection function, because there exists a combination of 3 nodes (n₁, n₃, and n₅) that holds all the slices needed to reorganize the file.

In this paper, we propose a k-r allocation function, which satisfies both the k-protection function and the r-replication function. In addition, the k-r allocation function must meet two conditions: (1) it is not a (k + 1)-protection function; (2) \forall s_{i} \in S, | φ (s_{i}) | = r.

Therefore, based on this definition, the allocation in Figure 5 is a 2-2 allocation function because it satisfies both the 2-protection function and the 2-replication function.
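The three definitions can be checked mechanically for small allocations. The sketch below is an illustrative brute-force checker, assuming an allocation is represented as a Python dict from slice name to the set of nodes holding it; the function names are hypothetical. The example allocation is not the one in Figure 5; it is a made-up allocation with four nodes in which each of six slices is stored on a distinct pair of nodes.

```python
from itertools import combinations


def is_r_replication(phi, r):
    """Formula (2): every slice is stored on at least r nodes."""
    return all(len(holders) >= r for holders in phi.values())


def is_k_protection(phi, nodes, k):
    """Formula (3): every set of at most k nodes misses at least one slice.

    Checking sets of size min(k, len(nodes)) is enough, because any smaller
    set is contained in one of them and therefore holds no more slices.
    """
    size = min(k, len(nodes))
    for group in combinations(nodes, size):
        group = set(group)
        if all(holders & group for holders in phi.values()):
            return False  # this group holds a copy of every slice
    return True


def is_k_r_allocation(phi, nodes, k, r):
    """k-protection holds, (k + 1)-protection fails, each slice has r copies."""
    exactly_r = all(len(holders) == r for holders in phi.values())
    return (
        exactly_r
        and is_k_protection(phi, nodes, k)
        and not is_k_protection(phi, nodes, k + 1)
    )


# Hypothetical example: 4 nodes, one slice per pair of nodes (6 slices).
nodes = ["n1", "n2", "n3", "n4"]
phi = {
    f"s{i}": set(pair)
    for i, pair in enumerate(combinations(nodes, 2), start=1)
}
print(is_k_r_allocation(phi, nodes, k=2, r=2))
```

In this example any two nodes miss the slice stored on the other two nodes, while any three nodes hold every slice, so the allocation satisfies the 2-2 definition. The check is exponential in the number of nodes and is meant only for small illustrative cases.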

4.2. File Allocation Policy

We study the file allocation policy based on the k-r allocation function. Let s = the number of file slices and n = the number of nodes, and the k-r allocation function can be discussed from two extremes: the Minimal Slice Number (MSN) policy and the Minimal Node Number (MNN) policy.

(1): MSN policy

MSN implies that the number of file slices is as small as possible while the k-r allocation function is still satisfied, so the number of slices s must satisfy Formula (4).

s \geq k + 1    (4)

Since at least k + 1 slices are required to ensure that a set of k nodes cannot reorganize the file, the minimum number of slices is s = k + 1, and using this allocation implies the following two inferences:

(1) One node stores one file slice at most.

Because there are only k + 1 file slices, if some node kept more than one file slice, there would be a set of k nodes that could reorganize the file, which violates the k-protection function;

(2) The number of nodes n is exactly r times the number of file slices; that is, n = r (k + 1).

The second inference follows from the first: one node stores at most one file slice, and because the r-replication function must be satisfied, each file slice has r copies, so the total number of nodes is n = r (k + 1).

For example, we apply MSN policy in terms of the 3-2 allocation function, which indicates that one file is divided into 4 slices and each slice has 2 copies, and all these copies are allocated in 8 nodes, shown as Figure 6.

Using the MSN policy with the k-r allocation function, we can resist a set of k malicious nodes and reorganize the file with k + 1 nodes, but the number of nodes needed for this policy, n = r (k + 1), increases rapidly as k and r increase.
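As an illustration of the MSN layout, the following Python sketch builds an allocation with k + 1 slices and r (k + 1) nodes, where each node holds exactly one slice. The function name `msn_allocation` and the choice to give each slice a block of r consecutive nodes are assumptions made for the example; the exact placement in Figure 6 may differ.

```python
def msn_allocation(k, r):
    """Build an MSN-style k-r allocation.

    Returns (phi, nodes), where phi maps each of the k + 1 slices to the
    set of r nodes that store it, and nodes lists all r * (k + 1) nodes.
    """
    slices = [f"s{i}" for i in range(1, k + 2)]
    nodes = [f"n{i}" for i in range(1, r * (k + 1) + 1)]
    # Assumption: slice number i gets the i-th block of r consecutive nodes.
    phi = {
        s: set(nodes[i * r:(i + 1) * r])
        for i, s in enumerate(slices)
    }
    return phi, nodes


# The 3-2 example from the text: 4 slices, 2 copies each, 8 nodes.
phi, nodes = msn_allocation(k=3, r=2)
print(len(phi), len(nodes))
```

Since every node holds a single slice, at least k + 1 nodes (one per slice) are needed to gather all slices, which is why a set of k nodes can never reorganize the file under this layout.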

(2): MNN policy

To guarantee the k-protection function, for any set of k nodes there must exist at least one slice s_extra that is not stored on any node in the set. Moreover, to guarantee the r-replication function, such a slice s_extra must be stored on r nodes that do not belong to the set. Therefore, we can conclude that the number of nodes n must meet Formula (5).

n \geq k + r    (5)

When the number of nodes takes the minimum value, n = k + r, no set of k nodes can get enough slices to reorganize the file: each such set misses one file slice, which is stored on the other r nodes. Moreover, the missing slices of different sets are different, so each set of k nodes corresponds to one missing slice. The number of sets with k nodes is C_{n}^{k}. Therefore, the MNN policy in terms of the k-r allocation function must meet Formulas (6) and (7).

n = k + r    (6)

s = C_{n}^{k} = C_{k + r}^{k}    (7)

where n represents the number of nodes and s represents the number of file slices.

The MNN policy indicates that the k-r allocation function requires dividing the file into finer-grained slices and assigning the slices to nodes in a diverse way, thus ensuring that no set of k nodes can reorganize the file. The MNN policy has two characteristics: any set of k nodes misses exactly one slice, and the missing slice is unique to each set. This is because any k + 1 nodes can reconstruct the file. For example, using a 3-2 allocation with the MNN policy means that there are 5 nodes, each file is divided into C_{5}^{3} = 10 slices, and each slice has 2 copies, as shown in Figure 7.
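One way to construct such an allocation is to create one slice per k-node subset and store that slice on the r nodes outside the subset. The Python sketch below follows that construction; the function name `mnn_allocation` and the slice numbering are assumptions for illustration, and the placement in Figure 7 may order the slices differently.

```python
from itertools import combinations


def mnn_allocation(k, r):
    """Build an MNN-style k-r allocation with n = k + r nodes.

    Each k-node subset gets its own slice, stored on the r nodes that are
    not in the subset, so that subset misses exactly this slice.
    """
    nodes = [f"n{i}" for i in range(1, k + r + 1)]
    phi = {}
    for idx, missing_set in enumerate(combinations(nodes, k), start=1):
        phi[f"s{idx}"] = set(nodes) - set(missing_set)
    return phi, nodes


# The 3-2 example from the text: 5 nodes, C(5, 3) = 10 slices, 2 copies each.
phi, nodes = mnn_allocation(k=3, r=2)
print(len(nodes), len(phi))
```

A group of k nodes fails to hold a slice only when all r holders lie outside the group, which for n = k + r happens for exactly one slice; any group of k + 1 nodes leaves only r - 1 nodes outside and therefore holds a copy of every slice.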

4.3. Parameter Setting of k-r Allocation Function

In the k-r allocation function, the values of the k and r parameters depend on the characteristics of the decentralized storage network. For example, in a stable network, a small number of copies is sufficient to provide high availability (r can be a small value), but in a highly variable network, more copies are needed to provide high availability (r has to be a relatively large value). Similarly, the k value also depends on the security setting of the decentralized storage network.

Suppose that the probability of a single node failing or not being online is p_{u}, and the probability of a single node producing malicious behavior is p_{c}. We assume that all nodes have the same probability of failing and of producing malicious behavior. When using k-r allocation, the probability of a file in the storage network becoming unavailable is denoted P_{u}, and the probability of leakage to a malicious node set is denoted P_{c}, so the availability of the scheme can be represented by 1 - P_{u}, and the security can be represented by 1 - P_{c}.

Next, we discuss the influence of the parameters k and r on P_{u} and P_{c} in terms of the MSN policy and the MNN policy, respectively.

(1): Setting of k, r parameters based on the MSN policy

Using the k-r allocation function based on the MSN policy, any of the k + 1 slices that make up the file becomes unavailable when all r nodes that store that slice fail, so P_{u} can be derived by Formula (8).

P_{u} = 1 - {(1 - {(p_{u})}^{r})}^{k + 1}    (8)

where 1 - {(p_{u})}^{r} represents the probability that at least one of the r nodes storing a slice is available. We assume that node failures are independent, so {(1 - {(p_{u})}^{r})}^{k + 1} is the probability that every one of the k + 1 slices has at least one copy available.

Similarly, using the k-r allocation function based on the MSN policy, the file is divided into k + 1 slices, so when each node in a set of k + 1 malicious nodes stores a different slice, the file is exposed, and P_{c} can be derived by Formula (9).

P_{c} = {(1 - {(1 - p_{c})}^{r})}^{k + 1}    (9)

where {(1 - p_{c})}^{r} is the probability that all r copies of a slice are stored on nodes that do not belong to the malicious set, and 1 - {(1 - p_{c})}^{r} is the probability of a slice being exposed; therefore, the probability of a malicious node set having all k + 1 different slices is {(1 - {(1 - p_{c})}^{r})}^{k + 1}.
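Formulas (8) and (9) translate directly into code. The following Python sketch is an illustrative evaluation of the two expressions; the function names are hypothetical. For P_{u}, the sketch uses `math.log1p` and `math.expm1` instead of the literal expression, which is a numerical choice made here (not something the paper specifies) to keep precision when the result is very small.

```python
import math


def msn_unavailability(p_u, k, r):
    """Formula (8): P_u = 1 - (1 - p_u**r)**(k + 1)."""
    q = p_u ** r  # probability that all r holders of one slice fail
    if q >= 1.0:
        return 1.0
    # 1 - (1 - q)**(k + 1), computed stably for very small q.
    return -math.expm1((k + 1) * math.log1p(-q))


def msn_exposure(p_c, k, r):
    """Formula (9): P_c = (1 - (1 - p_c)**r)**(k + 1)."""
    slice_exposed = 1.0 - (1.0 - p_c) ** r
    return slice_exposed ** (k + 1)


# Example with arbitrary illustrative parameters.
print(msn_unavailability(0.2, k=5, r=5))
print(msn_exposure(0.2, k=5, r=5))
```

The opposite pull of the two parameters is visible in the expressions themselves: raising r shrinks p_u**r and so lowers P_{u}, but it also raises the per-slice exposure term, while raising k adds factors below one to P_{c} and adds more slices that can each become unavailable.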

According to Formula (8), when p_{u} takes 0.2, 0.4, 0.6, and 0.8, respectively, fixing r = 5 and varying k from 1 to 25 gives Figure 8a, and fixing k = 5 and varying r from 1 to 25 gives Figure 8b. According to Formula (9), when p_{c} takes 0.2, 0.4, 0.6, and 0.8, respectively, fixing r = 5 and varying k from 1 to 25 gives Figure 8c, and fixing k = 5 and varying r from 1 to 25 gives Figure 8d.

From Figure 8a, it can be seen that as the value of k increases, P_{u} increases, which means that the availability of the file decreases, because the increase in the number of nodes disperses the resources and lowers the possibility of rebuilding the file. From Figure 8b, it can be seen that if the failure probability p_{u} of a single node is low, P_{u} stays low regardless of the value of r, which means that the availability of the file is always high, and P_{u} decreases as r increases because each slice is stored on more nodes, reducing the risk of unavailability.

Figure 8c shows that P_{c} decreases as k increases: the number of slices increases, so the number of nodes needed to form a valid malicious set also increases, and the probability of forming such a set decreases.

Figure 8d shows that P_{c} increases with r: the number of copies per slice increases, so the probability of a copy being stored on a malicious node also increases.

(2): Setting of k, r parameters based on the MNN policy

Using the k-r allocation function based on the MNN policy, the file becomes unavailable when any set of r or more nodes fails. The formula for P_{u} can be derived as Formula (10).

P_{u} = \sum_{i = r}^{k + r} C_{k + r}^{i} {(p_{u})}^{i} {(1 - p_{u})}^{k + r - i}    (10)

where C_{k + r}^{i} denotes the number of possible combinations of i nodes, i varies in the range [r, k + r], {(p_{u})}^{i} is the probability that i nodes fail, and {(1 - p_{u})}^{k + r - i} is the probability that the remaining nodes do not fail.

According to the principle of the MNN policy, any k + 1 nodes taken from all k + r nodes have all the slices necessary to reorganize the file; therefore, any set of k + 1 malicious nodes exposes the file. The formula for P_{c} can be derived as Formula (11).

P_{c} = \sum_{i = k + 1}^{k + r} C_{k + r}^{i} {(p_{c})}^{i} {(1 - p_{c})}^{k + r - i}    (11)

where C_{k + r}^{i} denotes the number of combinations of i nodes, i varies in the range [k + 1, k + r], {(p_{c})}^{i} is the probability that i nodes form a malicious node set, and {(1 - p_{c})}^{k + r - i} is the probability that the remaining nodes are not malicious nodes.
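Formulas (10) and (11) are both upper tails of a binomial distribution over the n = k + r nodes, differing only in the per-node probability and the lower limit of the sum. A minimal Python sketch, with hypothetical function names, makes this shared structure explicit.

```python
from math import comb


def binomial_tail(n, p, lower):
    """Probability that at least `lower` of n independent nodes are affected."""
    return sum(
        comb(n, i) * p ** i * (1.0 - p) ** (n - i)
        for i in range(lower, n + 1)
    )


def mnn_unavailability(p_u, k, r):
    """Formula (10): at least r of the k + r nodes fail."""
    return binomial_tail(k + r, p_u, r)


def mnn_exposure(p_c, k, r):
    """Formula (11): at least k + 1 of the k + r nodes are malicious."""
    return binomial_tail(k + r, p_c, k + 1)


# Example with arbitrary illustrative parameters.
print(mnn_unavailability(0.2, k=5, r=5))
print(mnn_exposure(0.2, k=5, r=5))
```

Writing both quantities through one helper shows why k and r trade off against each other: increasing r raises the failure threshold for P_{u} but also adds nodes that may be malicious, and increasing k does the reverse.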

According to Formula (10), when p_{u} is taken as 0.2, 0.4, 0.6, and 0.8, respectively, fixing r = 5 and varying k from 1 to 25 gives Figure 9a, and fixing k = 5 and varying r from 1 to 25 gives Figure 9b. According to Formula (11), when p_{c} is taken as 0.2, 0.4, 0.6, and 0.8, respectively, fixing r = 5 and varying k from 1 to 25 gives Figure 9c, and fixing k = 5 and varying r from 1 to 25 gives Figure 9d.

As shown in Figure 9, P_{u} increases with k and decreases with r, whereas P_{c} decreases with k and increases with r. Therefore, it can be concluded that the file availability (1 - P_{u}) decreases with increasing k and increases with increasing r, whereas the file security (1 - P_{c}) increases with increasing k and decreases with increasing r.

5. Experimental Results

5.1. Experimental Hardware and Software

In the experiments, the blockchain-based decentralized storage system was built as shown in Figure 1, using Hyperledger Fabric 1.4.0 deployed on 25 virtual machine (VM) nodes, including 4 peer nodes and 5 orderer nodes, running in Docker 19.03; all operating systems were Ubuntu 18.04. The storage system was built using IPFS 0.7.0, and the associated front-end interface was written in Node.js. The hardware configuration of each VM is CPU 3.0 GHz × 8, 8 GB RAM, and a 50 GB disk. All files are stored in IPFS, the corresponding file hashes are stored in Hyperledger Fabric, and users access the files through Fabric’s smart contracts, which are written in Golang 1.15.

5.2. Optimal Parameters for the Decentralized Storage System

In the experiments, the probability that a single file is unavailable is denoted P_{u}, and the probability that a single file is leaked to a malicious node set is denoted P_{c}. To ensure the security and availability of the decentralized storage system, it is required that P_{u} and P_{c} meet Equation (12).

P_{u} \leq 10^{- 7}, P_{c} \leq 10^{- 6}    (12)

Based on such security and availability settings, we can find the optimal parameters of k and r in the k-r allocation function based on MSN policy and MNN policy, respectively.

Considering the practical application of decentralized storage systems, both P_{u} and P_{c} should be very small values, so we require them to be no greater than 10⁻⁷ and 10⁻⁶, respectively, in our experiments. However, it should be noted that these two values are not fixed, and they can be adjusted as the storage system design requirements change.

5.2.1. Optimal Parameters of k-r Allocation Function for the MSN Policy

Based on the different values of p_{u} and p_{c}, we classify decentralized storage systems using the MSN policy into three types: (1) p_{u} = 0.001 and p_{c} = 0.5, a storage system with high reliability (low probability of node failure) and low trust (high probability of malicious nodes); (2) p_{u} = 0.005 and p_{c} = 0.2, a storage system with medium reliability (medium probability of node failure) and medium trust (medium probability of malicious nodes); (3) p_{u} = 0.05 and p_{c} = 0.1, a storage system with low reliability (high probability of node failure) and high trust (low probability of malicious nodes). According to Formulas (10) and (11), by changing the values of k and r, we can obtain the performance of the different types of decentralized storage system using the MSN policy, shown in Figure 10; the intersection of the red area and the blue area is the best setting for the k and r parameters. In Figure 10a, the probability of failure of a single node, p_{u}, is 0.001 and the probability of a malicious node, p_{c}, is 0.5, which indicates a decentralized storage system with high reliability and low trust. In this case, when the MSN policy is used, the optimal configuration is k = 27, r = 4, and the number of nodes n = 31. Figure 10b shows a decentralized storage system with medium reliability and medium trust with the optimal parameters k = 12, r = 5, and the number of nodes n = 17. Figure 10c shows a decentralized storage system with low reliability and high trust with the optimal parameters k = 10, r = 9, and the number of nodes n = 19.

5.2.2. Optimal Parameters of k-r Allocation Function for the MNN Policy

Similarly, we classify decentralized storage systems using the MNN policy into three types: (1) p_{u} = 0.001 and p_{c} = 0.5, a storage system with high reliability and low trust; (2) p_{u} = 0.005 and p_{c} = 0.2, a storage system with medium reliability and medium trust; (3) p_{u} = 0.05 and p_{c} = 0.1, a storage system with low reliability and high trust. Using Formulas (8) and (9) and varying k and r, we obtain the performance of each type of decentralized storage system under the MNN policy, as shown in Figure 11. The intersection of the red area and the blue area gives the best setting for the k and r parameters. In Figure 11a, the probability of failure of a single node, p_{u}, is 0.001 and the probability that a node is malicious, p_{c}, is 0.5, which indicates a decentralized storage system with high reliability and low trust. The optimal parameters under the MNN policy are k = 100, r = 3, and number of slices s = 101. Figure 11b shows the system with medium reliability and medium trust, with optimal parameters k = 26, r = 4, and s = 27. Figure 11c shows the system with low reliability and high trust, with optimal parameters k = 18, r = 7, and s = 19.
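For side-by-side reading, the optimal parameters reported in Sections 5.2.1 and 5.2.2 can be collected in one small structure. The values are copied from the text; the class and field names are hypothetical. The two checks at the end only record a pattern visible in the reported numbers (n = k + r for MSN, s = k + 1 for MNN); the general definitions of n and s belong to the allocation functions defined earlier in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reported:
    system: str  # reliability / trust class
    p_u: float   # single-node failure probability
    p_c: float   # probability that a node is malicious
    k: int
    r: int
    size: int    # n (number of nodes) for MSN, s (number of slices) for MNN


MSN = [
    Reported("high reliability, low trust", 0.001, 0.5, 27, 4, 31),
    Reported("medium reliability, medium trust", 0.005, 0.2, 12, 5, 17),
    Reported("low reliability, high trust", 0.05, 0.1, 10, 9, 19),
]
MNN = [
    Reported("high reliability, low trust", 0.001, 0.5, 100, 3, 101),
    Reported("medium reliability, medium trust", 0.005, 0.2, 26, 4, 27),
    Reported("low reliability, high trust", 0.05, 0.1, 18, 7, 19),
]

# Pattern observed in the reported values only.
msn_pattern = all(row.size == row.k + row.r for row in MSN)
mnn_pattern = all(row.size == row.k + 1 for row in MNN)

for policy, rows in (("MSN", MSN), ("MNN", MNN)):
    for row in rows:
        print(f"{policy}  {row.system:34s} k={row.k:<4d} r={row.r:<2d} size={row.size}")
```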

5.3. Comparison of Decentralized Storage System Performance using MSN Policy and MNN Policy

We compared the performance of the MSN and MNN policies in terms of file upload time, file download time, and average network transfer rate (Mbps). We used files ranging from 1 MB to 256 MB for the upload and download tests and ran each test ten times, taking the average value. The experimental results are shown in Figure 12.
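The measurement procedure (ten runs per file size, averaged) can be sketched as follows. This is an illustrative harness, not the authors' test code: `transfer` is a hypothetical callable standing in for an upload or a download, the power-of-two size steps are an assumption (the text gives only the 1 MB to 256 MB range), 1 MB is taken as 10^6 bytes, and the average rate is computed as bits transferred divided by the mean run time.

```python
import time
from statistics import mean
from typing import Callable, Dict, List

MB = 1_000_000  # assumption: decimal megabytes
FILE_SIZES_MB = [1, 2, 4, 8, 16, 32, 64, 128, 256]  # assumed steps in the stated range
RUNS = 10  # each test is executed ten times and averaged


def time_runs(transfer: Callable[[int], None], size_bytes: int, runs: int = RUNS) -> List[float]:
    """Call `transfer` repeatedly and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        transfer(size_bytes)
        durations.append(time.perf_counter() - start)
    return durations


def average_rate_mbps(size_bytes: int, durations: List[float]) -> float:
    """Megabits per second, based on the mean duration of the runs."""
    return size_bytes * 8 / 1_000_000 / mean(durations)


def benchmark(transfer: Callable[[int], None]) -> Dict[int, Dict[str, float]]:
    """Collect mean time and average rate for every file size."""
    results = {}
    for size_mb in FILE_SIZES_MB:
        durations = time_runs(transfer, size_mb * MB)
        results[size_mb] = {
            "mean_seconds": mean(durations),
            "mbps": average_rate_mbps(size_mb * MB, durations),
        }
    return results
```

Running `benchmark` once with an upload callable and once with a download callable, for each policy, would yield the three quantities compared in Figure 12.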

The experimental results show that the MNN policy performs better than the MSN policy, and the time saved on uploads and downloads grows as the file size increases. There are two main reasons. First, in a P2P network, uploads and downloads involve many nodes, and there is always a gap between nodes with good performance (fast network speed) and nodes with poor performance (slow network speed). The MNN policy requires only at least k + 1 well-performing nodes to provide slices in order to obtain the file quickly, whereas the MSN policy assigns each slice to a group of nodes, so a single slow group delays the reassembly of the file. Second, information exchange between nodes is time-consuming: the more nodes the system involves, the more the upload and download speeds suffer, and the MNN policy involves fewer nodes.
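The first reason can be illustrated with a toy latency model. Everything here is an assumption made for illustration: each node is reduced to a single response time, all requests run in parallel, the MNN file is treated as complete once k + 1 nodes have responded, and under MSN a slice arrives when the fastest node in its group responds while the file waits for the slowest slice. The function names and sample timings are hypothetical and are not measurements from the experiments.

```python
from typing import Sequence


def mnn_completion_time(node_times: Sequence[float], k: int) -> float:
    """Time at which the (k + 1)-th fastest node has responded."""
    if len(node_times) < k + 1:
        raise ValueError("need at least k + 1 nodes")
    return sorted(node_times)[k]


def msn_completion_time(group_times: Sequence[Sequence[float]]) -> float:
    """Each inner sequence holds the response times of one slice's node group.

    A slice is available once the fastest node of its group responds;
    the file is complete only when the slowest slice has arrived.
    """
    return max(min(group) for group in group_times)


# Hypothetical response times in seconds: most nodes are fast, a few are slow.
mnn_nodes = [1.0, 1.1, 1.2, 8.0, 9.0]
msn_groups = [[1.0, 1.1], [1.2, 9.0], [8.0, 9.5]]

mnn_time = mnn_completion_time(mnn_nodes, k=2)  # third-fastest node
msn_time = msn_completion_time(msn_groups)      # held back by the slow group
```

With these sample numbers, the MNN model finishes as soon as three fast nodes answer, while the MSN model is held back by the one group that contains only slow nodes.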

However, under the same security and availability requirements, the parameter r in the MNN policy is significantly larger than that in the MSN policy, so the MNN policy stores more copies, which is a relative waste of space. Therefore, based on these experimental results, MNN is the better choice for a distributed storage system that is trusted and requires high performance, whereas MSN is the better choice for a distributed storage system that emphasizes security or needs to save storage space.

6. Conclusion

In this paper, we present three contributions: (1) a decentralized storage system that combines Hyperledger Fabric and IPFS; (2) two storage schemes, the MSN policy and the MNN policy, based on the k-r allocation function; and (3) experimental evidence that the MNN policy performs better than the MSN policy, although, with the same security and availability guarantees, the MNN policy stores more copies than the MSN policy and is therefore relatively wasteful of space.

Therefore, it is necessary to build blockchain-based decentralized storage systems with appropriate policies according to different situations and requirements in practical engineering applications.

It should be noted that the proposed method uses Hyperledger Fabric as the blockchain layer, which improves network performance at the cost of reduced decentralization; this trade-off requires further improvement in future work.

Author Contributions

Project administration, L.M.; funding acquisition, L.M.; writing-review and editing, L.M.; software, B.S.; writing-original draft preparation. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (62073061), the Fundamental Research Funds for the Central Universities (N2204009), and the Non-profit Central Research Institute Fund of Chinese Academy of Medical Sciences (2022-JKCS-21).

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Running process of decentralized storage solution.

Figure 2. Encryption process.

Figure 3. Data fragmentation, taking the example of dividing the encrypted file into four pieces.

Figure 4. Fabric network design.

Figure 5. An example of minimum 2-2 distribution function.

Figure 6. An example of 3-2 allocation of MSN.

Figure 7. An example of 3-2 allocation of MNN (minimum number of nodes). Any set of 3 nodes is missing one file slice, and the missing slice is different for each such set.

Figure 8. File exposure and unavailability probability based on the minimum number of slices. (a) Fixed r = 5, varying k: file unavailability probability; (b) fixed k = 5, varying r: file unavailability probability; (c) fixed r = 5, varying k: file exposure probability; (d) fixed k = 5, varying r: file exposure probability.

Figure 9. File exposure and unavailability probability based on the minimum number of nodes. (a) Fixed r = 5, varying k: file unavailability probability; (b) fixed k = 5, varying r: file unavailability probability; (c) fixed r = 5, varying k: file exposure probability; (d) fixed k = 5, varying r: file exposure probability.

Figure 10. The performance of different types of decentralized storage system using MSN policy by changing the values of k and r. (a) p_{u} = 0.001, p_{c} = 0.5; (b) p_{u} = 0.005, p_{c} = 0.2; (c) p_{u} = 0.05, p_{c} = 0.1. The blue area represents k, r parameters that satisfy availability (P_{u} ≤ 10⁻⁷), and the red area represents k, r parameters that satisfy security (P_{c} ≤ 10⁻⁶). The crossed area represents k, r parameters that satisfy both availability and security. Therefore, the intersection of the red area and the blue area is the best setting for the k and r parameters.

Figure 11. The performance of different types of decentralized storage system using MNN policy by changing the values of k and r. (a) p_{u} = 0.001, p_{c} = 0.5; (b) p_{u} = 0.005, p_{c} = 0.2; (c) p_{u} = 0.05, p_{c} = 0.1. The blue area represents k, r parameters that satisfy availability (P_{u} ≤ 10⁻⁷), and the red area represents k, r parameters that satisfy security (P_{c} ≤ 10⁻⁶). The crossed area represents k, r parameters that satisfy both availability and security. Therefore, the intersection of the red area and the blue area is the best setting for the k and r parameters.

Figure 12. Performance comparison of MSN and MNN policies. (a) File upload time; (b) File download time; (c) Average network throughput.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

