1. Introduction
According to Article 1, Number 7 [1] of the Indonesian Constitution concerning Basic Archival Requirements, archives are texts that are made and accepted by state institutions and government agencies in any form, whether in a single or group condition, in the context of carrying out government activities. Archives are essential because of the substantial information they offer in the administrative process. Therefore, they should be managed properly; for instance, there should be an orderly, neat storage system and a way to retrieve, recover, and secure the archives. An excellent archive management system enables the organization in question, such as the government, to work efficiently.
Nowadays, many people use information technology to help manage archives. Cilegon E-Archive (CEA) is a local government archive management system in Cilegon City, West Java, Indonesia. It has features for managing the archives, including their collection, retention, and retrieval.
Figure 1 illustrates the mapping of the CEA system. The system server is located in Serpong, in the south of Tangerang City. However, the archive users are in Cilegon City, which is 99.4 km from Serpong. In Indonesia, archives are important instruments in the auditing process. The Badan Pemeriksa Keuangan (BPK), as the Indonesian government auditor, audits the archives at least once a year. Even though the system server is located near the BPK, the BPK must go to Cilegon because the BPK does not have access to the system. Moreover, the BPK prefers to inspect printed archives that are stored in a physical storage unit in Cilegon City, because there are many archival forgeries in Indonesia.
The CEA system is a centralized system that is located on a single cloud server belonging to Badan Pengkajian dan Penerapan Teknologi (BPPT), which is the Agency for the Assessment and Application of Technology in Indonesia. A centralized system uses a client–server architecture in which nodes connect to one server and are controlled by the network administrator. It makes it easy to track and manage the data. However, a centralized system has several vulnerabilities, such as a single point of failure, low data availability, and lack of data confidentiality.
A single point of failure is a condition in which the entire system stops working when the server goes down. The clients cannot connect to the server, so the system cannot process requests to provide data. Moreover, if there is no backup, users can lose data immediately. Therefore, it can cause low data availability.
A good archive management system should consider data confidentiality in order to ensure data can be accessed only by registered users. The existing CEA system has a feature to manage user control. The archivists are allowed to alter the archives in their respective units. Even so, this system cannot guarantee whether the modifications are valid. This condition triggers archival forgeries; because the data and files are only stored in the server without any encryption, they are not secure and unauthorized users can replace legitimate files with fake files or modify the data.
Considering these issues, this paper integrates the Interplanetary File System (IPFS) and blockchain. IPFS is a distributed file storage system that uses content-based addressing and guarantees high throughput by combining a distributed hash table, the BitSwap exchange protocol, and a naming system [2]. Blockchain has a consensus mechanism to validate the transactions and guarantee the integrity of each node [3]. In turn, the transactions are unchangeable. Nevertheless, blockchain has limitations in data storage, and there are costs of executing transactions. The cost depends on how big the stored data are.
This paper proposes a new CEA system, using a decentralized architecture. It integrates IPFS and blockchain technology. IPFS shares the files with other nodes in the network instead of only storing the files in one server. The pinning service is a feature of IPFS that ensures high availability and long-term retention. IPFS generates a hash to identify the content of the file. Storing massive data on a blockchain is expensive. Therefore, only the hash of a file is stored in the blockchain, using a smart contract.
It is difficult to modify transactions in a blockchain because blockchain has a consensus mechanism to validate them. The transactions are put in blocks, and the blocks are linked to one another, secured with a digital signature, and proved with a timestamp. Both IPFS and blockchain will be configured as private networks, to restrict file access from the public IPFS and Ethereum networks. This configuration can improve data confidentiality and integrity, to prevent archival forgeries.
2. Background
CEA is a local government archive management system in Cilegon City, West Java, Indonesia. It provides the lifecycle of government archives, including their collection, retention, and retrieval. The users of this system consist of 130 archivists from 32 units in Cilegon City. All the users access this system via an internet connection. The CEA system uses centralized architecture, which means all users access one server in order to store applications, data, and uploaded files of digital archives.
Figure 2 illustrates the existing concept of the CEA system. The archives come from external systems. In every unit, there are several external systems, such as finance, human resources, and procurement. These systems produce printed documents as output. Based on the Indonesian government’s rules, the archivists in all units have to input the data of the printed-out documents. Sometimes the archivists scan and upload the papers into the CEA system. However, because scanners are not available in all units, sometimes archivists only input the data. When they do, they only re-input the data, which are already stored in the external systems. This causes data duplication and human error. For instance, typing mistakes result in mismatches between the data in the external systems and the data in the CEA system. This is a crucial error, because the BPK (Indonesian Government Auditor) must audit the archives periodically. The auditor will inspect the archives in the external systems and the CEA system.
The centralized web system uses location-based addressing. It uses the internet protocol (IP) addresses in the uniform resource locators (URLs) to show the location where the data are stored. It does not have a direct relationship with the content of the data. The existing CEA system works on hypertext transfer protocol (HTTP), which downloads files from a single host server in response to requests. Centralized data are easier to deliver and manage. However, it allows any party, such as server administrators and archivists, to access, modify, and even remove data. The problem with location-based addressing occurs when the location is not accessible. If the server is offline, users cannot retrieve the data.
Furthermore, the existing CEA system has challenges in three aspects of information security meant to ensure the protection of data [4], as mentioned below:
Confidentiality
The data and files of archives are stored in a single server, which does not have secured protection, besides requiring login usernames and passwords. The server administrator can access the data and files without permission. There is no monitoring or logging system to control who accesses what.
Integrity
The CEA system allows users to modify the data of the archives. There is no mechanism to validate the data that are modified. Each archivist is responsible for all archives in his or her unit. However, no one can guarantee whether the modifications tamper with archival integrity. This condition creates vulnerability and can lead to archival forgeries.
Availability
Data availability is low in a centralized system. When the server is off, all processes stop working. The users cannot send requests to the system, and the system cannot respond to the requests or provide users with the data and information they need.
To overcome these challenges, this paper develops a decentralized CEA system by implementing a distributed file system to manage the archives. The files are distributed in several nodes, to address the issue of having a single point of failure problem. This system uses a smart contract to manage the functionality of the application and deploy it on an Ethereum private network. It will keep the archives confidential and safe from unauthorized users. The consensus mechanism in the blockchain validates each transaction before it is committed in a block to maintain archival integrity.
The blockchain network can assure the integrity of any information system [5]. Running a peer-to-peer (P2P) protocol guarantees a solution to the problem of a single point of failure. Every transaction is stored in a block, which is linked and secured with cryptography. It also has a consensus mechanism to validate the transaction. These features make it almost impossible to modify a transaction. IPFS is a distributed file storage system that involves all nodes to store each file securely and provides high data availability. By combining blockchain and IPFS, the CEA system can improve system security and preserve the integrity of its archives.
The integrity of archives is essential to preventing archival forgeries. Blockchain is able to store data immutably, secure them with a digital signature, and timestamp them. Blockchain is best suited to storing small amounts of data, such as transaction metadata and hash values [6]. This system uses an Ethereum private blockchain to store the hash of the file. In addition, a consensus mechanism in the blockchain can validate the transaction securely. When there is any modification, the nodes in the blockchain will check whether it is valid or not.
3. System Design
This section describes the design of the proposed system, which consists of system architecture, scenarios, and file distribution.
3.1. System Architecture
Figure 3 illustrates the proposed CEA system architecture. To interact with this system, each user must install the decentralized application named CEA DApp, IPFS private network, and Ethereum private network named CEA Network. The users consist of archivists and BPK. One account is set as an administrator, and it is responsible for registering and monitoring other users. Each user has two identities. First, there is the peer ID for interacting with the IPFS private network. Second, there is the Ethereum address for communicating with CEA Network.
To maintain the confidentiality of archives, both the IPFS and CEA Network are set as private networks. The files will only be distributed between archivists and BPK. The hash of files will also be stored and retrieved only by them. The user must install Metamask [14] to run CEA DApp on traditional browsers, such as Google Chrome or Firefox. In addition, Metamask has to import the Ethereum addresses, which are generated on the CEA Network. The CEA DApp has a smart contract to store the hash in CEA Network and retrieve the hash of files from the CEA Network.
3.2. System Scenarios
3.2.1. Registration on IPFS Private Network
Figure 4 illustrates the scenario of registering on the IPFS private network. An archivist is assigned as an administrator, who has access as the boot node. All other users are granted access as client nodes. All nodes have to set a bootstrap node configuration, using the boot node's peer ID and IP address. The boot node then generates a single swarm key and copies it to all client nodes. A swarm key allows each IPFS node to connect with other nodes. All nodes in the private network must have the exact same swarm key. The IPFS private network is isolated from the public IPFS network.
3.2.2. Registration on CEA Network
The CEA Network is an Ethereum private network. As seen in Figure 5, an administrator should enter a passphrase to generate an account, which is an externally owned account (EOA). Users need this account to interact with the Ethereum blockchain via transactions [15]. Each EOA has a balance (Ether) to pay for each transaction it signs. The accounts are identified by a public and a private key, which are encoded in a key file in the KeyStore directory of each account. The account address is derived from the public key.
3.2.3. Uploading a File on CEA System
Figure 6 illustrates the scenario of uploading a file on the CEA system. Users employ the CEA DApp interface to upload a file. The file is stored as an object in the local storage of an IPFS client node. Then, the object is pinned recursively in each client’s local storage node to provide file availability. Recursive pinning also pins every object linked from the pinned object, so that all of them are stored locally. After the object is successfully pinned in local storage, IPFS uses the DHT to distribute the object to other nodes in the private network. In this distribution, the closest peer ID among the nodes has the highest priority. IPFS returns a hash to identify the file. The system calls Metamask to sign the transaction containing the hash of the file. However, the user must unlock the account to enable the signing of the transaction on CEA Network. When the transaction is committed, CEA Network returns a transaction ID as its identity.
3.2.4. Downloading a File on CEA System
Figure 7 illustrates how users can download a file on the CEA system. Users use a transaction ID to retrieve transaction data from CEA Network. The transaction data contain a hash of the file and other transaction information. The users use that hash to download a file from the IPFS private network.
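The upload and download scenarios above can be sketched as a small in-memory model. This is only an illustration under simplifying assumptions: ContentStore stands in for the IPFS private network, Ledger stands in for CEA Network, SHA-256 hex digests replace IPFS hashes, and the transaction ID scheme is hypothetical. None of these names come from the CEA DApp itself.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ContentStore:
    """Stand-in for the IPFS private network: files are keyed by a hash of their content."""
    blocks: dict = field(default_factory=dict)

    def add(self, data: bytes) -> str:
        file_hash = hashlib.sha256(data).hexdigest()
        self.blocks[file_hash] = data
        return file_hash

    def cat(self, file_hash: str) -> bytes:
        return self.blocks[file_hash]


@dataclass
class Ledger:
    """Stand-in for CEA Network: an append-only list of transactions."""
    transactions: list = field(default_factory=list)

    def send(self, sender: str, file_hash: str) -> str:
        # Hypothetical transaction ID: a hash over position, sender, and payload.
        seed = f"{len(self.transactions)}:{sender}:{file_hash}".encode()
        tx_id = hashlib.sha256(seed).hexdigest()
        self.transactions.append({"id": tx_id, "from": sender, "input": file_hash})
        return tx_id

    def get(self, tx_id: str) -> dict:
        return next(tx for tx in self.transactions if tx["id"] == tx_id)


def upload(store: ContentStore, ledger: Ledger, sender: str, data: bytes) -> str:
    """Store the file, then record its hash in a transaction; return the transaction ID."""
    file_hash = store.add(data)
    return ledger.send(sender, file_hash)


def download(store: ContentStore, ledger: Ledger, tx_id: str) -> bytes:
    """Look up the transaction, read the file hash from it, and fetch the file."""
    file_hash = ledger.get(tx_id)["input"]
    return store.cat(file_hash)
```

In this model, the transaction ID is the only value a user needs to keep: it leads to the hash, and the hash leads to the file.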
3.3. File Distribution on IPFS
IPFS stores each file as an object that contains data and links. The size of an object is limited to 256 kB. Objects that are larger than 256 kB are divided into smaller parts (chunks) of at most 256 kB each. Every chunk is identified by a hash and linked to other chunks, as illustrated in Figure 8.
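The chunking rule can be illustrated with a minimal sketch. It assumes a fixed chunk size of 256 × 1024 bytes and uses plain SHA-256 hex digests; real IPFS objects use multihash-based identifiers and a Merkle DAG layout, so the root computation here is a simplification and the function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # 256 kB limit per object, as stated in the text


def chunk_file(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into chunks no larger than chunk_size and hash each one."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    return [(hashlib.sha256(chunk).hexdigest(), chunk) for chunk in chunks]


def build_object(data: bytes):
    """Return (root_hash, links). A file that fits in one chunk has no links."""
    hashed = chunk_file(data)
    if len(hashed) == 1:
        return hashed[0][0], []
    links = [chunk_hash for chunk_hash, _ in hashed]
    # Simplified root: a hash over the ordered list of chunk hashes.
    root = hashlib.sha256("".join(links).encode()).hexdigest()
    return root, links
```

Under these assumptions, a 100 kB file yields a single object without links, while a 1 MB file yields a root that links to four chunks.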
4. Implementation
This section explains the implementation of CEA Network, the IPFS private network, and CEA DApp.
4.1. CEA Network
To maintain file confidentiality in the file distribution system, this paper implements an Ethereum private blockchain named CEA Network. After installing Geth-Ethereum, we generated three Ethereum accounts on CEA Network. These accounts are protected with a passphrase, which is similar to a password. In addition, these accounts are used for deploying the smart contract and signing the transactions on the CEA Network.
After generating the accounts, the next step is creating a genesis block to provide Geth with basic information, such as chainID, difficulty, gasLimit, and alloc. The chainID is an identification number that distinguishes the CEA Network from other blockchains. The difficulty determines how hard it is to mine a block. A lower value means that blocks are mined more quickly, so transactions are confirmed faster. The gasLimit defines the limit of gas cost per block. This should be set high, in order to prevent limitations when testing a transaction. The alloc allocates the ETH balance to each account. The genesis block is a JSON file and will be copied to other nodes on CEA Network.
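As an illustration of the four fields described above, the following sketch builds a genesis file as JSON. The layout follows the one commonly used for Geth private networks, but every value is an assumption chosen for the example (the chain ID, the difficulty, the gas limit, the account address, and the balance); the fork-block entries under config depend on the Geth version in use.

```python
import json

# Hypothetical account address; in practice this is an account generated on CEA Network.
ACCOUNT = "0x" + "11" * 20

genesis = {
    "config": {
        "chainId": 2019,       # assumed value; must differ from other networks
        "homesteadBlock": 0,
        "eip150Block": 0,
        "eip155Block": 0,
        "eip158Block": 0,
    },
    "difficulty": "0x400",     # low difficulty so that blocks are mined quickly
    "gasLimit": "0x8000000",   # high per-block gas limit for testing
    "alloc": {
        # Pre-funded balance in wei (here 100 ETH).
        ACCOUNT: {"balance": str(100 * 10**18)},
    },
}

genesis_json = json.dumps(genesis, indent=2)
```

The resulting text would be saved as the genesis file and copied unchanged to every node, since nodes with different genesis files do not join the same chain.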
Before starting the CEA Network, the data directory has to be initialized with the genesis file, to store data related to the CEA Network blockchain. In this case, the directory is named myDataDir. In order to use the DApp, HTTP JSON-RPC is enabled with the --rpc flag when starting the CEA Network. This network runs at http://localhost:8543. Metamask is used as a communication bridge between the DApp and the CEA Network by importing the existing accounts on the CEA Network into Metamask, using the private key that is stored in the UTC file in the Keystore directory.
4.2. IPFS Private Network
This research uses an IPFS private network to restrict file distribution. This implementation creates a boot node and client nodes. A boot node acts as a network administrator, which is responsible for monitoring the client nodes. It is also used by the client nodes to connect to the IPFS private network. Client nodes have access to upload or download files on the IPFS private network. The Rivest–Shamir–Adleman (RSA) algorithm is used to generate public/private key pairs to authenticate peers in IPFS. Every node has a peer ID as its identity, which is generated during IPFS initialization.
There are two requirements to enable communication between peers on the IPFS private network. First, the default bootstrap node must be replaced with the IP address and peer ID of the boot node. This should be done for the boot node and the client nodes. Second, the boot node must install ipfs-swarm-key-gen to generate a swarm key. This swarm key must be available in the ~/.ipfs directory in the boot node and the client nodes.
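The two requirements can be sketched as follows. The swarm key layout (a protocol line, an encoding line, and 32 random bytes in hexadecimal) and the bootstrap address form are written here from the commonly documented go-ipfs conventions and should be checked against the installed version; newer releases write /p2p/ instead of /ipfs/ in the address. The IP address and peer ID below are placeholders, and the function names are hypothetical.

```python
import secrets


def generate_swarm_key() -> str:
    """Produce the text of a swarm.key file holding a random 256-bit pre-shared key."""
    key_hex = secrets.token_hex(32)  # 32 random bytes, hex encoded (64 characters)
    return f"/key/swarm/psk/1.0.0/\n/base16/\n{key_hex}\n"


def bootstrap_address(ip: str, peer_id: str, port: int = 4001) -> str:
    """Build the bootstrap multiaddress that replaces the default bootstrap list."""
    return f"/ip4/{ip}/tcp/{port}/ipfs/{peer_id}"


# Placeholder values for illustration only.
swarm_key_text = generate_swarm_key()
boot_addr = bootstrap_address("192.0.2.10", "QmBootNodePeerIdPlaceholder")
```

The same swarm_key_text would then be saved as swarm.key in the ~/.ipfs directory of the boot node and of every client node.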
Table 1 shows how the private network successfully starts the boot node and the client nodes. It shows that all nodes have a swarm key with the fingerprint afcc0321ceefb1bc0a8c13b187bc369b. This network is limited to peers whose swarm key has this fingerprint. It ensures that public peers without the same swarm key cannot access the network. To enable the communication between the IPFS private network and the DApp, the requests from the DApp at http://localhost:3001 have to be allowed in the IPFS configuration. By default, IPFS is designed to reject requests from unknown domains. Therefore, the domains are first configured in ‘Access-Control-Allow-Origin’, to enable the interaction with the IPFS private network.
4.3. CEA-DApp
This study developed a CEA-DApp as a front-end system, to make it more convenient for the user to interact with the IPFS private network and CEA Network. It was built on the Truffle Suite framework, using ReactJS. This DApp has a smart contract named CEAContract.sol, to store the hash of a file. The CEAContract.sol contains a state variable storedData of type string to model the hash of a file from IPFS. It also contains set and get functions. The set function receives an input value and stores it in storedData, while the get function returns the storedData value.
Before deploying a smart contract in CEA Network, the accounts generated on CEA Network are imported into Metamask and unlocked.
Figure 9 shows CEAContract.sol deployment on the CEA Network. It is stored as a transaction on the distributed ledger and identified by its transaction hash. After it is deployed, the smart contract will get a contract address as its identity and an application binary interface (ABI) as a description of the deployed contract and its functions [16]. As seen in Table 2, the contract address and ABI are used in a web3.js library, to allow the interaction between the ReactJS program and the smart contract.
The CEA DApp interacts with CEA Network, which runs at http://localhost:8543, and accesses the IPFS private network API using the ipfs-http-client library, which connects to the IPFS daemon at port 5001. Therefore, CEA DApp can upload and download files on IPFS. The front-end consists of the upload and download pages. A class named app is used on the upload page. It contains a state object that holds information such as ipfsHash, buffer, and transactionHash.
As seen in Table 3, the system calls the function onSubmit to store the file on the IPFS private network when uploading the file. The file is converted into a buffer (an array with data) before it is uploaded into the IPFS private network. The IPFS private network returns a hash of the file. Then it triggers Metamask to confirm the transaction. After the transaction is successfully confirmed, the CEA Network starts the mining process to put the transaction in a block. Then, the CEA Network returns a transaction ID. Before doing this transaction, the account in the CEA Network must be unlocked.
5. Experiment
This section discusses the experiment that was conducted on the proposed CEA system, to ensure its proper functionality and evaluate the transaction fee and system performance.
5.1. Testing of File Availability
This experiment involved three nodes in the IPFS private network that run on the same Wi-Fi network. One node is set as a boot node to monitor the client nodes’ activity. Two nodes are set as client nodes.
Table 4 shows the hardware and software specifications for testing the nodes.
All nodes identify each other by using their peer IDs, which are unique identities for each node with which to communicate in the IPFS private network.
Table 5 shows the peer ID of each node in this experiment.
Next, the upload and download operations were tested on the IPFS private network. For the upload operation, IPFS creates an immutable object, depending on file size. If the file is larger than 256 kB, it is divided into several chunks that are identified by their CID. First, the data are stored on the local repository of Client Node 1. Then, Client Node 1 updates the DHT to distribute the data. In this experiment, Client Node 1 uploaded files with various sizes.
Figure 10a shows that the file distribution with size 100 kB is stored only in one object, which is identified by one CID.
Figure 10b shows that the file distribution with size 1 MB is stored in four objects. It has a root CID and CID for each part.
To check file availability, we conducted an experiment in which we uploaded a file to the Client_1 node and downloaded it from the Client_2 node while the Client_1 node was down. The result shows that the Client_2 node can properly download the file even when the node that contains the file (Client_1 node) is not available, because other nodes in the network provide it. Table 6 presents the experiment details.
To compare the proposed CEA system with the existing one, we attempted to download a file when it was not available on the existing system. As seen in Figure 11, the existing CEA system stores the file on the central server and uses location-based addressing; therefore, the file can be downloaded only if it is available on that server or location.
These experimental results show that the functionality of the proposed CEA system offers improved file availability. A distributed file system can provide files, because, instead of storing files on a centralized server, it stores them on all nodes in the IPFS private network. When one node is turned off, other nodes can provide the files.
5.2. Testing of File Integrity
Figure 12 presents an example of how to upload a file by using the CEA DApp. When the user uploads a file, CEA DApp automatically detects its Ethereum Address. Then, the CEA DApp interacts with the IPFS private network, to upload the file. First, the file is stored in a local repository. Then it is distributed to other nodes in the network. IPFS returns a hash of the file, and CEA DApp triggers Metamask to confirm the transaction to be stored on the CEA Network.
A transaction ID is used to retrieve data from the CEA Network.
Figure 13 shows the results on the download page. It returns the block number, gas used, timestamp, input, and decoded input. The input contains an encoded hash of a file in Hex data format, with a length of 132 bytes. It should be decoded into a human-readable format. The decoded hash is used to redirect the user to the IPFS gateway at http://127.0.0.1:8080/ipfs/, so that the user can download the file without interacting with the IPFS command-line interface (CLI).
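The decoding step can be sketched as follows, assuming that the transaction input is the standard Solidity ABI encoding of a call with one string argument: a 4-byte function selector, a 32-byte offset, a 32-byte length, and the string bytes padded to a multiple of 32. The selector and the hash below are placeholders, not values from CEA Network, and the function names are hypothetical.

```python
def encode_set_call(selector: bytes, value: str) -> bytes:
    """ABI-encode a call that takes a single dynamic string argument."""
    data = value.encode("utf-8")
    padded_len = -(-len(data) // 32) * 32  # round up to a multiple of 32 bytes
    return (
        selector
        + (32).to_bytes(32, "big")         # offset to the string data
        + len(data).to_bytes(32, "big")    # string length in bytes
        + data.ljust(padded_len, b"\x00")  # right-padded string bytes
    )


def decode_input(tx_input_hex: str) -> str:
    """Recover the string argument from a hex-encoded transaction input."""
    if tx_input_hex.startswith("0x"):
        tx_input_hex = tx_input_hex[2:]
    body = bytes.fromhex(tx_input_hex)[4:]  # skip the 4-byte function selector
    offset = int.from_bytes(body[:32], "big")
    length = int.from_bytes(body[offset:offset + 32], "big")
    start = offset + 32
    return body[start:start + length].decode("utf-8")


# Placeholder selector and a placeholder 46-character hash.
PLACEHOLDER_SELECTOR = bytes.fromhex("deadbeef")
placeholder_hash = "Qm" + "a" * 44
call_data = encode_set_call(PLACEHOLDER_SELECTOR, placeholder_hash)
```

Under this encoding, a 46-character hash produces 4 + 32 + 32 + 64 = 132 bytes of input, which is consistent with the length reported above.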
To ensure collaboration between CEA DApp and CEA Network or the IPFS private network, this experiment manually checks the IPFS and Ethereum console.
Figure 14 shows the transaction that is stored on CEA Network, using the transaction ID. All transactions on CEA Network are immutable. It helps users to prove the hash of files that are stored on the IPFS private network. Moreover, the transactions will not spread into another Ethereum network, because the CEA Network is private.
Figure 15 shows that a file is stored on the IPFS private network, using a hash. Because the file size is 1 MB, it is divided into 4 small parts, each of which is less than or equal to 256 kB.
IPFS generates a hash to identify the content of a file. Even though files have similar names and sizes, if their content differs slightly, the files will have totally different hashes. This point is shown in Figure 16.
This experiment uploads two files with similar names and sizes but with very small modifications in their content. The IPFS identifies them with different hashes. This feature of IPFS can prove the integrity of each file. It can help users to check whether files are being tampered with.
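The same property can be demonstrated with an ordinary cryptographic hash. The sketch below uses SHA-256 hex digests as a stand-in for IPFS hashes (IPFS wraps its digest in a content identifier, which is omitted here); the file contents and function names are invented for the example.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a hex digest that depends only on the content, not on the file name."""
    return hashlib.sha256(data).hexdigest()


def is_untampered(data: bytes, recorded_hash: str) -> bool:
    """Compare a file against the hash that was recorded when it was uploaded."""
    return content_hash(data) == recorded_hash


# Two invented files of equal size that differ in a single character.
original = b"Archive 001: budget approved for 2019."
modified = b"Archive 001: budget approved for 2018."
recorded = content_hash(original)
```

A user who holds the recorded hash can therefore detect the one-character change, since is_untampered(modified, recorded) evaluates to False.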
5.3. Testing of File Confidentiality
In this paper, the IPFS private network restricts file distribution to nodes with the same swarm key. Checking confidentiality can be achieved by trying to download a file from a public IPFS gateway and from another IPFS client node outside the private network. In this case, a swarm key in Client Node 1 is removed, so Client Node 1 is no longer registered in the IPFS private network. After Client Node 2 uploads a file, Client Node 1 and the public IPFS gateway try to download and read the content of the file. As seen in Figure 17, neither Client Node 1 nor the public IPFS gateway can download and read the content of the file. The IPFS network sets the local gateway inside the private network.
The existing CEA system already has user access control, limiting file access to each user’s respective unit. However, users can access files from another unit by supplying an archive ID in the URL, such as http://202.XXX.XXX.XXX/XXX/arsip/edit/ArchiveID. The ID is not encrypted. Moreover, user access control is only implemented in the CEA application. The server administrator can download files directly from the server, without logging into the application.
The testing results show that the IPFS private network in the proposed system maintains the confidentiality of files. The files are only distributed to registered nodes that have the same swarm key for the private network. In addition, the server administrator cannot download files directly into the local node. The files are split, and no single node stores an entire file.
5.4. Transaction Fee (ETH) Analysis
This experiment uses several sizes, including 100 kB, 1 MB, and 10 MB, with the gas price set at 3 GWEI, 10 GWEI, and 18 GWEI, respectively. These gas prices determine how quickly a transaction will be mined.
Figure 18 shows that the transaction fee increases linearly with the gas price. The smaller the gas price, the lower the transaction fee. The file size does not affect the fee, because only the hash of the file is stored in the transaction.
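The linear relationship follows from the way an Ethereum fee is computed: the gas used by the transaction multiplied by the gas price. The sketch below applies this to the three gas prices in the experiment. GAS_USED is an assumed figure, chosen only so that the fee at 18 GWEI rounds to the 0.00074 ETH reported in the next paragraph; it is not a measured value.

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9
WEI_PER_ETH = 10**18

# Assumed gas consumption of one transaction that stores a file hash.
GAS_USED = 41_000


def transaction_fee_eth(gas_used: int, gas_price_gwei: int) -> Decimal:
    """Fee in ETH = gas used x gas price, converted from wei."""
    fee_wei = gas_used * gas_price_gwei * WEI_PER_GWEI
    return Decimal(fee_wei) / Decimal(WEI_PER_ETH)


fees = {price: transaction_fee_eth(GAS_USED, price) for price in (3, 10, 18)}
```

Because the gas used stays the same for every upload of a hash, changing the gas price from 3 to 18 GWEI multiplies the fee by six, regardless of the size of the file.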
Currently, the average daily upload to the CEA system is 54 transactions in each unit. Because the balance in the Ethereum private network is customizable, this experiment chooses 18 GWEI with a transaction fee of 0.00074 ETH. The estimated yearly transaction fee for daily work is 12,960 transactions × 0.00074 ETH = 9.5904 ETH, which is equal to 1787.27 USD. According to Indonesian Government Regulation Number 15 [17], in 2018, the monthly fee for a bandwidth of 100 Mbps in BPPT was 101,690,000 IDR, or 7222.15 USD, which is 86,665.8 USD per year. Using the proposed system is therefore more cost-effective than using the existing system. In addition, the transaction fee in the proposed system is only charged if users upload or download a file in the CEA system.
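The yearly estimate can be reproduced from the figures quoted above. The number of working days and the ETH/USD rate are not stated in the text; the sketch derives them from the reported totals, so both should be read as implied values rather than reported ones.

```python
from decimal import Decimal

# Figures quoted in the text.
DAILY_TRANSACTIONS = 54
YEARLY_TRANSACTIONS = 12_960
FEE_PER_TRANSACTION_ETH = Decimal("0.00074")
YEARLY_FEE_USD = Decimal("1787.27")
MONTHLY_BANDWIDTH_USD = Decimal("7222.15")

# Derived values.
implied_working_days = YEARLY_TRANSACTIONS // DAILY_TRANSACTIONS
yearly_fee_eth = YEARLY_TRANSACTIONS * FEE_PER_TRANSACTION_ETH
implied_usd_per_eth = YEARLY_FEE_USD / yearly_fee_eth
yearly_bandwidth_usd = MONTHLY_BANDWIDTH_USD * 12
cost_ratio = yearly_bandwidth_usd / YEARLY_FEE_USD
```

With these inputs the yearly figure corresponds to 240 working days, and the bandwidth subscription costs roughly 48 times the estimated transaction fees.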
5.5. System Performance Analysis
In this section, the measurements focus on latency, bandwidth usage, and throughput. Latency is measured by sending ten packets with the ping command. As seen in Figure 19, the round-trip time (RTT) in the existing CEA system is consistent. The RTT is between 9 and 15 ms, with an average latency of 11 ms. In the proposed CEA system, the RTT fluctuates. Because of network congestion, the sixth response goes up to 12 ms. Then, it drops to 2 ms. However, the average latency is 3 ms, which is less than that of the existing CEA system. Transferring data by using the proposed CEA system is therefore faster than doing so in the existing CEA system.
The measurement of bandwidth usage refers to the upload and download operations in the proposed CEA system. We use the file sizes of 100 kB, 1 MB, and 10 MB, multiplying by 54 as the average number of transactions in the existing CEA system. As seen in Figure 20, the upload operation consumes more bandwidth than does the download operation. The highest bandwidth usage for uploading 540 MB is 0.084 kB/s, whereas downloading 540 MB is only 0.024 kB/s. Neither uploading nor downloading consumes much bandwidth.
Furthermore, this paper measures the bandwidth capacity of the proposed CEA system, using speedtest-cli [18]. The proposed CEA system handles 16.64 Mbits/s for downloading and 11.20 Mbits/s for uploading. In addition, iPerf [19] is used to measure the approximate packet size that can be transferred. It shows that the proposed CEA system can transfer a 25.7 Mbyte packet size with an average of 15.5 Mbits/s during 13.9 s. The throughput supports the proposed CEA system for uploading and downloading a maximum packet size of 540 MB.
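The reported throughput can be cross-checked with a short calculation. The sketch assumes that iPerf counts a Mbyte as 2^20 bytes and a Mbit as 10^6 bits, and that the 540 MB workload is counted in units of 10^6 bytes; both unit conventions are assumptions, and the function names are hypothetical.

```python
def average_mbit_per_s(megabytes: float, seconds: float,
                       bytes_per_mbyte: int = 1024 * 1024) -> float:
    """Average rate in Mbit/s for a transfer of the given size and duration."""
    return megabytes * bytes_per_mbyte * 8 / seconds / 1_000_000


def transfer_seconds(megabytes: float, rate_mbit_per_s: float,
                     bytes_per_mbyte: int = 1_000_000) -> float:
    """Time needed to move the given amount of data at a constant rate."""
    return megabytes * bytes_per_mbyte * 8 / 1_000_000 / rate_mbit_per_s


iperf_rate = average_mbit_per_s(25.7, 13.9)   # iPerf figures from the text
upload_time = transfer_seconds(540, 11.20)    # measured upload capacity
download_time = transfer_seconds(540, 16.64)  # measured download capacity
```

Under these assumptions the iPerf figures are mutually consistent at about 15.5 Mbit/s, and the 540 MB daily maximum would take roughly six and a half minutes to upload and a little over four minutes to download at the measured capacities.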
6. Conclusions
This paper introduced a prototype for the CEA system that integrates an IPFS private network, Ethereum private network (CEA Network), and Decentralized CEA (CEA DApp). The results of the functionality evaluation show files are well distributed among the nodes in the IPFS private network, where all nodes participate in providing data. CEA DApp offers a solution to the single point of failure issue and improves data availability. The nodes without the right swarm key cannot connect to the IPFS private network. This feature ensures data confidentiality, because the files cannot be distributed to the IPFS public network.
Moreover, the hashes of files are stored as transactions on CEA Network. The transactions are unchangeable, digitally signed with Ethereum accounts, timestamped, and knowable by all participants on CEA Network.
The transaction fee estimation shows that the proposed CEA system is more cost-effective than the existing system. The performance evaluation shows that the proposed CEA system has low latency and does not consume much bandwidth. In addition, the throughput results support the bandwidth usage for the maximum transaction estimation.
This system is still a prototype, and improvements are needed. In the future, the scheme of this system needs to be enhanced by implementing archives’ lifecycle and involving an off-chain, e-Archive database to store user accounts and transaction logs.