1. Introduction
Industry 4.0 covers a wide range of modern approaches and technologies that aim to greatly improve the manufacturing industry. The most relevant technologies included in the concept of Industry 4.0 are Big Data, Artificial Intelligence (AI), advanced robotics, edge computing, 5G networks, the Internet of Things (IoT), and the overall digitalization of manufacturing processes [1].
Recently, blockchains have become increasingly relevant in the field of Industry 4.0 due to their ability to provide immutability and traceability of stored data. Thus, data can be processed through Distributed Ledger Technologies (DLTs) in a decentralized and trustworthy manner [2]. However, using blockchain in Industry 4.0, where there are many resource-constrained devices, is not straightforward, since blockchains require high storage capacity and computational power and offer relatively low throughput [3]. Nevertheless, many researchers have recently designed lightweight blockchains for IoT [4,5]. Furthermore, novel DLTs such as Directed Acyclic Graphs (DAGs) [6], which are specifically optimized for resource-constrained environments, have recently been introduced.
A DLT-based Industry 4.0 scenario is presented in [7], where there is a broad ecosystem of interconnected smart plant clusters.
Figure 1 shows an Industry 4.0 scenario with two industrial plants, where each plant has several production lines (e.g., Plant A has production lines “PL1-A1” and “PL2-A2”). Within each production line, several Industrial IoT (IIoT) devices operate and generate raw data that measure, among other things, machine performance, process productivity and quality, and machine End-Of-Life. This production data is securely stored by means of DAG-type DLTs, which, according to Wu et al. [8], are the most appropriate DLTs for IIoT devices due to their high throughput. Since IIoT devices normally generate large amounts of data, it is advisable to rely on appropriate external storage to reduce the storage burden of the DAGs and avoid overloading the DLT. For example, in this scenario, we have an InterPlanetary File System (IPFS) (https://ipfs.io/, accessed on 20 November 2022) storage system, where the actual IIoT raw data is stored, whilst the DAGs store only the data hashes. This approach assures data integrity whilst keeping the DAG lightweight.
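The hash-anchoring pattern described above can be illustrated with a minimal sketch. It is not the prototype's code: two in-memory structures stand in for IPFS and the DAG, and a hex SHA-256 digest stands in for an IPFS content identifier (real IPFS CIDs wrap the digest in multihash/multibase encodings). All function names are hypothetical.

```javascript
const { createHash } = require("node:crypto");

// Stand-ins (assumptions): a content-addressed off-chain store and an
// append-only ledger that keeps only digests, never the raw payloads.
const offChainStore = new Map(); // digest -> raw JSON text
const ledger = [];               // [{ digest, ts }]

// Simplified content address: hex SHA-256 of the serialized payload.
function digestOf(text) {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Store the raw reading off-chain and return its content address.
function putRaw(reading) {
  const text = JSON.stringify(reading);
  const digest = digestOf(text);
  offChainStore.set(digest, text);
  return digest;
}

// Anchor only the digest on the ledger; returns the entry index.
function anchor(digest) {
  ledger.push({ digest, ts: Date.now() });
  return ledger.length - 1;
}

// Integrity check: re-hash the off-chain copy and compare it with the
// digest recorded on the ledger.
function verify(entryIndex) {
  const { digest } = ledger[entryIndex];
  const text = offChainStore.get(digest);
  return text !== undefined && digestOf(text) === digest;
}

const idx = anchor(putRaw({ line: "PL1-A1", temperatureC: 71.3 }));
console.log(verify(idx)); // true

// Overwriting the off-chain copy is detected, because the ledger digest no longer matches.
offChainStore.set(ledger[idx].digest, JSON.stringify({ line: "PL1-A1", temperatureC: 20 }));
console.log(verify(idx)); // false
```

Because each ledger entry holds only a 64-character hex digest, its size does not grow with the payload, which is what keeps the DAG lightweight in this pattern.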
So far, in the presented scenario, the data generated by IIoT devices is securely gathered and stored by the DAG DLTs and IPFS storage. However, Industry 4.0 does not stop at the machine data level: the data gathered at “the lowest level” needs to be exploited and processed by “higher” levels to derive actual information, such as the status of machines and IIoT fleets, machine predictive maintenance (by AI algorithms), overall process productivity, etc. Hence, these “upper” processes need to access and process heterogeneous data from all plants in the cluster. However, accessing and processing all the raw machine data from all production-level DAG-type DLTs is not a straightforward procedure, essentially due to the following issues:
Heterogeneous machine data. Data could be expressed in different units of measure depending on the machine provider, machine version, country, etc. Values could have a different number of decimal places, follow different standards, or include certain errors or variations. This problem stems from the fact that, according to Jirko et al. [9], “machines within a complex system are produced by different manufacturers with different data models and interfaces”. Consequently, this issue affects industrial interoperability and integration, thus hampering the ability to effectively process data using disruptive technologies, such as Big Data or AI.
Lack of efficiency and security when accessing machine data [10]. It is neither efficient nor secure to directly delegate to external data exploitation services the responsibility of accessing and processing the raw machine data into “readable” plant-level data. Accessing machine data directly means that each data exploitation service needs to be a client of every production line DLT from which it wants to access data. Additionally, these services would need to simultaneously process all the data from all machines and homogenize it accordingly. This approach lacks efficiency, as the data exploitation services would spend a large amount of time accessing and homogenizing data before exploiting it. Nor is it secure, since it breaks the data custody chain and mixes responsibilities: within each data exploitation service, the actual data format being exploited becomes obscure, and traceability and integrity are compromised.
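To make the first issue concrete, the sketch below normalizes temperature readings reported by three hypothetical vendors that use different field names and units into a single representation in degrees Celsius with a fixed precision. The vendor profiles, field names, and two-decimal rounding are illustrative assumptions; in the proposed scheme, such rules would come from the target data model rather than being hard-coded.

```javascript
// Hypothetical vendor profiles: which field carries the temperature and in which unit.
const VENDOR_PROFILES = {
  vendorA: { field: "temp_c", unit: "C" },
  vendorB: { field: "TemperatureF", unit: "F" },
  vendorC: { field: "t", unit: "K" },
};

// Convert a temperature value to degrees Celsius.
function toCelsius(value, unit) {
  switch (unit) {
    case "C": return value;
    case "F": return ((value - 32) * 5) / 9;
    case "K": return value - 273.15;
    default: throw new Error(`Unsupported unit: ${unit}`);
  }
}

// Map a vendor-specific reading onto a common shape with a fixed number of
// decimals; non-numeric values are flagged instead of silently propagated.
function normalizeReading(vendor, reading, decimals = 2) {
  const profile = VENDOR_PROFILES[vendor];
  if (!profile) throw new Error(`Unknown vendor: ${vendor}`);
  const raw = Number(reading[profile.field]);
  if (!Number.isFinite(raw)) return { vendor, temperatureC: null, valid: false };
  const factor = 10 ** decimals;
  const celsius = toCelsius(raw, profile.unit);
  return { vendor, temperatureC: Math.round(celsius * factor) / factor, valid: true };
}

console.log(normalizeReading("vendorA", { temp_c: 71.456 }));
console.log(normalizeReading("vendorB", { TemperatureF: "160.6" })); // string input is parsed
console.log(normalizeReading("vendorA", { temp_c: "n/a" }));         // flagged as invalid
```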
Furthermore, such complex ecosystems require proper monitoring to achieve higher efficiency rates, presenting data that notifies human operators of probable performance gaps and possible disruptions. Even though DLTs theoretically improve the security of industrial processes, many attacks, such as Denial-of-Service (DoS) attacks, are still possible. Such attacks need to be identified and mitigated as soon as possible to avoid the disruption of industrial production. In addition, proper monitoring can help mitigate errors and optimize production processes and their associated costs, which are known to be critical in industry [11].
In this context, this work aims to mitigate these problems by ensuring that:
The raw machine data that resides in DAG-type DLTs is consistently homogenized through a secure and traceable process. This will ensure that the data conforms to a common data model, thus providing interoperability so that processes at higher levels can exploit the data in a consistent manner.
The homogenized data is securely stored and accessed, ensuring its integrity and availability. This will ensure trust in the data throughout the whole process, from the production lines where the data is generated to the higher levels where it is exploited and processed. Processing raw IIoT data through a DAG DLT is pointless if, at a higher level, the data ends up in a centralized and non-persistent data structure where it can be easily tampered with [12].
The whole industrial architecture is carefully monitored by a monitoring system that is able to securely analyze all components: the IIoT sensors and actuators, the DLTs, the storage systems, etc. This analysis is required to optimize performance and security and to prevent the malfunction of critical processes.
Consequently, this work presents the following contributions to the described issues:
A “data homogenization” process for solving data interoperability issues that relies on decentralized blockchain oracles as a trustworthy source for the target data model schema to which the data must conform. In this paper, we greatly improve the oracle architecture compared to a previous work [13] by employing a more versatile blockchain platform to improve simplicity and provide more interoperability capabilities. Finally, we store the resulting homogenized data in a blockchain-based solution for trustworthy access and processing.
A monitoring system for the proposed scheme to track the quality of the retrieved data, the performance of the network, the usage of each oracle, billing reports, security incidents, etc. We implement an API for retrieving the monitoring data, and we visualize the data using the ELK (Elasticsearch, Logstash, and Kibana) stack (https://www.elastic.co/es/what-is/elk-stack, accessed on 21 November 2022).
A prototype that implements the secure data homogenization process that: (i) accesses raw machine data stored in DAG DLTs, (ii) gets the target data model schema from the oracles, (iii) performs the data homogenization from the source data schema to the target data schema, and (iv) stores the homogenized data in a “plant level” blockchain network so that it can be consistently accessed and processed by other services. We also implement the monitoring system of the aforementioned scheme.
The remainder of the paper is organized as follows. In Section 2, we introduce the concepts of blockchain, smart contracts, and oracles to provide the background technologies on which our proposal is based. In Section 3, we analyze the existing related work in this field and outline our contributions. In Section 4, we describe our proposed solution for solving the Industry 4.0 data interoperability and security challenges. In Section 5, we present the prototype of our solution. In Section 6, we discuss the results of the proposed solution and analyze the employed technologies. Finally, Section 7 concludes the paper and provides insights into future work.
3. Related Work
In this section, we first analyze the most relevant works related to interoperable DLT networks and smart oracles in IoT and industrial environments. We also conduct a comparative study between the presented work and the related works regarding several characteristics. Finally, we analyze the existing monitoring proposals in the industry and compare them with our monitoring approach.
3.1. Interoperable Blockchains and Oracle Services for Industry 4.0
P. Bellavista et al. [32] design a relay architecture based on a Trusted Execution Environment (TEE) with the aim of providing trustworthy interoperability between blockchain networks in industrial environments. The authors claim that, in an Industry 4.0 ecosystem, it is impossible to have only one blockchain. The proposed solution makes use of an off-chain secure computation environment that is invoked by smart contracts. Nonetheless, this interoperability approach comes at a high performance cost and offers low scalability. Moreover, this solution relies on specific off-chain hardware equipment that might have vulnerabilities. In addition, this solution only supports blockchains; thus, other solutions, such as DAGs, which are much more efficient in IoT environments, are not taken into account. Finally, the authors do not consider that industrial data might be heterogeneous, and they do not take into account secure methods of introducing external data in smart contracts.
Scheid et al. [33] present Bifröst, a modular blockchain interoperability API that acts like a notary agent. This API currently supports seven blockchains and has to be specifically adapted to each blockchain solution. However, this proposal incurs high latency in the network and has several critical security issues that might have no feasible solution. Furthermore, this API does not ensure secure external data entry in blockchain smart contracts.
Y. Jiang et al. [34] aim to integrate DAG-type DLTs with a consortium blockchain by using sidechains. The consortium blockchain is used as the “main” chain to which several DAG sidechains are connected. To achieve interoperability, several notary nodes act as gateways between the main chain and its sidechains. This proposal might be useful to set up an industrial scenario where IIoT data is processed by the DAGs at the data source level, and the consortium main chain is used to unify and exploit the data at a higher level. However, the proposed architecture adds a high degree of complexity, energy consumption, and latency to the network, since it employs the PoW consensus mechanism to guarantee decentralization and avoid the issues of the solution presented by E. Scheid et al. [33]. Furthermore, in this work, the heterogeneity of industrial data is not taken into account, and the main chain has no clear purpose.
Gao et al. [35] design a data exchange scheme by using an oracle service node that acts as a trusted notary between two or more blockchains. They also design a secure data migration protocol based on asymmetric encryption between the blockchains to avoid man-in-the-middle attacks, and they suggest novel methods of making the proposed scheme more applicable to real-world scenarios. However, this proposal creates a single point of failure in the network, which undermines the purpose of using blockchain.
Wiraatmaja et al. [36] propose a custom-made oracle framework in JavaScript to enable safe data transactions between decentralized DLTs such as IOTA and Ethereum, and other decentralized solutions such as IPFS. However, similar to the work presented by Gao et al. [35], this architecture uses centralized notaries that create a single point of failure.
Unlike the works described above, we design an efficient and trustworthy industrial scheme based on blockchain that aims to achieve data integrity and interoperability throughout the whole process, from the moment data is generated until it is standardized and managed at a higher level. We start from a scenario where raw IIoT data is processed by efficient production line DAG DLTs and design a data homogenization process using decentralized oracles, along with a monitoring interface for improved data analytics. Finally, we store the resulting homogenized data in an interoperable plant blockchain for trustworthy management and processing of the homogenized IIoT data originating from many production lines.
Table 1 shows a comparison between the related works and the presented work. We compare six characteristics: the approach used to achieve interoperability, i.e., trusted hardware, notary [18], or sidechains [37] (Approach); whether the proposal includes oracles (Oracles); whether the solution is completely decentralized (Decentralized); whether it avoids incurring a significant burden (Efficient); whether it guarantees the integrity of the data throughout the whole process (Data Integrity); and the types of DLTs that it supports (Support). In our case, we employ a notary scheme approach to exchange data between distinct DLTs. We use oracles to provide trustworthy external data within the homogenization process. Our solution is completely decentralized throughout the whole process and does not incur a significant burden in any phase; thus, it is efficient. Finally, it guarantees data integrity throughout the whole process and supports both blockchain and DAG DLTs.
3.2. Modern Industry Monitoring
Industrial monitoring is a broad area with many relevant works [38,39]. However, in this subsection, we focus on relevant monitoring schemes that are relatively recent and include some degree of technological application (i.e., software, IoT, wireless networking, etc.), since our work falls within the modern framework of Industry 4.0. We mainly aim to analyze and compare the coverage depth of the monitoring systems within the scenarios for which they are used.
Shi and Gindy [40] present a software-based monitoring architecture that is capable of performing automatic online acquisition, presentation, and analysis of sensor signals. This monitoring system acquires, analyzes, and presents the data simultaneously and automatically by using a multi-thread programming approach. The software was developed to function in a retriggerable manner so that it can register signals successively without manual intervention.
The work presented by Sung and Hsu [41] employs ZigBee [42] wireless transmission technology in combination with embedded hardware to perform comprehensive remote monitoring of industrial equipment. This proposal is intended to improve the safety and efficiency of industrial environments by measuring critical aspects such as energy consumption, temperature, or CO2 levels.
Zhao et al. [43] design a modern monitoring system for IIoT environments. This proposal intends to provide real-time monitoring to improve technical and financial aspects within industrial companies. Field-Programmable Gate Arrays (FPGAs) are used in this work due to their high reliability and processing speed. Finally, an IoT platform developed by the authors provides remote real-time visualization.
A recent monitoring system is presented by W. Chen [44]. This work presents a reference architecture for IoT data monitoring and designs a theoretical model of the system. It also addresses several issues found in modern manufacturing environments, such as the large amounts of data that have to be processed, the integration of key technologies such as Wireless Sensor Networks (WSNs) [45] or Radio-Frequency Identification (RFID) [46], and the correlation between data.
Magadán et al. [47] design a low-cost scheme for real-time monitoring of electric motors. The developed module gathers real-time information on the vibrations and temperature of the electric motors and stores it in a lightweight IoT analytics platform. The information is further processed and analyzed to provide operating reports and improvement suggestions. Using this proposal, several anomalies of electric motors have been successfully identified and mitigated, and relevant predictive maintenance reports have been generated. The authors also intend to use machine learning to better predict and mitigate failures.
Mourtzis et al. [48] design a monitoring system based on an augmented reality mobile application for real-time machine monitoring and maintenance. The system includes all the required connections from the sensors to enable precise monitoring of the remaining operating time, plan maintenance tasks in the available time slots, update the machine schedule based on the length of the maintenance, and connect to a remote database. Moreover, to improve the maintenance instructions and secure the generated result, the maintenance technician is aided by a set of functionalities, such as an algorithm that breaks down the assembly tasks and pre-creates the graphical interface. The proposed system increases interoperability, efficiency, and communication, providing useful data that can be further analyzed and transformed.
The main difference between the presented works and our own is that the purpose of our work is not to provide a novel approach to industrial monitoring. Rather, our goal is to provide a broader approach adapted to the disruptive technologies applied in this paper. In our case, apart from monitoring IIoT devices, as is already done in other works, we intend to monitor other technologies, such as IPFS, DLTs, and blockchain oracles. Furthermore, our measurements preserve data privacy, since we employ a zero-trust approach and thus do not capture actual data. This approach is needed to maximize the efficiency of the industrial processes, including the IT technologies involved, to reduce costs as much as possible, and to defend against cyber-attacks. To the best of our knowledge, there is no monitoring system that covers all the technologies involved in an Industry 4.0 scenario.
Table 2 shows a comparison between the characteristics of the related works regarding monitoring systems and the one that we design in this work. We analyze the transmission technologies on which each proposal is based (Technology); whether it covers the monitoring of IIoT devices (Covers IIoT); whether it is based on some kind of software program (Software-based); whether it covers the monitoring of disruptive Industry 4.0 technologies such as blockchain, AI, edge computing, etc. (Monitoring I4.0 Technologies); and whether it ensures the privacy of the data during the monitoring process (Data Privacy).
4. Interoperable Plant Blockchain for Homogenized Data via Smart Oracles
In this section, we describe the proposed solution for machine data interoperability and trustworthy storage of plant-level data. First, we describe the proposed data homogenization process using smart oracles, and then we present the design of a monitoring scheme for the proposed architecture.
Figure 3 depicts the proposed solution, in orange, on top of the motivating industrial scenario presented above. Specifically, in grey, we have N smart factories where the IIoT data is processed using DAG DLTs along with IPFS decentralized storage. In orange, we have the proposed extension that we address in this work: a data homogenization service that makes use of blockchain oracles and stores the resulting data in an interoperable external blockchain. On top of the scheme, we also have a monitoring system for the whole architecture.
4.1. Data Homogenization via Decentralized Oracles
As mentioned in the motivating scenario, the actual IIoT data is stored inside an IPFS storage system, while the data-source DAGs store only the hashes to reduce the storage burden of the DLTs. In the proposed scheme, once the raw IIoT data hashes from IPFS have been received and stored, a periodically executed data homogenization service calls an external decentralized oracle service to retrieve the data model used for the data homogenization process. Blockchain oracles are needed since smart contracts cannot access external data sources on their own in a trustworthy manner. Once the data model is received from the oracle, the data homogenization process starts its execution. The homogenization process consists of converting raw IIoT data into a standardized data schema according to the given data model. Finally, the data homogenization service sends the homogenized data to an interoperable plant blockchain, which in turn stores it inside the IPFS storage system and keeps its references within the immutable ledger.
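The sequence above can be sketched end to end with in-memory stand-ins. This is a simplified, synchronous illustration under several assumptions: Maps and arrays replace IPFS, the production line DAG, and the plant blockchain; the oracle nodes are plain functions; and the strict-majority rule used to accept a data model is one possible way to make the oracle decentralized, not necessarily the mechanism used by the authors. All names are hypothetical, and the steps follow the order of the prototype, where the service itself adds the result to IPFS and records the hash on the plant chain.

```javascript
const { createHash } = require("node:crypto");

// In-memory stand-ins (assumptions) for the components of the scheme.
const ipfs = new Map();   // content address -> JSON text
const productionDag = []; // raw-data hashes anchored by the production line
const plantChain = [];    // { rawHash, outHash } records on the plant blockchain

const sha256 = (text) => createHash("sha256").update(text).digest("hex");

function ipfsAdd(obj) {
  const text = JSON.stringify(obj);
  const hash = sha256(text);
  ipfs.set(hash, text);
  return hash;
}

const ipfsGet = (hash) => JSON.parse(ipfs.get(hash));

// Decentralized oracle (assumed majority rule): every node returns its copy of
// the data model, and a model is accepted only if a strict majority agree.
function queryOracles(oracleNodes) {
  const votes = new Map(); // model digest -> { count, model }
  for (const node of oracleNodes) {
    const model = node();
    const key = sha256(JSON.stringify(model));
    const entry = votes.get(key) ?? { count: 0, model };
    entry.count += 1;
    votes.set(key, entry);
  }
  for (const { count, model } of votes.values()) {
    if (count > oracleNodes.length / 2) return model;
  }
  throw new Error("No oracle quorum on the data model");
}

// One periodic run: fetch the model, homogenize every pending raw record,
// store the result off-chain, and record both hashes on the plant chain.
function homogenizationRun(oracleNodes, transform) {
  const model = queryOracles(oracleNodes);
  const produced = [];
  for (const rawHash of productionDag.splice(0)) {
    const outHash = ipfsAdd(transform(ipfsGet(rawHash), model));
    plantChain.push({ rawHash, outHash });
    produced.push(outHash);
  }
  return produced;
}

// Example: three oracle nodes, one of which reports a divergent model.
const model = { unit: "C" };
const oracles = [() => model, () => model, () => ({ unit: "F" })];
productionDag.push(ipfsAdd({ id: "m-01", tempF: 212 }));
const toCommon = (raw, m) => ({
  deviceId: raw.id,
  temperature: m.unit === "C" ? ((raw.tempF - 32) * 5) / 9 : raw.tempF,
  unit: m.unit,
});
const out = homogenizationRun(oracles, toCommon);
console.log(ipfsGet(out[0])); // { deviceId: 'm-01', temperature: 100, unit: 'C' }
```

Keeping both the raw and the homogenized hash in each plant-chain record is a design choice that lets anyone trace a homogenized document back to the production line entry it was derived from.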
Figure 4 depicts the sequence diagram of the presented homogenization process.
The main purpose of the interoperable plant blockchain is therefore to securely store and manage the references to the homogenized data of the smart plant and to provide access control to IPFS. This blockchain would also unify the data management of different industrial plants belonging to the same business conglomerate. Finally, this ledger would act as a bridge between the DAG DLTs that process the data from IIoT devices inside production lines and other potential DLT connections with other organizations within a hypothetical decentralized business consortium network.
Consequently, interoperability capabilities are required at this level. To connect the production line DAGs and the plant blockchain, we propose a smart contract-based notary scheme that interacts with a smart contract on the destination blockchain to transfer the data securely.
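A notary scheme of this kind can be sketched as follows, assuming a k-of-n attestation rule: off-chain notaries read a transaction from the source DAG, and the destination contract accepts the referenced hash only once enough distinct registered notaries agree on it. The class, method names, and threshold rule are hypothetical; signatures are omitted (notary identities are plain strings), whereas a real deployment would verify signed attestations inside the destination smart contract.

```javascript
// Hypothetical destination-side contract logic for a k-of-n notary relay.
class PlantRelayContract {
  constructor(notaries, threshold) {
    this.notaries = new Set(notaries); // registered notary identities
    this.threshold = threshold;        // attestations required to accept a record
    this.pending = new Map();          // "txId:hash" -> Set of attesting notaries
    this.records = new Map();          // source txId -> accepted data hash
  }

  // A notary attests that source transaction `txId` carries `dataHash`.
  attest(notary, txId, dataHash) {
    if (!this.notaries.has(notary)) throw new Error(`Unknown notary: ${notary}`);
    if (this.records.has(txId)) return "already-recorded";
    // Conflicting hashes for the same txId are tallied separately.
    const key = `${txId}:${dataHash}`;
    const signers = this.pending.get(key) ?? new Set();
    signers.add(notary); // a Set ignores repeated attestations by the same notary
    this.pending.set(key, signers);
    if (signers.size < this.threshold) return "pending";
    this.records.set(txId, dataHash);
    return "recorded";
  }
}

// Off-chain notary behaviour: read the source ledger, then attest what it holds.
function runNotary(name, sourceLedger, contract, txId) {
  const dataHash = sourceLedger.get(txId);
  if (dataHash === undefined) return "not-found";
  return contract.attest(name, txId, dataHash);
}

const dag = new Map([["tx-1", "hash-of-raw-data"]]);
const relay = new PlantRelayContract(["notary-1", "notary-2", "notary-3"], 2);
console.log(runNotary("notary-1", dag, relay, "tx-1")); // pending
console.log(runNotary("notary-2", dag, relay, "tx-1")); // recorded
```

Requiring several independent notaries, rather than one, is what avoids the single point of failure criticized in some of the related works.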
Application Example
One real-world application of the described approach could be a system for collecting and storing data from sensors in an industrial plant. In this system, the raw data from the sensors would be stored in IPFS, and the hashes of this data would be recorded in a DAG DLT. The data homogenization service would periodically retrieve a data model from a decentralized oracle service and use this model to convert the raw sensor data into a standardized format. The homogenized data would then be stored in IPFS and recorded in the interoperable plant blockchain.
This system could be used to ensure the integrity and traceability of the sensor data, as the data would be stored in a decentralized and immutable manner. It could also help to facilitate data interoperability, as the standardized data format would make it easier for different systems and applications to make use of the data. Additionally, retrieving the data model from an external source via oracles would allow the data homogenization process to be updated and improved over time, as the oracle could provide access to the most recent data model.
4.2. Monitoring System Architecture
The purpose of the proposed monitoring system is to visualize and analyze the industrial data throughout the whole process, from its generation at the IIoT level until it is homogenized and exploited at the plant level, along with all the elements involved in this process. These elements range from the IIoT devices to the DLTs, the IPFS storage, and the blockchain oracles. A monitoring scheme that covers all these elements, not only the IIoT devices, is required to check the quality and integrity of the retrieved data, the status and usage of each element, accrued financial costs, and other financial information for future business-related use cases. Furthermore, in modern Industry 4.0, strict monitoring is also required so that cyber-attacks or performance issues can be rapidly identified and mitigated. For example, monitoring the number of active devices, their effectiveness, or their temperature can provide a holistic picture of the overall productivity and weaknesses of the plant. Monitoring IT elements such as blockchains and oracles could help identify performance bottlenecks, vulnerabilities, and cyber-attacks, and optimize the associated IT infrastructure costs [49].
To make the monitoring system as efficient and secure as possible, we followed three guidelines when designing it [50]: (i) the collection of metrics should not have a significant impact on the performance of the employed DLTs or on the data homogenization process, nor should it create a massive data traffic overhead; (ii) it should be as modular as possible to support different DLTs and oracle services; and (iii) the metrics should be defined so that they cover multiple industrial scenarios.
The proposed monitoring system consists of five modules: (1) IIoT data monitoring agent; (2) storage monitoring agent; (3) oracles monitoring agent; (4) the DLTs monitoring agent; and (5) the monitoring system core.
Figure 5 shows the architecture of the monitoring system.
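Guideline (ii) and the five-module structure can be illustrated with a small sketch in which the four agents are pluggable collector functions and the core aggregates their output into snapshots. This is a hypothetical illustration, not the authors' code: the agent and metric names are invented, sampling schedules are omitted, and discarding every non-numeric value is only a crude stand-in for the principle of not capturing actual data.

```javascript
// Hypothetical monitoring core: agents are pluggable functions that return
// metrics; the core only keeps numeric values so payloads are never stored.
class MonitoringCore {
  constructor() {
    this.agents = new Map(); // agent name -> collect()
  }

  register(name, collect) {
    if (this.agents.has(name)) throw new Error(`Agent already registered: ${name}`);
    this.agents.set(name, collect);
  }

  // One snapshot across all agents; a failing agent is reported, not fatal.
  snapshot(ts = Date.now()) {
    const result = { ts, agents: {} };
    for (const [name, collect] of this.agents) {
      try {
        const metrics = collect();
        result.agents[name] = Object.fromEntries(
          Object.entries(metrics).filter(([, value]) => typeof value === "number")
        );
      } catch (err) {
        result.agents[name] = { error: err instanceof Error ? err.message : String(err) };
      }
    }
    return result;
  }
}

// The four agents of the design, with illustrative metric names.
const core = new MonitoringCore();
core.register("iiot", () => ({ totalDevices: 20, activeDevices: 18 }));
core.register("storage", () => ({ storedObjects: 1200, lastPayload: "raw bytes" })); // string dropped
core.register("oracles", () => ({ requests: 42, failedRequests: 1 }));
core.register("dlt", () => { throw new Error("node unreachable"); });
console.log(JSON.stringify(core.snapshot(0)));
```

Supporting a new DLT or oracle service then only requires registering another collector, without changing the core.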
Effective monitoring requires strategic placement of measurement probes that do not affect the flow of the data in any manner, since doing so would cause more latency and overall poorer performance. Furthermore, the monitoring system must be designed so that the data cannot be fraudulently accessed or tampered with through it. Consequently, similarly to other works such as [43], we propose the use of cheap, lightweight FPGA devices with limited access to the actual data for the monitoring tasks. Thus, apart from preventing illegal access to the data, using cheap devices avoids a significant increase in the operating costs of the architecture.
Figure 6 shows the monitoring probes placement process across the presented architecture.
The monitoring process is composed of the following four steps:
1. First, we need to place a monitoring probe at the IIoT level so that the original raw data can be monitored at the exact source before being stored or processed by any other agent.
2. The second step is to monitor the data when it arrives at the IPFS-DAG tandem. Comparing the data that comes from the IIoT devices with the data that is finally stored and processed in IPFS and the DAG can help identify possible man-in-the-middle and DDoS attacks or mere transmission failures. Apart from the data, we can also monitor performance and other status data from the IPFS and DAG structures.
3. The third step is to monitor the data homogenization process, along with the employed oracles, to ensure that the process has been correctly executed. Regarding the oracle scheme, we can comprehensively examine the usage of the oracles and the possible incurred costs, as well as possible performance and security issues.
4. Finally, the last probes monitor the homogenized data at the plant-level structures: the interoperable plant blockchain and the related IPFS partition. Monitoring this part of the architecture helps ensure that the homogenized data has been correctly stored and processed. We also need to make sure that there are no performance or security issues that can compromise the data prior to its exploitation for business processes.
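The comparison in step 2 can be sketched as follows, assuming that each probe records only (message identifier, keyed digest) pairs rather than payloads, which fits the privacy goal of not capturing actual data. A keyed HMAC is used instead of a plain hash because plain digests of low-entropy sensor values could be guessed. The probe key, message identifiers, and report categories are hypothetical, and mapping each category to a cause (loss, DoS, tampering) is a heuristic that would need further investigation in practice.

```javascript
const { createHmac } = require("node:crypto");

// Probes keep only (messageId -> keyed digest), not the payload itself.
// PROBE_KEY is a hypothetical secret shared by the probes.
const PROBE_KEY = "example-probe-key";
const fingerprint = (payload) =>
  createHmac("sha256", PROBE_KEY).update(JSON.stringify(payload)).digest("hex");

// Compare what the IIoT-level probe saw with what the IPFS-DAG probe saw.
function compareProbes(sourceProbe, storageProbe) {
  const report = { missing: [], altered: [], unexpected: [] };
  for (const [id, digest] of sourceProbe) {
    if (!storageProbe.has(id)) report.missing.push(id);               // lost or dropped
    else if (storageProbe.get(id) !== digest) report.altered.push(id); // modified in transit
  }
  for (const id of storageProbe.keys()) {
    if (!sourceProbe.has(id)) report.unexpected.push(id); // never emitted by a device
  }
  return report;
}

const atSource = new Map([
  ["m1", fingerprint({ t: 70 })],
  ["m2", fingerprint({ t: 71 })],
  ["m3", fingerprint({ t: 72 })],
]);
const atStorage = new Map([
  ["m1", fingerprint({ t: 70 })],
  ["m2", fingerprint({ t: 99 })],
  ["m4", fingerprint({ t: 10 })],
]);
console.log(compareProbes(atSource, atStorage));
// { missing: [ 'm3' ], altered: [ 'm2' ], unexpected: [ 'm4' ] }
```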
5. Implementation
In this section, we describe the implementation process of the prototype that we have developed to prove the viability of our proposal.
5.1. Data Homogenization Process with Decentralized Oracles
The machine data that we employ in this prototype is based on a real-world JSON structure obtained from actual industrial sensors. The IIoT devices in the simulated scenario collect data on the performance of the production line, the quality of the products being produced, timestamps, diagnostics, and many other factors. These data can be used to optimize the production process and improve efficiency. When implementing the prototype, we simulate heterogeneous data similar to that of the real-world environment described in Section 1. Thus, this implementation aims to solve the challenges related to the security, integrity, and heterogeneity of industrial data.
Specifically, we simulate the following Industry 4.0 IIoT equipment:
Smart sensors: These sensors can collect and transmit data about the performance and operation of machines, processes, and systems in real time.
Predictive maintenance systems: These systems use machine learning and data analytics to predict when maintenance is needed, helping to reduce downtime and improve efficiency.
Robotic systems: These systems can automate tasks such as material handling, assembly, and inspection, helping to increase productivity and reduce the need for manual labor.
We use IOTA as the production line DAGs to process the raw data, since IOTA is currently known to be the most advanced DAG DLT solution [2], especially in terms of performance. As for the oracle service, there are many relevant options to choose from. As mentioned before, the most well-known oracle platform is ChainLink, which is focused on deploying Ethereum-compatible oracles.
However, in this work, we do not use the Ethereum blockchain, since it lacks interoperability capabilities and offers relatively low performance. Instead, to provide interoperability, we have chosen Polkadot as our oracle service, as well as the blockchain solution in which we store the homogenized data. In this case, we have implemented a relay chain in which the homogenized data is stored, along with a parachain that acts as an oracle service.
This implementation leaves open the possibility of extending the functionality of our architecture by connecting other parachains in the future, which, for example, could execute smart contracts that establish business relationships with other entities (e.g., suppliers, customers, etc.).
Finally, we use the JSON-based Eclipse Unide data model, as shown in Listing 1. The Unide data model is specifically designed for manufacturing processes, and it is trusted by several major parties, such as SAP or Bosch.
Listing 1. Eclipse Unide data model
First, we implemented a NodeJS client that emulates several industrial devices and periodically sends raw industrial data to an IPFS file system. The resulting IPFS hash is then sent to the IOTA DAG DLT. Next, we implemented the data homogenization client in NodeJS. This client performs the following sequence of six tasks:
Access the IPFS raw data using the hash that is stored in the production line IOTA DAG DLT. An example of an industrial raw data JSON is shown in Listing 2.
Request the data model from the oracle service.
Figure 7 shows the retrieval of the data model by the Polkadot parachain that we set up as the oracle service.
Perform the data homogenization process. We defined the mapping from the raw data schema to the standard Eclipse Unide data model schema using the jsonpath-object-transform (https://www.npmjs.com/package/jsonpath-object-transform, accessed on 25 November 2022) NPM package. Listing 3 shows the NodeJS code of the transformation process of the data according to the Unide model.
To ensure that the process was correctly executed, we validate the resulting JSON using the Ajv JSON schema validator (https://ajv.js.org/, accessed on 25 November 2022).
Add the used data model and the resulting homogenized data JSON to IPFS. Listing 4 shows the homogenized version of the raw data from Listing 2.
Send a transaction to the Polkadot relay chain (interoperable plant blockchain) to store the IPFS hash of the homogenized data.
Figure 8 shows the stored IPFS hash pointer of the homogenized data within the Polkadot blockchain.
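The six tasks above can be sketched end to end as follows. This is a minimal, dependency-free sketch under stated assumptions rather than the client used in this work: IPFS is replaced by a hypothetical in-memory content store keyed by SHA-256 (real IPFS returns multihash-based CIDs), the IOTA DAG and the Polkadot relay chain are replaced by hypothetical append-only ledgers that hold hash pointers, and the oracle, transformation, and validation steps are injected as plain functions. All names (`createContentStore`, `createLedger`, `runHomogenization`) and the raw field names are illustrative.

```javascript
const crypto = require("node:crypto");

// Hypothetical stand-in for IPFS: JSON content addressed by its SHA-256 digest.
function createContentStore() {
  const blobs = new Map();
  return {
    add(obj) {
      const bytes = JSON.stringify(obj);
      const hash = crypto.createHash("sha256").update(bytes).digest("hex");
      blobs.set(hash, bytes);
      return hash;
    },
    get(hash) {
      const bytes = blobs.get(hash);
      if (bytes === undefined) throw new Error(`unknown hash: ${hash}`);
      return JSON.parse(bytes);
    },
  };
}

// Hypothetical append-only ledger that stores hash pointers only.
function createLedger() {
  const entries = [];
  return {
    submit(pointer) {
      entries.push(pointer);
      return entries.length - 1; // index used as a transaction reference
    },
    read(txIndex) {
      return entries[txIndex];
    },
  };
}

// The six tasks of the homogenization client, with all dependencies injected.
function runHomogenization(deps, lineTxIndex) {
  const { store, lineLedger, plantLedger, oracle, transform, validate } = deps;
  // 1. Fetch the raw JSON from IPFS via the hash stored on the production line DLT.
  const raw = store.get(lineLedger.read(lineTxIndex));
  // 2. Request the data model from the oracle service.
  const dataModel = oracle.getDataModel();
  // 3. Map the raw data onto the standard data model.
  const homogenized = transform(raw);
  // 4. Validate the result against the retrieved data model.
  if (!validate(dataModel, homogenized)) throw new Error("homogenized data failed validation");
  // 5. Add the data model and the homogenized JSON to IPFS.
  const modelHash = store.add(dataModel);
  const dataHash = store.add(homogenized);
  // 6. Anchor the homogenized data hash on the plant blockchain.
  const plantTxIndex = plantLedger.submit(dataHash);
  return { modelHash, dataHash, plantTxIndex };
}

// Example run: an emulated device publishes raw data, then the client homogenizes it.
const store = createContentStore();
const iota = createLedger();
const relayChain = createLedger();
const lineTx = iota.submit(store.add({ machine: "M-01", temp: 48.2 }));

const result = runHomogenization(
  {
    store,
    lineLedger: iota,
    plantLedger: relayChain,
    oracle: { getDataModel: () => ({ required: ["device", "measurements"] }) },
    transform: (r) => ({
      device: { deviceID: r.machine },
      measurements: [{ series: { temperature: [r.temp] } }],
    }),
    validate: (model, doc) => model.required.every((key) => key in doc),
  },
  lineTx
);
console.log(result);
```

Injecting the dependencies keeps the six-step control flow independent of the concrete IPFS, IOTA, and Polkadot clients, so each stub can be swapped for a real client without changing the sequence.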
Listing 2. Raw industrial data JSON example
Listing 3. Data transformation in NodeJS code
Listing 4. Homogenized industrial data JSON according to the Eclipse Unide model
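As a rough illustration of what the transformation and the Ajv validation step do, the sketch below applies a declarative template in which string values starting with `$` are treated as paths into the raw JSON, and then checks the result against a small schema. It deliberately avoids third-party packages, so `resolvePath`, `applyTemplate`, and `validate` are hypothetical helpers. They support only dot-notation paths and a tiny subset of JSON Schema (`type`, `required`, `properties`, `items`), and they are not the jsonpath-object-transform or Ajv APIs. The raw field names are invented, and the output loosely follows the Unide PPMP v2 measurement message, whose exact field names should be checked against the specification.

```javascript
// Resolve a dot-notation path such as "$.readings.temperature" (not full JSONPath).
function resolvePath(obj, path) {
  return path
    .replace(/^\$\.?/, "")
    .split(".")
    .filter(Boolean)
    .reduce((acc, key) => (acc == null ? undefined : acc[key]), obj);
}

// Strings starting with "$" are looked up in the source; everything else is copied.
function applyTemplate(template, source) {
  if (typeof template === "string") {
    return template.startsWith("$") ? resolvePath(source, template) : template;
  }
  if (Array.isArray(template)) return template.map((t) => applyTemplate(t, source));
  if (template !== null && typeof template === "object") {
    return Object.fromEntries(
      Object.entries(template).map(([key, value]) => [key, applyTemplate(value, source)])
    );
  }
  return template;
}

// Tiny JSON Schema subset; returns a list of error messages (empty means valid).
function validate(schema, value, path = "$") {
  const actual = Array.isArray(value) ? "array" : value === null ? "null" : typeof value;
  if (schema.type && schema.type !== actual) return [`${path}: expected ${schema.type}, got ${actual}`];
  const errors = [];
  if (value !== null && typeof value === "object") {
    for (const key of schema.required || []) {
      if (value[key] === undefined) errors.push(`${path}.${key}: missing`);
    }
    for (const [key, sub] of Object.entries(schema.properties || {})) {
      if (value[key] !== undefined) errors.push(...validate(sub, value[key], `${path}.${key}`));
    }
  }
  if (schema.items && actual === "array") {
    value.forEach((item, i) => errors.push(...validate(schema.items, item, `${path}[${i}]`)));
  }
  return errors;
}

// Hypothetical mapping from a raw device message to a Unide-style measurement message.
const unideTemplate = {
  "content-spec": "urn:spec://eclipse.org/unide/measurement-message#v2",
  device: { deviceID: "$.machineId" },
  measurements: [{ ts: "$.timestamp", series: { $_time: [0], temperature: ["$.readings.temperature"] } }],
};

const unideSchema = {
  type: "object",
  required: ["content-spec", "device", "measurements"],
  properties: {
    "content-spec": { type: "string" },
    device: { type: "object", required: ["deviceID"], properties: { deviceID: { type: "string" } } },
    measurements: { type: "array", items: { type: "object", required: ["ts", "series"] } },
  },
};

const raw = { machineId: "PL1-A1-M07", timestamp: "2022-11-25T10:00:00Z", readings: { temperature: 52.3 } };
const unide = applyTemplate(unideTemplate, raw);
const errors = validate(unideSchema, unide);
console.log(JSON.stringify(unide), errors);
```

Keeping the mapping as data (a template) rather than code means a new machine type only needs a new template, while the validation step catches raw messages that lack the fields the template expects.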
5.2. Monitoring System
In this subsection, we present the implementation of the monitoring system that we designed for the proposed architecture. In this implementation, we use NodeJS and ExpressJS for data retrieval to provide compatibility with the rest of the architecture. For this preliminary version, we created an API that includes information on the four modules described in Section 4.2. To properly display the monitoring data, we employ the ELK Stack (Elasticsearch, Logstash, and Kibana). These tools enable advanced real-time data visualization and monitoring through an easy-to-use dashboard, so we do not need to build a dashboard from scratch, which would be a highly complex process. The ELK Stack has proven well suited to similar needs in other relevant works [51].
In the implemented API modules, we show the following information:
The IIoT data monitoring. This module shows several metrics related to the raw data that comes from industrial machines. We set the monitoring probes directly at the sensor level, where the data is generated. We measure the total number of devices within the industrial plant, the number of active devices, the percentage of active devices, the number of sent messages (i.e., raw data transactions), the data generation rate, and the average temperature of the devices. Listing 5 shows an example of the returned IIoT metrics from the monitoring API.
The DLTs monitoring. This module shows several metrics related to IOTA (production line DLT) and Polkadot (interoperable plant blockchain): the overall throughput of each DLT, the transaction validation times, the associated costs (if any), information about the peer nodes, the consensus model, the number of blocks, smart contract information (if any), etc. Listing 6 shows a trimmed example of the returned plant blockchain metrics from the monitoring API.
The oracles monitoring. This module shows several metrics that are related to the oracles. It shows which oracles have been used the most, which are currently available, the throughput capacity, the accumulated usage fees, the latest retrieved data, the quality of the data, etc. The “quality of data” metric shows whether the retrieved data model JSON is valid or not. Listing 7 shows a trimmed example of the returned blockchain oracles metrics from the monitoring API.
The storage monitoring. This module shows several metrics related to the storage of the data within the IPFS file system, such as performance, storage usage, peer node information, the generated hashes, version, IP addresses, etc. Listing 8 shows a trimmed example of the returned IPFS storage metrics from the monitoring API.
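A minimal sketch of how the IIoT data monitoring module might compute the metrics listed above and expose them as JSON follows. To stay dependency-free, it uses Node's built-in `http` module instead of ExpressJS. The device record shape, the `/metrics/iiot` route, and the choice of averaging temperature over active devices only are assumptions, not the format of Listing 5.

```javascript
const http = require("node:http");

// Compute the IIoT monitoring metrics from hypothetical device records
// of the form { id, active, messagesSent, temperature }.
function computeIiotMetrics(devices, windowSeconds) {
  const active = devices.filter((d) => d.active);
  const sentMessages = devices.reduce((sum, d) => sum + d.messagesSent, 0);
  return {
    totalDevices: devices.length,
    activeDevices: active.length,
    activePercentage: devices.length === 0 ? 0 : (100 * active.length) / devices.length,
    sentMessages,
    generationRate: sentMessages / windowSeconds, // messages per second
    // Averaged over active devices only (an assumption).
    averageTemperature:
      active.length === 0 ? null : active.reduce((sum, d) => sum + d.temperature, 0) / active.length,
  };
}

// JSON endpoint on Node's built-in http module (the described implementation uses ExpressJS).
function createMonitoringServer(getDevices, windowSeconds) {
  return http.createServer((req, res) => {
    if (req.method === "GET" && req.url === "/metrics/iiot") {
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify(computeIiotMetrics(getDevices(), windowSeconds)));
    } else {
      res.writeHead(404);
      res.end();
    }
  });
}

const devices = [
  { id: "d1", active: true, messagesSent: 120, temperature: 40 },
  { id: "d2", active: true, messagesSent: 80, temperature: 60 },
  { id: "d3", active: false, messagesSent: 0, temperature: 25 },
  { id: "d4", active: false, messagesSent: 0, temperature: 25 },
];
console.log(computeIiotMetrics(devices, 100));
// To serve it: createMonitoringServer(() => devices, 100).listen(3000);
```

Separating the metric computation from the HTTP layer makes the same function reusable by a log shipper that forwards the metrics to Elasticsearch for Kibana dashboards.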
5.3. Results
In this subsection, we present the results gathered by the monitoring system during a test run of the data homogenization architecture. Although we ran the process for several days, we observed that a 12 to 14 h simulation generates sufficiently robust and realistic results, as longer simulations did not show major variations. The simulated smart factory includes a total of 500 IIoT devices that send random data at a random rate using the IoT-sim package (https://www.npmjs.com/package/iot-sim, accessed on 28 November 2022). During the tests, the number of active IIoT devices varies randomly to simulate a realistic scenario.
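The randomized workload described above can be approximated without the IoT-sim package. The sketch below is a hypothetical stand-in, not IoT-sim's API: at each tick, every one of the 500 emulated devices is active with an assumed probability and, if active, sends a random number of messages. A small seeded PRNG (mulberry32) keeps runs reproducible. The activation probability and message bound are illustrative assumptions, and the 30 to 70 °C temperature range mirrors the fluctuations reported in the Discussion.

```javascript
// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// One tick of the emulated factory (all parameters are illustrative assumptions).
function simulateTick(deviceCount, rand, { pActive = 0.8, maxMessages = 5 } = {}) {
  const devices = [];
  for (let id = 0; id < deviceCount; id++) {
    const active = rand() < pActive;
    devices.push({
      id: `dev-${id}`,
      active,
      messagesSent: active ? Math.floor(rand() * (maxMessages + 1)) : 0,
      temperature: 30 + rand() * 40, // 30 to 70 °C
    });
  }
  return devices;
}

const rand = mulberry32(42);
const ticks = Array.from({ length: 10 }, () => simulateTick(500, rand));
const activeCounts = ticks.map((tick) => tick.filter((d) => d.active).length);
console.log(activeCounts);
```

Seeding the generator makes a given test run repeatable, which helps when comparing monitoring results across configuration changes.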
The raw data is processed by IOTA and IPFS at the production line level, and it is then homogenized and processed by a Polkadot plant blockchain. We also set up a Polkadot parachain network with a random number of active oracles out of a total of ten. The simulation was executed on a computer with a 9th-generation Intel Core i7 CPU, 16 GB of RAM, and an SSD drive. We generated several Kibana graphs showing the following metrics from the monitoring system:
IIoT devices. We measure the number of active devices (Figure 9), the average temperature (Figure 10), and the Overall Equipment Effectiveness (OEE) (Figure 11). By generating these graphs, we can analyze the production flow in depth, identify possible device failures and overheating problems, and optimize the effectiveness of the industrial equipment by using data-driven techniques, as shown in [52].
Storage. We measure the number of raw data JSONs inserted into IPFS by the IIoT devices and compare it with the data that is finally processed by the IOTA DLT (i.e., the JSON hashes processed in IOTA), as shown in Figure 12. These measurements could help us identify possible anomalies in the data generated by the IIoT devices. We also compare the size of the data stored in IPFS with the size of the IPFS hashes processed in IOTA, as shown in Figure 13. Monitoring data size could be useful to optimize storage space and to visualize the substantial storage burden that we avoid placing on the DLT by using decentralized IPFS storage.
The DLTs. We measure the average throughput of IOTA and Polkadot during the simulation, as shown in Figure 14. Measuring the throughput of the DLTs is crucial for optimizing data flow and avoiding bottlenecks [53].
The oracles. We measure the average number of active oracles during the simulation, as shown in Figure 15. By analyzing the number of blockchain oracles involved in providing external data to our architecture, we can determine the degree of centralization of the system. For example, having only one active oracle would imply a high degree of centralization, which could affect the security of the whole industrial architecture. Furthermore, the number of active oracles is also useful for calculating the associated costs of this service.
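The storage and throughput comparisons above reduce to simple arithmetic over the monitoring data. The following is a minimal sketch, assuming each anchored pointer is a 46-character CIDv0 string (other CID encodings differ in length) and that confirmation timestamps are available in milliseconds. The function names are hypothetical.

```javascript
// Off-chain bytes kept in IPFS versus bytes anchored on the DLT as hash pointers.
function storageComparison(payloadSizesBytes, pointerSizeBytes = 46) {
  const offChainBytes = payloadSizesBytes.reduce((sum, n) => sum + n, 0);
  const onChainBytes = payloadSizesBytes.length * pointerSizeBytes;
  return {
    offChainBytes,
    onChainBytes,
    // Ratio of payload bytes to pointer bytes actually written to the ledger.
    reductionFactor: onChainBytes === 0 ? null : offChainBytes / onChainBytes,
  };
}

// Average transactions per second within the half-open window [startMs, endMs).
function averageTps(timestampsMs, startMs, endMs) {
  const count = timestampsMs.filter((t) => t >= startMs && t < endMs).length;
  return count / ((endMs - startMs) / 1000);
}

console.log(storageComparison([920, 460]));
console.log(averageTps([0, 250, 500, 750, 1000, 1999, 2000], 0, 2000));
```

Using a half-open window avoids double-counting transactions that land exactly on a boundary when consecutive windows are plotted, as in a Kibana date histogram.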
6. Discussion
6.1. Performance Analysis
In this work, we leverage decentralized oracles for data interoperability purposes, i.e., to securely gather the external IIoT data model and homogenize machine raw data. Decentralized oracle platforms such as ChainLink aim to enable the development of fast, decentralized, and secure oracles for different applications. ChainLink, however, is strongly tied to the Ethereum ecosystem. Polkadot, on the other hand, despite not being focused on oracle services, is a highly versatile and interoperable platform on which an oracle solution can be implemented alongside other conventional uses. With Polkadot, we aim to achieve a high degree of interoperability in order to design a holistic DLT architecture for tomorrow’s Industry 4.0.
However, despite the significant security (i.e., data integrity) that a decentralized oracle mechanism brings to an architecture when providing data, it may introduce some delays owing to the complexity of an additional decentralized network in between. This would have been the case with ChainLink. By using Polkadot, we integrate the oracle platform with the plant blockchain, since Polkadot “parachains” are directly connected to, and compatible with, the main “relay” chain. Furthermore, the performance of Polkadot is significantly higher than that of other blockchains, such as Ethereum, on which ChainLink is currently based. Moreover, the direct connection between the oracle parachain and the relay chain of the interoperable plant blockchain incurs near-zero latency. Therefore, we consider that using a decentralized oracle service based on Polkadot to retrieve a JSON data model schema does not have a significant impact on the performance of the scheme, since only one request needs to be made for each data model. Finally, according to the measurements presented in Figure 15 from Section 5.3, on average, six oracles were active for the given external data retrieval tasks. This number of oracles is adequate to ensure the decentralization of the architecture while returning the JSONs that comprise the Eclipse Unide data model almost instantaneously.
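One way to turn several active oracles into a single trustworthy answer is to accept a data model only when enough independent oracles return identical content. The sketch below is an illustration under assumptions rather than the parachain logic used in this work: it hashes each response after recursively sorting object keys and accepts the most common answer if it reaches a caller-chosen quorum. `canonicalize`, `canonicalHash`, `aggregateOracleResponses`, the response shape, and the quorum of four are all hypothetical.

```javascript
const crypto = require("node:crypto");

// Recursively sort object keys so equivalent JSON documents hash identically.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.keys(value).sort().map((key) => [key, canonicalize(value[key])])
    );
  }
  return value;
}

function canonicalHash(value) {
  return crypto.createHash("sha256").update(JSON.stringify(canonicalize(value))).digest("hex");
}

// Accept the most common data model if at least `quorum` oracles returned it.
function aggregateOracleResponses(responses, quorum) {
  const tally = new Map();
  for (const { oracleId, dataModel } of responses) {
    if (dataModel == null) continue; // unavailable or timed-out oracle
    const hash = canonicalHash(dataModel);
    const entry = tally.get(hash) || { dataModel, voters: [] };
    entry.voters.push(oracleId);
    tally.set(hash, entry);
  }
  let best = null;
  for (const entry of tally.values()) {
    if (best === null || entry.voters.length > best.voters.length) best = entry;
  }
  if (best === null || best.voters.length < quorum) {
    throw new Error(`no data model reached a quorum of ${quorum}`);
  }
  return best;
}

const honest = ["o1", "o2", "o3", "o4", "o5"].map((oracleId, i) => ({
  oracleId,
  // Key order differs between oracles; canonicalization makes them comparable.
  dataModel: i % 2 === 0
    ? { type: "object", required: ["device", "measurements"] }
    : { required: ["device", "measurements"], type: "object" },
}));
const responses = [
  ...honest,
  { oracleId: "o6", dataModel: { type: "object", required: [] } }, // deviating answer
  { oracleId: "o7", dataModel: null }, // no answer
];
const accepted = aggregateOracleResponses(responses, 4);
console.log(accepted.voters);
```

With this kind of rule, a single faulty or compromised oracle cannot impose its data model, which is the practical benefit of keeping several oracles active.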
As the simulation results presented in Figure 14 from Section 5.3 show, industrial environments generate large amounts of data, which require significant processing and storage capacity. The monitoring system shows that our architecture is robust enough to handle such large amounts of data. IOTA and Polkadot offer high processing capacity, almost 1000 tps on average, which is sufficient for this type of environment. Even though Polkadot is not as fast as IOTA, this is not a critical limitation, since processing speed matters most where the data is generated, i.e., at the production line level. Furthermore, as shown in Figure 13 from Section 5.3, the use of IPFS greatly reduces the storage burden of the DLTs. In addition, the active device measurements shown in Figure 9 from Section 5.3 indicate that increasing the number of active devices does not have a significant impact on performance.
The graphs generated from the continuous monitoring of the architecture help us identify possible weak points in the process and, consequently, possible ways to improve the homogenization process, the data processing, and the management of possible costs. For example, the use of an oracle service could entail certain costs that companies should optimize as much as possible. Thus, using the monitoring system, we could analyze and predict many more aspects, such as the incurred costs, system performance, resource usage, device failures, etc. For instance, in Figure 10 from Section 5.3, we analyze the average temperature of the devices, which fluctuates significantly between 30 °C and 70 °C depending on the intensity of the production process. Moreover, Figure 11 from Section 5.3 shows the Overall Equipment Effectiveness (OEE), which indicates how effectively the machines are operating. This information shows that the effectiveness of the machines is very high throughout the entire simulated period, although some room for improvement remains.
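The text does not state how the OEE in Figure 11 is computed. The sketch below uses the common definition, OEE = Availability × Performance × Quality, with hypothetical shift figures, as an illustration of the metric rather than the exact computation behind the figure. The function name and input fields are assumptions.

```javascript
// Standard OEE decomposition; times in minutes, counts in parts.
function computeOee({ plannedTime, downtime, idealCycleTime, totalCount, goodCount }) {
  const runTime = plannedTime - downtime;
  const availability = runTime / plannedTime; // share of planned time actually running
  const performance = (idealCycleTime * totalCount) / runTime; // actual speed vs. ideal speed
  const quality = goodCount / totalCount; // share of good parts
  return { availability, performance, quality, oee: availability * performance * quality };
}

// Hypothetical 8 h shift: 48 min of downtime, 0.5 min ideal cycle, 800 parts, 780 good.
const shift = computeOee({ plannedTime: 480, downtime: 48, idealCycleTime: 0.5, totalCount: 800, goodCount: 780 });
console.log(shift);
```

A useful sanity check is that the three factors telescope: OEE also equals good count × ideal cycle time / planned time, here 780 × 0.5 / 480 = 0.8125.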
6.2. Security Analysis
Regarding the security of the information, we argue that in the presented architecture, the integrity of the data is ensured throughout the whole process, from when the data is generated in the production lines until it is homogenized and finally exploited at the plant level. This is due to the use of secure DLT technologies throughout the whole process (i.e., production line DAG DLTs, decentralized blockchain oracles for data homogenization, and the plant processing blockchain). As shown in Figure 12 from Section 5.3, at the beginning of the test run, we simulate an attack in which large amounts of malicious data are generated. Nonetheless, the malicious data is ultimately discarded by the IOTA DLT. This example shows that monitoring the architecture is also useful for visualizing possible cybersecurity attacks and other types of unintentional incidents.
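A simple way to surface incidents like the simulated attack is to compare, per interval, how many raw JSONs reached IPFS with how many of their hashes IOTA finally processed, and to flag intervals where the gap is unusually large. The following is a minimal sketch, assuming per-interval counts have already been aggregated. The 20% threshold and the function name are hypothetical and would need tuning.

```javascript
// Flag intervals where the share of IPFS inserts never processed by IOTA exceeds a threshold.
function flagDiscrepancies(series, maxUnprocessedRatio = 0.2) {
  return series
    .filter(({ inserted, processed }) =>
      inserted > 0 && (inserted - processed) / inserted > maxUnprocessedRatio)
    .map(({ interval, inserted, processed }) => ({
      interval,
      unprocessed: inserted - processed,
      ratio: (inserted - processed) / inserted,
    }));
}

const series = [
  { interval: "00:00", inserted: 900, processed: 120 }, // burst of rejected data
  { interval: "01:00", inserted: 130, processed: 128 },
  { interval: "02:00", inserted: 125, processed: 125 },
];
console.log(flagDiscrepancies(series));
```

A ratio-based rule tolerates normal fluctuations in device activity while still highlighting bursts in which most inserted data never reaches the ledger.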
However, the proposed architecture involves several components that may introduce potential security risks, including:
IPFS. IPFS is a decentralized storage system, which means that it relies on a distributed network of nodes to store and retrieve data. While this can increase the availability and durability of the data, it also means that there is a risk that some nodes may not be trustworthy or may be compromised. To mitigate this risk, we implemented security measures such as encryption and access control to ensure that only authorized parties can access the data stored in IPFS.
Decentralized oracle service. The proposed architecture uses a decentralized oracle service to retrieve data models for the data homogenization process. This introduces a potential security risk, since oracle services are often centralized and may therefore be subject to attacks or manipulation. To mitigate this risk, we use multiple oracle sources and implement security measures such as cryptographic signing and verification to ensure the integrity and authenticity of the data retrieved from the oracle service. Another security issue of oracles is the possible supply of unreliable information [54]. Monitoring the oracles can help mitigate this issue, which is why this work already includes a monitoring system that tracks them.
Interoperable plant blockchain. The interoperable plant blockchain is responsible for storing and managing smart plant homogenized data references and providing access control to IPFS. To ensure the security of this blockchain, it is important to implement measures such as secure consensus algorithms, proper access control and permissions, and regular security audits. Additionally, we implement measures such as encryption and secure communication protocols to protect the data stored on the blockchain.
Smart contract-based notary scheme. The data exchange scheme involves using smart contracts to securely transfer data between the production line DAGs and the plant blockchain. It is important to ensure that these smart contracts are properly tested and audited to ensure their security and correctness. Additionally, we implement measures such as access control and permissions to ensure that only authorized parties can interact with the smart contracts.
ELK-based monitoring. It is important to ensure that the ELK stack is properly configured and secured to protect against potential security risks and to ensure the integrity and confidentiality of the data it processes. We use the latest version of the stack, which helps ensure that currently known vulnerabilities have been patched.
Overall, it is important to ensure that all components of the proposed architecture are properly secured, and that appropriate measures are taken to mitigate potential security risks. This process involves implementing a combination of technical and organizational measures such as encryption, access control, cryptographic signing, security audits, and secure communication protocols.
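Two of the technical measures listed above, encryption of data before it is added to IPFS and cryptographic signing of oracle responses, can be sketched with Node's built-in `crypto` module. This is an illustration under assumptions, not the configuration used in this work: AES-256-GCM and Ed25519 are our choices, key distribution and access control are out of scope, and the helper names are hypothetical. Signing `JSON.stringify` output also assumes that signer and verifier serialize object keys in the same order.

```javascript
const crypto = require("node:crypto");

// Encrypt a JSON payload with AES-256-GCM before adding it to IPFS.
function encryptJson(obj, key) {
  const iv = crypto.randomBytes(12); // 96-bit nonce, unique per message
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(JSON.stringify(obj), "utf8"), cipher.final()]);
  return { iv: iv.toString("base64"), tag: cipher.getAuthTag().toString("base64"), data: data.toString("base64") };
}

// Decrypt and authenticate; throws if the key is wrong or the ciphertext was altered.
function decryptJson(envelope, key) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, Buffer.from(envelope.iv, "base64"));
  decipher.setAuthTag(Buffer.from(envelope.tag, "base64"));
  const plain = Buffer.concat([decipher.update(Buffer.from(envelope.data, "base64")), decipher.final()]);
  return JSON.parse(plain.toString("utf8"));
}

// Oracle side: sign the serialized data model with an Ed25519 private key.
function signDataModel(dataModel, privateKey) {
  return crypto.sign(null, Buffer.from(JSON.stringify(dataModel)), privateKey).toString("base64");
}

// Consumer side: verify the signature before using the data model.
function verifyDataModel(dataModel, signature, publicKey) {
  return crypto.verify(null, Buffer.from(JSON.stringify(dataModel)), publicKey, Buffer.from(signature, "base64"));
}

const key = crypto.randomBytes(32);
const envelope = encryptJson({ deviceID: "PL1-A1-M07", temperature: 52.3 }, key);
const restored = decryptJson(envelope, key);

const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");
const model = { type: "object", required: ["device"] };
const signature = signDataModel(model, privateKey);
const validSignature = verifyDataModel(model, signature, publicKey);
const tamperedSignature = verifyDataModel({ ...model, required: [] }, signature, publicKey);
console.log(restored, validSignature, tamperedSignature);
```

GCM provides both confidentiality and tamper detection for the stored payload, while the signature lets the homogenization client reject a data model that did not come from a known oracle key.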
6.3. Comparison with Other Solutions
The most similar DLT-based proposal is the architecture proposed by Jiang et al. [34]. That work presents a cross-chain framework for efficient and secure IoT data management, using a consortium blockchain as the control station and other blockchain platforms customized for specific IoT scenarios as the backbone for IoT devices. The framework merges transactions based on a notary mechanism and is implemented using Hyperledger Fabric and IOTA. However, that architecture shows a much lower throughput capacity (600 tps vs. 900 tps in our architecture) and a higher overall latency. Furthermore, its security robustness is not clear, since the authors address security concerns only by designing a simple access control system. Moreover, in this work, we go one step further and perform industrial data homogenization and exploitation instead of focusing exclusively on simple data transfer between DLTs. Finally, we also provide advanced monitoring of the whole scheme by using the ELK stack.
However, industrial data processing, monitoring, and homogenization can also be implemented without DLTs. In fact, the overwhelming majority of real-world industrial architectures today are not DLT-based, since this technology is relatively new and industrial processes take considerable time to incorporate new technologies. The following are some potential alternatives to DLTs that could be used for efficient and secure data management and homogenization in Industry 4.0:
Centralized databases: A centralized database is a single repository of data that is managed and maintained by a single entity. This can be an efficient way to manage data in the IoT, as it allows for quick and easy access to data and can scale to handle large volumes of data. However, it can also be vulnerable to security threats, as a single point of failure can compromise the entire system. Furthermore, centralized databases could suffer serious bottlenecks and even collapse when faced with large amounts of data that need to be processed and homogenized.
Peer-to-peer networks: Peer-to-peer networks allow devices to communicate directly with each other without the need for a central server or authority. This can be an effective way to manage data in the IoT, as it allows for decentralized control and can be highly scalable. However, such networks can also be less secure, as they rely on the security and reliability of individual devices and lack the robust consensus mechanisms and cryptographic linking of data blocks found in the most widely used DLTs.
Cloud-based solutions: Cloud-based solutions allow data to be stored and accessed on remote servers, which can be accessed over the internet. This can be a convenient and scalable way to manage data in IIoT, as it allows for easy access to data from any location. However, it can also be less secure, as data is stored on servers that may not be physically secure. Furthermore, cloud storage usually entails much higher economic costs than DLTs, especially compared to the more advanced solutions such as IOTA, which does not require fees, or Polkadot, whose fees are low or even zero in private networks.
Thus, our complete architecture not only ensures data integrity and security at every stage of the process but also delivers high performance for handling large amounts of IIoT data. Additionally, it is designed to be cost-effective, making it an attractive solution for businesses looking to leverage the benefits of IIoT with relatively low monetary costs. Furthermore, the implemented monitoring system also provides comprehensive real-time analysis, threat detection, and optimization suggestions across the whole process.
7. Conclusions and Future Work
In this paper, we design a homogenization process for IIoT data using decentralized oracles. We store the resulting data in an interoperable plant blockchain to guarantee the integrity of the data throughout the whole process, from when it is generated at each production line until it is exploited at the plant level. We also present the design of a monitoring system that aims to provide a graphical representation of the whole process. Finally, we describe the implementation process, in which we employ several cutting-edge technologies, such as the IOTA DAG DLT, the Polkadot interoperable blockchain as an oracle service and storage blockchain, and the IPFS decentralized storage solution.
The use of the aforementioned technologies enables industrial companies to process and exploit the industrial data in an efficient manner (i.e., with high throughput and efficiency) while also guaranteeing the integrity and immutability of the data throughout the whole process. Furthermore, the use of an interoperable DLT such as Polkadot with automated smart contracts functionality allows companies to expand the aforementioned benefits to further networks and business processes in which there is a wide variety of different stakeholders.
In future work, we intend to develop a more automated data homogenization process based on Model Driven Development (MDD) techniques. Although in this work we employed IIoT data generated by real-world industrial machines, the implementation and evaluation were performed in a laboratory environment. Thus, in future work, we intend, if possible, to go one step further and conduct on-field experiments in a real Industry 4.0 plant. Finally, the gathered data can also be analyzed and processed using AI to improve the production process and predict failures.