1. Introduction
The global manufacturing industry is gradually entering an era of highly competitive, personalized, and complex customer demands, in which hyper-automation and flexible, responsive solutions are needed to develop the dynamic capabilities that meet these demands [
1,
2,
3]. In this new era, the information technology, systems, and communication (ITSC) infrastructure is closely integrated with the operations infrastructure of manufacturing organizations in Industry 4.0 [
2]. This transformation has created the problem of cybersecurity risk exposure. Cyber-physical system (CPS) devices are increasingly attacked in manufacturing and other plants (such as power grids and oil and gas facilities), resulting in measurable effects on business [
2]. Malicious PowerShell scripts targeting CPS devices have increased exponentially [
4,
5]. Generally, industries using cloud computing may have good protection against such remote code execution tactics, but malicious insiders may deliberately create loopholes that make code execution attacks successful [
6,
7,
8,
9]. Further, compromised CPS devices may be allocated to critical accident-prone zones, putting lives and property at risk. A significant number of threats are expected to arise from CPS compromises and wrong allocations by insiders. These risks cannot be tackled using the controls available on the cloud computing side, such as anti-malware, web service security, firewalls, and intrusion prevention, because industrial IoT (IIoT) devices have computing boards, memories, and storage with very limited capacities and traditional controls cannot be deployed on them. Hence, new forms of controls need to be conceptualized [
2,
6,
7,
8,
9].
1.1. Rationale
Blockchains are recognized as crucial instruments for the cybersecurity of industrial IoT devices, particularly when integrated directly with manufacturing planning, operations, and control applications [
10,
11,
12]. The integration of provenance with blockchain solutions enhances the capability of tracking and traceability [
13,
14,
15]. This review is motivated by the potential to add predictive capabilities to provenance blockchains, which can ensure the authenticity of information collected from IoT devices and detect breaches of rules and controls in real time.
Achieving this predictive capability is only possible through the introduction of artificial intelligence (AI) in provenance blockchain solutions. Predictive analytics can ensure the monitoring of authenticated IoT devices within the cloud manufacturing system. It can detect vulnerabilities, such as the injection of malicious code, tampering with sensors, streaming of false sensory data, and misuse of assets attached to IoT data transmitters. Some scenarios where this solution can be useful include:
- (a)
Unauthorized shifting of manufacturing and logistics assets;
- (b)
Wrong allocations of manufacturing and logistics assets;
- (c)
Unauthorized handling and operations of manufacturing and logistics assets;
- (d)
Malicious data streaming from IoT assets, causing the wrong controls to be applied, which may lead to industrial accidents (such as flow and pressure in pipelines being shown as out of range while they are in range);
- (e)
Theft of manufacturing and logistics assets;
- (f)
Movement of production and logistics vehicles outside their permitted geo-fencing;
- (g)
Tampering with collision avoidance and safety systems in manufacturing;
- (h)
Tampering with the control systems of self-driven vehicles.
As far as we know, no comprehensive thematic review has explored the four themes involved (cloud manufacturing design, cybersecurity risks to IoT devices, provenance blockchains, and AI in IoT security) and how they interact with each other. This review is conducted to explore these aspects and evaluate the feasibility and effectiveness of integrating predictive analytics with provenance blockchain solutions to enhance the cybersecurity of industrial IoT devices.
1.2. Objectives
The objectives of this literature review are the following:
Deeper analysis and synthesis: To perform an in-depth analysis and synthesis of existing studies, identify gaps and areas requiring further research, and assess the significance of the existing research in addressing these gaps;
Integration of findings: To present an informative framework by integrating the findings, providing valuable insights;
Discussion on challenges and limitations: To assess the feasibility of the integrated framework and to inform future research and implementation by discussing its limitations and potential challenges.
1.3. Research Questions
In this research, we examine cloud manufacturing design and the evolving cybersecurity risks posed by the Internet-enabling of Internet of Things (IoT) devices to understand the challenge of cyber-attacks on industrial systems and their consequences. Furthermore, we explore the integration of IoT, provenance blockchains, and artificial intelligence for enhancing Internet-enabled IoT security. The research questions underpinning this research are the following:
- (a)
What cybersecurity risks face the IoT used in cloud manufacturing processes?
- (b)
What are the known solutions for IoT security using provenance blockchains?
- (c)
What are the known solutions for IoT security using artificial intelligence?
- (d)
What are the gaps and potential research directions for future studies?
The remaining sections of this paper are organized as follows:
Section 2 provides an overview of the methodology.
Section 3 presents the detailed results of the existing literature on the four themes.
Section 4 discusses the gaps in the existing research and introduces a solution framework addressing these gaps.
Section 5 presents the conclusions and recommendations.
2. Methodology
This study applied the following steps to review the papers relevant to the problem domain: (1) search strategy; (2) study selection process; and (3) analysis of data from the selected papers.
2.1. Search Strategy
First, we identified four categories of keywords that would capture all relevant literature related to our research questions.
Category 1: Keywords related to cloud manufacturing, including “Industry 4.0”, “security risks”, “cloud manufacturing”, “design approaches”, and “design vulnerabilities”;
Category 2: Keywords related to cybersecurity risks to the IoT, including “cybersecurity vulnerabilities”, “Internet of Things”, “cloud-based manufacturing”, “IoT security threats”, and “cloud manufacturing”;
Category 3: Keywords related to provenance blockchain, including “blockchain technology”, “IoT security”, “data provenance”, “authentication”, and “threat mitigation”;
Category 4: Keywords related to AI, including “Internet of Things”, “IoT security”, “predictive analytics”, “artificial intelligence”, “security measures”, and “future directions”.
The literature was searched using the keywords in each category through Google Scholar, which indexes the publications of established scientific publishers, such as IEEE, Springer, and Elsevier.
2.2. Study Selection Process
The initial search generated 437 research papers.
Figure 1 illustrates the percentage of research studies by each publisher. The majority of research articles were published by three publishers, IEEE, Springer, and Elsevier, accounting for 25.9%, 20.4%, and 16.7%, respectively.
After removing duplicates and non-English articles, this was reduced to 97 papers. Next, these 97 records were screened for overall relevance by reading their abstract, discussion, and conclusion sections. As a result, 71 papers were selected for further analysis. To select the articles, strict inclusion and exclusion criteria were established, as listed below:
Inclusion Criteria:
The cloud manufacturing design and specific aspects causing exposure to cybersecurity should be covered;
Articles should comprehensively address the cybersecurity risks associated with IoT devices in cloud manufacturing;
Blockchain security for IoT devices should be covered with some proposed models, at least theoretically;
Articles should be related to IoT security in the context of provenance, blockchain, and AI, even if not integrated as a solution;
AI for IoT security should be comprehensively covered.
Exclusion Criteria:
Articles that covered cloud manufacturing design only in general terms were excluded;
Articles that did not cover IoT security risks in the context of cloud manufacturing were excluded;
Articles that did not discuss models of provenance and blockchain and provided only generic discussion were excluded;
Articles that did not provide technical details of the role of AI in IoT security, albeit providing generic discussion, were excluded.
In the end, a final list of 54 papers was selected, and we proceeded to answer the identified research questions of the review.
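The selection funnel reported above can be checked with simple arithmetic. The following minimal sketch uses only the four counts stated in this section; the stage labels and the function name `retention` are illustrative and not part of the review protocol.

```python
# Paper counts reported in the study selection process.
stages = [
    ("initial search", 437),
    ("after removing duplicates and non-English articles", 97),
    ("after screening abstract, discussion, and conclusion", 71),
    ("final selection", 54),
]


def retention(stages):
    """Return (label, count, share of previous stage, share of initial) per stage."""
    initial = stages[0][1]
    previous = initial
    rows = []
    for label, count in stages:
        rows.append((label, count, count / previous, count / initial))
        previous = count
    return rows


rows = retention(stages)
for label, count, step_share, overall_share in rows:
    print(f"{label}: {count} "
          f"({step_share:.1%} of previous stage, {overall_share:.1%} of initial)")
```

Under these counts, roughly one in eight of the initially retrieved papers reached the final selection.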
2.3. Analysis of Data
Based on the topics encountered while reading their content, the 54 selected studies were systematically grouped into four thematic clusters. These themes were selected to logically interlink knowledge and research to bridge existing gaps and integrate findings into a cohesive security solution. The review is structured as follows:
Cloud Manufacturing Design: The first theme focuses on the design of cloud manufacturing to establish the research context and identify the problem area;
Cybersecurity Risks to IoT Devices: The second theme examines cybersecurity risks associated with IoT devices in a cloud manufacturing setting, aiming to establish a comprehensive problem description;
Provenance and Provenance Blockchains: The third theme explores the concept of provenance and the use of provenance blockchains to enhance security and traceability;
Artificial Intelligence in IoT Security: The fourth theme investigates the role of artificial intelligence in IoT security.
3. Results
This section presents the in-depth content analysis of each theme for sequential knowledge building about cloud manufacturing design, cybersecurity risks to the IoT, provenance blockchains for IoT security, and artificial intelligence for IoT security, thus revealing literature gaps for future research agendas.
3.1. Theme 1: Cloud Manufacturing Design
The older industrial manufacturing systems comprised industrial control systems running on proprietary protocols and network configurations. The popular protocols were CAN, XAP, DYNET, DNP3, OPC, LONWORKS, BACNET, DIRECTNET, DC-BUS, and MODBUS, which were tied to proprietary bus architectures and interface configurations customized to specific physical networks installed by industrial engineering vendors [
2,
4,
5,
16]. These protocols were defined to integrate physical industrial machines and processes with their control logic, which was built into electronic circuitry called Programmable Logic Controllers (PLCs). The integration of PLCs followed IEC 62264 and was based on the proprietary protocols mentioned above, although TCP/IP was also making inroads into the overall communication design [
5,
6]. These networks were Supervisory Control and Data Acquisition (SCADA) systems, which were localized control engineering networks for monitoring the data collected from industrial sensors and sending actuation commands for running operations [
2,
5,
6]. SCADA systems comprised supervisory industrial computers running control system software connected with PLCs through proprietary bus architectures and interface configurations customized separately by industrial engineering vendors. They were configured as localized process engineering networks, which were also connected to the TCP/IP networks at data collection points for personal computer monitoring of industrial networks [
5,
6]. However, the role of TCP/IP interfacing was limited and confined to local area networking only. Connectivity to the Internet was not required as there were no resources for manufacturing on the Internet.
Cloud manufacturing was a paradigm shift from the older industrial systems described above. PLCs were transformed into IoT devices, through new electronics and advanced firmware, so that they could communicate over TCP/IP through open-standard wireless networks to the Internet, thus becoming cyber-physical system (CPS) devices [
6,
7]. PLCs as CPSs opened a floodgate of opportunities as the traditional localized boundaries of networked manufacturing operations were broken. With the Internet as the medium, PLCs could be configured, monitored, and controlled remotely. With this transformation, multiple industrial control applications could be migrated to cloud computing, forming a framework called cloud manufacturing. With this change, the concept of decentralized network manufacturing emerged in which manufacturers can share their resources globally to serve a common demand chain following a complex mechanism of interfacing distributed physical manufacturing assets to a logically centralized planning, organizing, scheduling, monitoring, and control system [
7,
8]. In this network, the sensor data and actuation commands can flow on Internet-enabled links and interfaces working on a common TCP/IP stack with IPv6, so that manufacturing-related traffic can flow through massive-scale IoT gateways interconnecting physical manufacturing assets to the cloud manufacturing platform [
9]. The cloud manufacturing platform is highly complex because manufacturing planning, scheduling, operations, monitoring, and controls are now possible remotely by communicating with a theoretically unlimited number of manufacturing assets hooked to the system across thousands of manufacturing plants [
10]. It forms an entire ecosystem of service-oriented manufacturing, where manufacturing assets can be searched, matched, selected, composited, scheduled, and operated remotely based on the customer jobs in hand, following design, manufacturing, and logistics knowledge management on cloud computing [
9,
11].
The cloud manufacturing paradigm is based on a highly complex multilayered ecosystem, starting from IoT-transformed machinery installed in plants acting as CPS devices up to advanced service-oriented applications that can be used by end customers to issue manufacturing orders to be served by the system through complex processes of searching, matching, selection, composition, scheduling, monitoring, and controlling remotely [
9,
10,
11]. Complex big data systems, artificial intelligence systems, and machine-to-machine smart interaction systems drive the cloud manufacturing approach [
12]. At the core of all the complexity is the IoT because it serves as the interface between the manufacturing assets and the Internet. The IoT has caused the firmware of machines to be interfaced with the Internet, thus making it vulnerable to cybersecurity risks. The cybersecurity risks to the IoT in cloud manufacturing are presented in the next sub-section.
3.2. Theme 2: Cybersecurity Risks to IoT Used in Cloud Manufacturing Processes
PLCs were Internet-enabled in the Industry 4.0 era to facilitate manufacturing process engineering and control through cloud computing. Direct interfacing of the PLCs to the Internet transformed them into IoT devices capable of communicating with cloud manufacturing applications either directly or through edge computing systems, but it also made them vulnerable to cybersecurity threats. By contrast, the traditional manufacturing systems were not exposed to the Internet and were therefore shielded from Internet-borne threats, because they operated in closed networks running proprietary protocols developed by the manufacturers of control system electronics [
6]. Extensive usage of the IoT in industrial applications caused an exponential increase in the number of Internet-exposed machines and, with it, in their exposure to cybersecurity threats [
13]. Securing IoT devices was different from securing computers connected to the Internet because of their low computing and storage capacities, making them unsuitable for installing and configuring traditional Internet security controls. Hence, the security controls had to be implemented remotely to monitor and control their operations through cloud computing and blockchains [
14]. Blockchains have architectural specifications and elements suitable for creating cybersecurity control for protecting the IoT in industrial and other commercial applications. Cryptographic solutions can be enabled in blockchains to protect IoT devices communicating through their respective radio frequency identifications [
15,
16]. The cloud manufacturing framework is multilayered whereby each layer has separate risks. Reference [
12] reported that attacks can occur at the physical, network, and application layers. The physical layer (also called the perception layer) faces risks of unauthorized tampering and distributed denial of service (DDoS). The network layer also faces DDoS attacks, as well as attacks based on spoofing, selective forwarding, masquerading, sinkholes, wormholes, and routing manipulation. Application layer attacks are the most damaging, as they affect the end customers directly; the majority of them are exploit-based and injection-based. The most prominent attacks on the IoT in the cloud manufacturing environment are reviewed in the next paragraph.
As discussed in the previous sub-section, the control networks used in cloud manufacturing are managed by cloud-based monitoring and control software. Essentially, the SCADA networks were replaced by modern IoT networks to facilitate universal protocols, connectivity, and communications. This was made possible by digitalizing the PLCs and standardizing the communication protocol to TCP/IP. The supervisory industrial computers were eliminated, as their control system software applications were moved to cloud computing. This change caused a major transformation in the manufacturing industry but also exposed all PLCs converted to IoT devices to cybersecurity threats. There are several ways they can be attacked and harmed [
17,
18,
19,
20,
21,
22,
23,
24]:
- (a)
Eavesdropping attack: penetration systems intercept the running sequence of processes and create data probes that capture packets from unsecured IoT devices;
- (b)
Masquerading attack: penetration systems intercept the running sequence of processes so that malicious devices and software can masquerade as authorized genuine IoT devices;
- (c)
Distributed denial of service (DDoS): attacks caused by penetration systems capable of generating large amounts of junk data streams for overloading networking links, computing resources, memory, and storage driving the IoT infrastructure;
- (d)
Side-channel attacks: penetration systems searching for and compromising the vulnerable, less monitored, and poorly protected side-channel entries into the main computing system driving the IoT infrastructure;
- (e)
Cross-site scripting attacks: penetration attacks causing malicious scripts to get intermingled with the running scripts driving the industrial processes run by IoT devices;
- (f)
Exploit-based attacks: penetration attacks carried out by first collecting the vulnerabilities of the running systems through scanners and then writing and executing attack code to exploit them; this method is also the primary mechanism for executing zero-day attacks (attacks exploiting vulnerabilities for which no fix is yet available);
- (g)
Identity thefts: penetration attacks caused by stealing the authentication and authorization data of genuine IoT devices running in industrial systems so they can be replaced by malicious devices installed with the purpose of causing targeted harm to industrial systems and processes;
- (h)
Insider trading and proliferation: insiders involved in proliferating malicious IoT devices into the industrial network or trading sensitive security information with their accomplices outside the system to cause penetration attacks in the industrial systems;
- (i)
Feeding of fake sensor data: penetration attacks to cause wrong actuations, leading to industrial accidents (fire, explosion, crashing, mechanical failure, etc.);
- (j)
Wormhole, sinkhole, and link ranking attacks: network manipulation attacks, mostly carried out in IPv6 networks, that force the rerouting of IoT traffic through routing table poisoning and selective forwarding mechanisms.
Protecting cloud manufacturing systems against the above threats requires building a control system framework covering several aspects of the industrial computing system, with IoT security at its core. IoT security can be enhanced through the continuous and appropriate tracking of the entry, deployment, and reallocation of IoT devices in industrial systems and processes. Recent academic studies have proposed provenance blockchains as the solution; this is the next theme of this research and is reviewed in the following sub-section.
3.3. Theme 3: Provenance Blockchains for IoT Security
At the fundamental level, provenance may be defined as dynamic metadata (data about data) attached to data for identifying individuals, events, and contexts of their creation and modification throughout their lifetimes [
25]. Such dynamic metadata are formed sequentially, along with the operations happening on the data identified by date and time stamps [
26]. Hence, metadata can be used for reverse tracing in forensics applications. This was the most fundamental use of provenance metadata recognized for computing security. In the Industry 4.0 era, provenance has a much wider domain of applications in the areas of algorithmic logging and traceability of events generated by complex interactive systems and artificial intelligence (AI) operating on cloud computing. The traceability of events related to specific data blocks can help in defining and enforcing policy frameworks on cloud computing and cloud manufacturing that are useful in identifying genuine versus unauthorized manipulations in the data blocks [
27,
28,
29]. Based on such assessment, IoT devices generating these data blocks can be identified as risky and requiring attention.
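The idea of provenance as sequential, time-stamped metadata that supports reverse tracing can be illustrated with a minimal sketch. It is not taken from any of the reviewed studies: the record fields, the hash-linking of entries, and the names `ProvenanceEntry` and `ProvenanceLog` are assumptions made for illustration, and the device and operator identifiers are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ProvenanceEntry:
    actor: str        # who created or modified the data (hypothetical identifier)
    operation: str    # e.g. "create" or "modify"
    timestamp: str    # date and time stamp of the operation
    data_digest: str  # SHA-256 digest of the data after the operation
    prev_hash: str    # hash of the previous entry, "" for the first entry

    def entry_hash(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


@dataclass
class ProvenanceLog:
    entries: List[ProvenanceEntry] = field(default_factory=list)

    def record(self, actor: str, operation: str, timestamp: str, data: bytes) -> ProvenanceEntry:
        """Append a new entry linked to the hash of the previous one."""
        prev = self.entries[-1].entry_hash() if self.entries else ""
        entry = ProvenanceEntry(actor, operation, timestamp,
                                hashlib.sha256(data).hexdigest(), prev)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Check that every entry still points at the hash of its predecessor."""
        prev = ""
        for entry in self.entries:
            if entry.prev_hash != prev:
                return False
            prev = entry.entry_hash()
        return True

    def trace_back(self) -> List[ProvenanceEntry]:
        """Return the entries newest first, as used in reverse tracing."""
        return list(reversed(self.entries))


log = ProvenanceLog()
log.record("sensor-017", "create", "2024-01-05T08:00:00Z", b"pressure=4.1")
log.record("operator-a", "modify", "2024-01-05T08:05:00Z", b"pressure=4.3")
print(log.verify())
print([entry.actor for entry in log.trace_back()])
```

In this sketch, altering an earlier entry breaks the link held by its successor, which is what `verify` detects; the most recent entry has no successor, so protecting it would need an additional anchor such as replication to other parties.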
With rapid innovations in AI through the evolution of machine learning algorithms and the resulting automation in decision-making and IoT actuations, the validity of data sources used by AI engines and machine learning algorithms is an emerging challenge [
30]. AI engines receive data streams from IoT sensors that are installed in industrial and logistics infrastructures and communicate over IPv6 (the Industrial Internet of Things, IIoT). At the fundamental level, these industrial sensors should be configured and calibrated accurately. At a higher level, however, there are challenges of accountability, traceability, fairness, and transparency of the data collected by AI engines. All mechanisms of data manipulation in IoT devices therefore need to be detected, tracked, and captured to gain comprehensive visibility of provenance data [
6]. This justifies the wider application of provenance in modern cloud manufacturing using IoT/IIoT. Traditional data visualization architectures (such as database queries) are not reliable because they are dependent upon single-point capturing. Reliable data visualization can be achieved when multiple sources are involved. Blockchain technology for provenance has evolved in this context.
A blockchain may be viewed as a network of individual mining entities that verify and authenticate transactions and post their details as “blocks” in a “decentralized ledger” replicated to all network members [
12]. Such blocks, once shared, are very difficult to alter because every member holds a replica and the mining entities are hidden from the traditional attack points on the Internet or cloud computing. Unlike traditional transactional records, these blocks are not stored on centralized ERP servers where unauthorized manipulations are possible. Further, new transactional members entering the system need to be validated by the existing members and not merely by a single authentication system. Transactional members are added through “smart contracts” that define the rules of interactions and their validations. Smart contracts are transactional contracts and not master contracts associated with traditional master service agreements [
6]. A smart contract records transaction updates according to the rules defined in it. A transaction is committed to the smart contract execution records only if all the transaction-specific details required by the network members in the blockchain are populated by the blockchain peers. Such rules make the transactions validated and reliable. Smart contracts are needed when multiple manufacturers collaborate to execute joint manufacturing processes through cloud computing, which is the theme of cloud manufacturing reviewed in Theme 1.
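The commit rule described above, in which a transaction enters the execution records only when every required detail has been populated, can be sketched as follows. This is an illustrative, off-chain sketch and not the API of any blockchain platform; the class name `SmartContract` and the field names `order_id`, `machine_id`, `operator_id`, and `quantity` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

Transaction = Dict[str, Optional[str]]


@dataclass
class SmartContract:
    required_fields: Set[str]  # details the network members require
    execution_records: List[Transaction] = field(default_factory=list)

    def missing_fields(self, transaction: Transaction) -> Set[str]:
        """Return the required fields that are absent or empty."""
        return {name for name in self.required_fields if not transaction.get(name)}

    def commit(self, transaction: Transaction) -> bool:
        """Commit only if every required field has been populated."""
        if self.missing_fields(transaction):
            return False
        self.execution_records.append(dict(transaction))
        return True


contract = SmartContract(
    required_fields={"order_id", "machine_id", "operator_id", "quantity"}
)
complete = {"order_id": "PO-1", "machine_id": "cnc-07",
            "operator_id": "op-12", "quantity": "250"}
incomplete = {"order_id": "PO-2", "machine_id": "cnc-07",
              "operator_id": None, "quantity": "100"}

print(contract.commit(complete))
print(contract.commit(incomplete))
print(contract.missing_fields(incomplete))
```

A deployed smart contract would additionally need the peers to reach consensus on the populated values; the sketch covers only the completeness rule.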
The concept of smart contracts can be used for the reliable execution of transactional contracts in cloud manufacturing. However, the reliability of the transaction data collected from IoT devices needs to be assured. IoT devices communicate through onboard electronics and have firmware to capture and transmit data about the events sensed by the industrial sensors attached to them. However, it should be noted that IoT devices are not powerful computers like laptops [
19,
21]. They have limited processing, memory, and storage capacities and run lightweight firmware sufficient to run JavaScript code for transmitting sensor data. Controls, like firewalls, web service filtering, antimalware, etc., cannot be installed on them. They are highly vulnerable compared with traditional computers and cloud computing virtual machines.
The key issues identified related to IoT devices are the following [
12,
18,
19,
20,
21,
22]:
- (a)
The identity validation of millions of IoT devices inducted in networked cloud manufacturing settings;
- (b)
The tracking of deployment and Internet enabling of millions of IoT devices in networked cloud manufacturing settings;
- (c)
The traceability of IoT devices added, modified, and removed, especially when the assets are mobile;
- (d)
Validating the fidelity of IoT sensor data transmitted by IoT devices engaged in manufacturing processes. Sensor data can be used for influencing process events driving AI-enabled decision-making algorithms running the actuation commands. Fake sensor data streams can cause accidents through wrong actuations;
- (e)
Tracking and establishing the accountability and liability of individuals/businesses owning IoT devices;
- (f)
Cybersecurity assurances when IoT devices are used in inter-cloud architectures;
- (g)
The accountability and transparency of AI algorithms used for controlling operations, performance, and behaviors of IoT devices;
- (h)
IoT devices engaging in the malicious or erroneous processing of manufacturing events, thus affecting smart contracts negatively.
Blockchains dedicated to capturing provenance data as per provenance smart contract rules can be used for enhanced IoT security in cloud computing. Provenance data collection and analysis in blockchains can ensure greater transparency and traceability of manufacturing process algorithms, better process performance and reliability, better security and privacy controls, improved quality control, and improved protection of the data and intellectual assets of the contributing partners to cloud manufacturing [
31,
32]. The design should have a trust validation mechanism based on the exchange of asymmetric cryptographic keys. A smart contract for provenance can be generated by creating the data structures that IoT devices need to validate their authenticity and clean (uncompromised) status [
29,
33]. The core of the system can be a smart contract builder that will help purchasers, suppliers, and asset owners direct their activities as per the rules defined for IoT devices in the blockchain contractual terms [
34]. A smart logistics planning service may be defined as one that absorbs the contractual terms and loads them on the network of IoT devices, which can then be monitored for compliance by a smart contract monitor powered by machine learning. The rules may involve detecting the origin of bindings, ensuring fault tolerance, and verifying integrity and confidentiality through checks on data, chains, and origins [
35]. Reference [
36] added proof of work and authority verification and consensus while extracting provenance metadata from the IIoT systems comprising high- and low-level information and enforcing the completeness and security of provenance metadata through enforced lineage, verification, data-point locking, parallel verification, and privacy. The activities of locating, visualizing, and capturing provenance metadata in a supply chain can be achieved using a smart contract processor, which is an in-built feature in Ethereum and Hyperledger Fabric frameworks [
37]. The smart contract processor needs in-built AI. The role of AI in IoT security is reviewed in the next sub-section.
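One way to picture a data structure that lets a device's authenticity and clean (uncompromised) status be validated is a registry of firmware digests recorded when a device is inducted. The sketch below is an assumption made for illustration and is not a design from the reviewed studies: treating a matching firmware digest as "clean" is a simplification, the asymmetric key exchange mentioned above is omitted because it needs a cryptography library outside the Python standard library, and all names (`DeviceRegistry`, `plc-042`) are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DeviceRecord:
    device_id: str
    firmware_digest: str  # SHA-256 hex digest registered at induction


class DeviceRegistry:
    def __init__(self) -> None:
        self._records: Dict[str, DeviceRecord] = {}

    def register(self, device_id: str, firmware_image: bytes) -> DeviceRecord:
        """Record the digest of the firmware the device was inducted with."""
        record = DeviceRecord(device_id, hashlib.sha256(firmware_image).hexdigest())
        self._records[device_id] = record
        return record

    def status(self, device_id: str, reported_digest: str) -> str:
        """Classify a device from the firmware digest it reports."""
        record = self._records.get(device_id)
        if record is None:
            return "unknown-device"
        expected = record.firmware_digest.encode("utf-8")
        if hmac.compare_digest(expected, reported_digest.encode("utf-8")):
            return "clean"
        return "compromised"


registry = DeviceRegistry()
firmware = b"firmware-v1.2"
registry.register("plc-042", firmware)

good = hashlib.sha256(firmware).hexdigest()
tampered = hashlib.sha256(firmware + b"-patched").hexdigest()
print(registry.status("plc-042", good))
print(registry.status("plc-042", tampered))
print(registry.status("plc-999", good))
```

In a provenance blockchain, the registered digests would be held in the replicated ledger and the reported digest would be signed by the device, so that neither can be altered by a single party.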
3.4. Theme 4: Artificial Intelligence for IoT Security
The role of artificial intelligence in IoT security has recently been studied extensively for securing the device end, IoT networks, and IoT applications at the cloud computing end. As AI can be deployed at the applications layer, the controls are designed to monitor and control IoT security upstream (fields to applications) and downstream (applications to fields) from cloud computing [
38]. IoT devices do not transmit all their data to cloud computing, as growing loads on cloud computing have resulted in applications being shifted to edge computing [
8]. Hence, AI deployed on cloud computing should have a data analysis canvas spread between cloud and edge computing. This scenario applies extensively to cloud manufacturing. This can be achieved through blockchains having peers interfacing with edge computing databases. The basic layers of authentication, authorization, and access control privileges need to be implemented in edge computing. AI analytics need to be hosted on cloud computing to cover several echelons of edge computing infrastructures.
AI can be used for behavioral analysis of data collected from the IoT to detect and predict misuse, anomalies, patterns, intrusions, frauds, data proliferation, or a combination of them [
8,
39]. An AI-based security layer may be viewed as a higher layer above traditional signature-based detection. AI is dependent on the cleanliness, accuracy, relevance, and construction of data and requires thorough data engineering before analysis. It needs to be trained on established intrusion and anomaly databases before it can make the stated detections and predictions. The analysis requires extraction and normalization of the relevant features from the data through preprocessing [
40]. The machine learning algorithms reported as effective for anomaly detection in data collected from IoT devices are decision trees, random forests, nearest neighbors, support vector machines, and artificial neural networks with short-term memory functions [
23,
39,
40,
41,
42]. Among these, decision trees, random forests, and artificial neural networks are good for detective as well as predictive analysis.
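The preprocessing and detection steps described above can be illustrated with a minimal nearest-neighbor sketch in plain Python. It is a simplified stand-in for the algorithms named in the literature: features are normalized with a z-score, and a sample is scored by its mean distance to the k nearest reference readings. The sensor values, the threshold of 2.0, and all function names are assumptions chosen for illustration.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_scaler(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Compute per-feature mean and standard deviation for normalization."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid division by zero
    return means, stds


def normalize(row: Sequence[float], means: List[float], stds: List[float]) -> List[float]:
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def knn_score(sample: Sequence[float], reference: Sequence[Sequence[float]], k: int = 3) -> float:
    """Mean distance from the sample to its k nearest reference points."""
    distances = sorted(math.dist(sample, ref) for ref in reference)
    return mean(distances[:k])


def is_anomalous(sample, reference, threshold: float, k: int = 3) -> bool:
    return knn_score(sample, reference, k) > threshold


# Hypothetical normal readings: (temperature in degrees C, pressure in bar).
normal_readings = [(20.0, 1.0), (21.0, 1.1), (19.5, 0.9), (20.5, 1.0), (20.2, 1.05)]
means, stds = fit_scaler(normal_readings)
reference = [normalize(r, means, stds) for r in normal_readings]

typical = normalize((20.3, 1.0), means, stds)
suspicious = normalize((35.0, 3.0), means, stds)
print(is_anomalous(typical, reference, threshold=2.0))
print(is_anomalous(suspicious, reference, threshold=2.0))
```

In practice the reference set would come from a labeled intrusion or anomaly database and the threshold would be tuned on validation data rather than fixed by hand.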
AI needs to be configured for automated detections and predictions of misuse, anomalies, patterns, intrusions, frauds, data proliferation, or a combination of them by detecting the manipulations in the AI inputs compared with the predicted logical values [
39]. This is possible when the detection and prediction logic is built in by the developers in a rule-based system designed for IoT data analysis. The attacks may vary in complexity, from basic input attacks that manipulate data and botnet attacks to advanced and highly dangerous attacks, such as data set poisoning, algorithm poisoning, and model poisoning. AI must be able to detect rule violations, and developers must be able to create robust rules to handle defined attack scenarios. A rule-based AI automation strategy can prevent several types of attacks on IoT devices, such as evasive attacks (masquerading and eavesdropping), malicious code injections (cross-site scripting and side-channel attacks), and flooding attacks [
27]. However, advanced techniques are needed to detect/predict a fake IoT, wormhole/sinkhole/link ranking, exploit-based attacks, DDoS, and feeding of fake sensor data [
23,
41,
43,
44]. For such attacks, supervised and unsupervised learning, reinforcement, and deep learning techniques are required to be implemented. Adversary detection (outside as well as inside) is only possible when these advanced techniques are used. Detection may be possible only after prolonged learning of neural network black boxes comprising multilayered perceptrons or a combination of decision trees and random forests [
23,
41,
43,
44,
45]. Hence, layers of big data collection, organization, and preprocessing, and big data analysis need to be deployed to support the supervised and unsupervised learning, reinforcement, and deep learning techniques of artificial intelligence [
43].
This research was conducted to combine the themes of AI with provenance blockchain for IoT security. Currently, there are few studies combining the themes of AI with blockchains for IoT security, although more studies are expected in the future. The role of blockchains in supporting AI for IoT security is expected to involve the collection and storage of trustworthy and immutable data for running detective and predictive analysis [
46,
47,
48,
49]. Data may be stored both off-chain and on-chain, but the integrity of data will be maintained because of the smart contract rules engine applied to them. Both on-chain and off-chain data may be encrypted, but the on-chain data will be digitally signed and use the digital signatures of the blockchain peers. Whether on-chain or off-chain, the trustworthiness of the data stored will be high because of the chain code validation conducted by the blockchain peers following the smart contract rules engine. The blockchain may not include all the IoT devices in edge computing, as the scope will be limited to the ones incorporated for executing a smart contract. Hence, data collection for AI predictions and detections will be limited to the IoT devices included in the scope, thus causing both on-chain and off-chain data storage. AI may be programmed to use both the data storages. The detections and predictions may be conducted based on learning from both the on-chain and off-chain data, but the test data may be confined to on-chain data only, and the decision-making may be applied to IoT devices included in the smart contracts.
This review of the four themes provides a theoretical foundation for addressing IoT security using the integrated themes of provenance, blockchain, and AI. To integrate the themes, a review of the blockchain frameworks was also conducted.
3.5. Review of Blockchain Frameworks
In this sub-section, three blockchain frameworks are evaluated. Ethereum is available to developers as a client machine called the Ethereum Virtual Machine (EVM) that needs to be connected with the global development network of Ethereum so EVM states can be agreed upon by all participants globally [
50]. Currently, Ethereum supports “Eth” cryptocurrencies. Developers need to invest in Eth and then pay to fulfill requests for the execution of their codes. Smart contracts can be written in Solidity and Vyper, which are Ethereum-specific languages [
51]. In addition, several open-source free tools are available to create distributed apps [
52]. This research may use one of those tools to create a provenance application, but only with the permission of the development community and by paying their fees using the “Eth” currency. Ethereum is not suitable for cross-industry blockchain research in applications other than cryptocurrencies as it is tightly controlled by its development community. Its development for new industrial applications can only be achieved in collaboration with others.
Unlike Ethereum, Hyperledger and Corda can be viewed as generic and open blockchain frameworks, which can be implemented across multiple industries for multiple applications and installed at a network level in a single computer running Linux or Windows [
53,
54]. Like Ethereum, they can support cryptocurrencies. The good part of Hyperledger and Corda is that they can be used for academic research on new solutions of blockchains and can be deployed in an isolated development environment controlled by a developer. This is because, unlike Ethereum, they have their native permission systems to implement and control privacy locally. Further, both Hyperledger and Corda do not allow unknown identities to connect and transact on the network.
Hyperledger and Corda operate peer-to-peer semi-private blockchain networks to construct mutually agreed contracts and manage state changes and flows in them. The architecture described in
Figure 1 is taken from Hyperledger sources [
55]. At the fundamental level, the blockchain networks established using Hyperledger and Corda are not public. They are semi-private, which is established at the mutual consent of agreeing parties forming a closed group. The parties contributing to the semi-private blockchain network are “peers” owning “chain-codes” and having “policies” governing transactions over secured “channels”. The secured “channels” constitute the fundamental fabric of the blockchain network. The peers use their chain-code interfacing to the channels for exchanging “assets”. Assets may be imagined as anything having a business and financial value that one party can offer to another party. For example, a basket of apples sold is an asset exchange changing hands, and the payment made for it is also an asset exchange changing hands. All such exchanges are recorded in a “ledger” that is visible to all transacting parties. The exchanges are agreed upon through a “smart contract”. The transactions recorded in the “ledger” are immutable and private as they are recorded in encrypted blocks recognized by hash functions. The transactions are verified using algorithms called “consensus”. The architecture in
Figure 2 was analyzed based on the scenario explained by [
55]. The legend for the identifiers shown in
Figure 2 is the following:
R0, R1, R2, and R3 = Organizations collaborating to build the blockchain network for provenance called BNP; R0 is the contracting authority, and others are suppliers;
R1 has access only to the network configuration CC1; R2 and R3 have access to network configurations CC2 and CC3.
P1, P2, and P3 = Blockchain peers for the suppliers R1, R2, and R3;
O = Blockchain contracting authority (owner of the smart contracts);
C1 and C2: Network channels for network configurations CC1 and CC2, respectively;
CA0, CA1, CA2, and CA3: Certification authorities of the organisations R0, R1, R2, and R3, respectively;
S1 and S2: State databases of network configurations CC1 and CC2;
L1 and L2: Ledgers of network configurations CC1 and CC2;
A1, A2, and A3: Off-chain cloud manufacturing applications maintained by R1, R2, and R3;
ProvDB: Provenance database interfacing A1, A2, and A3;
AI: Artificial intelligence.
In the blockchain network design shown in
Figure 1, the state databases S1 and S2 are the main systems receiving regular updates on the events completed, as per the terms of the smart contracts stored in smart ledgers L1 and L2. State changes in S1 and S2 are facilitated by the provenance events logged in ProvDB streamed through the applications A1, A2, and A3 on behalf of organizations R1, R2, and R3. The administrators of A1, A2, and A3 are responsible for the accuracy and integrity of event data recorded in ProvDB and fed to the S1 and S2 state databases inside the blockchains. It should be noted that A1, A2, and A3 are the focal points for building strong provenance security controls in this proposed solution.
The applications A1, A2, and A3 have API interfaces on which they receive data from IoT devices attached to the running processes. They are designed to register IoT devices by assigning them unique hexadecimal keys to authorize them to transmit data. The data from the IoT devices is used to change the states of state databases inside the blockchain. Once authorized, the IoT devices are trusted to provide genuine event updates from the running processes, which can be used for changing the states in the state databases of the blockchain. To verify the ongoing trustworthiness of IoT devices, the provenance blockchain control is proposed. Theoretically, IoT devices assigned with unique hexadecimal keys to authorize them to transmit data should be considered as provenance-verified. However, this is a one-time control and cannot ensure ongoing trustworthiness, especially when under insider threats. To validate trustworthiness continuously, a deep machine learning layer is proposed that collates and uses its database called the Provenance Database (ProvDB) to run training and testing cycles.
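The one-time registration control described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the applications A1, A2, and A3: the class name `DeviceRegistry`, the device identifier, and the 256-bit key length are hypothetical choices.

```python
import hmac
import secrets
from typing import Dict


class DeviceRegistry:
    """Hypothetical registry held by an application such as A1."""

    def __init__(self) -> None:
        self._keys: Dict[str, str] = {}

    def register(self, device_id: str) -> str:
        """Issue a unique hexadecimal key to a newly registered device."""
        if device_id in self._keys:
            raise ValueError(f"device {device_id!r} is already registered")
        key = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
        self._keys[device_id] = key
        return key

    def is_authorized(self, device_id: str, presented_key: str) -> bool:
        """Accept data only from a registered device presenting its key."""
        expected = self._keys.get(device_id)
        if expected is None:
            return False
        # Constant-time comparison avoids leaking key content through timing.
        return hmac.compare_digest(expected, presented_key)


registry = DeviceRegistry()
boiler_key = registry.register("boiler-temp-01")  # hypothetical device name
```

A check of this kind verifies only that a device holds a valid key. It cannot show that the device still behaves as expected, which is why the continuous validation layer is proposed on top of it.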
The algorithm proposed for machine learning is one of the deep learning algorithms, such as the neural network with backward propagation and long short-term memory, decision tree, and random forest. The machine learning is trained using historical data in the ProvDB database. The records are IoT inputs from the processes running to fulfill the terms of the smart contracts stored in L1, L2, and L3. The machine learning is programmed to learn from the ongoing data streams and predict the next combination of data. A decision rule is programmed to compare the predicted versus actual arrival of the next combination of data. The risks are logged in the form of alerts about the variables linked with the process events ongoing for fulfilling the smart contracts stored in Ledgers L1, L2, and L3.
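The decision rule that compares the predicted and actual arrival of the next data value can be sketched as below. The predictor here is a simple linear extrapolation from the last two observations, used purely as a stand-in for the deep learning algorithms proposed above; the class names, the variable name, and the tolerance of 5.0 are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RiskAlert:
    variable: str
    predicted: float
    actual: float


class NextValueMonitor:
    """Predicts the next value of one variable and flags large deviations."""

    def __init__(self, variable: str, tolerance: float) -> None:
        self.variable = variable
        self.tolerance = tolerance
        self.history: List[float] = []

    def predict_next(self) -> Optional[float]:
        # Stand-in predictor: extend the most recent step linearly.
        if len(self.history) < 2:
            return None
        last, previous = self.history[-1], self.history[-2]
        return last + (last - previous)

    def observe(self, actual: float) -> Optional[RiskAlert]:
        """Compare the prediction with the actual value, then record it."""
        predicted = self.predict_next()
        self.history.append(actual)
        if predicted is not None and abs(actual - predicted) > self.tolerance:
            return RiskAlert(self.variable, predicted, actual)
        return None


monitor = NextValueMonitor("boiler_temperature_c", tolerance=5.0)
alerts = [monitor.observe(v) for v in (150.0, 152.0, 154.0, 300.0)]
```

In this sketch, the first three readings produce no alert, while the jump to 300.0 deviates from the extrapolated value and is returned as a `RiskAlert` that could be logged against the relevant smart contract.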
4. Discussion
Technologies of the Industry 4.0 era have the potential to facilitate remote monitoring and control capabilities, allowing manufacturing and logistics control engineering systems to be transitioned to cloud computing [
1,
2,
3]. This advancement enables communication between these systems and controlled machines over the Internet, expanding operational reach from individual plants to vast geographic areas. Cloud manufacturing control systems can remotely monitor and manage machines, equipment, and robots deployed worldwide, provided that Internet connectivity and digitalization are engineered with sufficient capacity and reliability. However, this new framework is vulnerable due to its “Lack of Physical Touch”. Operating critical systems remotely over the Internet creates potential attack surfaces, leading to cybersecurity breaches within industrial systems [
4,
5,
6]. To address this, additional security controls, such as authentication, authorization, accounting, and tracking of digitalized machines, equipment, and robots, are necessary [
24,
31,
32]. Given that the IIoT is the primary technology for digitalization and transitioning to Industry 4.0, numerous research studies have concentrated on developing relevant security controls for the IIoT. However, the need for cybersecurity measures in industrial systems linked to cloud manufacturing is more complex compared to traditional network security [
31,
32]. Through IIoT integration, machines, equipment, and robots are digitized using IIoT attachments, facilitating standard open communication via TCP/IP [
54]. As in conventional monitoring and control systems, real-time data transmission from these devices to their controllers follows the legacy pattern. Prior to IIoT integration, these devices gathered status updates via sensors and transmitted data using proprietary protocols, like LonWorks and BACnet, to Programmable Logic Controllers tailored to interpret these protocols [
22]. Following digitalization with the IIoT, these devices now utilize the TCP/IP protocol to transmit status updates over the Internet, thereby exposing them to the TCP/IP-related cybersecurity risks absent in proprietary protocol systems. However, these devices lack the security mechanisms needed to protect against these cybersecurity threats as effectively as fully functional computers [
31,
32]. This limitation arises from the fact that IIoT-enabled digital transformation does not convert these devices into fully operational computers; typically, they only employ embedded Java and JavaScript as firmware. Integrating fully functional computers with these devices is impractical due to their large numbers, high costs, and associated feasibility challenges. Recognizing this weakness, research studies have proposed solutions to enhance the security of IIoT-enabled CPS devices without directly installing security software, like firewalls, within them.
Traditional security controls, such as firewalls, antimalware, and intrusion prevention systems, can be deployed at the cloud and edge computing gateway servers. However, IoT devices require special solutions for their security. Solutions have emerged in the areas of provenance captured and recorded in blockchains. Provenance comprises the capturing, recording, and tracking traces of IoT devices, starting from authentication, authorization, and accounting, and later enabling tracking of every involvement and activity of them [
13,
14,
15,
16]. Keeping a continuous eye on IoT devices could be made possible using smart contracts in blockchains. Smart contracts helped in planning, documenting, signing off, and securing project execution details in manufacturing, logistics, and supply chains. When IoT devices were allocated to the tasks of smart contracts, their provenance information was recorded to monitor and control the execution of the smart contracts by them. The Prov-Trust, ProvChain, and Smart Provenance designs by References [
17,
19] and Ref. [
39], respectively, are empirical studies presenting this capability. This capability could ensure the allocation of thousands of assets to manufacturing, logistics, and supply chain projects under cloud manufacturing with security, privacy, and trust validations. However, there was a major problem in the provenance and blockchain hybrid solutions. While the capture and recording of provenance information in blockchains ensured the initial security, privacy, and trustworthiness of machines, equipment, and robots allocated to smart contracts, the continuity of security, privacy, and trust was not assured. If the assets allocated to an ongoing smart contract were compromised by malicious attackers or malicious insiders, the actors monitoring the smart contract execution would not be able to detect the compromise.
For the continuity of security, privacy, and trust, an additional layer or layers of controls were needed. This research bridges this gap by integrating provenance blockchain with predictive auditing of the allocated assets, making AI the enabler for the continuous behavioral monitoring of IIoT devices for mitigating the security, privacy, and trust risks in Industry 4.0. This framework can capture deep insights from IoT devices through continuous data flow and recording in big data systems. Artificial intelligence (AI) can be employed to analyze the data streams to detect malicious or non-compliant behaviors of IIoT-enabled CPS assets allocated to smart contracts. Continuous data flow and predictive analytics by AI can ensure the transparency of operations by building an active digital perception of the operations of the assets.
4.1. Design Analysis
The proposed solution framework is presented in
Figure 3 in a simplified fashion to reduce the complexity of
Figure 2 and conduct a simplified design analysis.
IoT devices installed for cloud manufacturing may be viewed as PLCs with IP communications capability connected to either edge computing servers or directly to the Internet [
18,
21,
22]. These devices may be attached to cloud manufacturing processes executed for smart contracts loaded in the blockchain. They may be considered on-chain IoT devices monitored directly by the “Applications” run by the contracting parties. They capture all the process events defined by the “variables of interest” in the provenance database. The variables of interest are pre-defined in the blockchain rules engine related to the smart contracts being executed. A data management application pulls organized and structured records from the provenance database for training the machine learning algorithm. The data are split into two parts: 80% for training and 20% for testing to enable supervised learning of the machine learning algorithm (one of the deep learning algorithms, such as neural network with backward propagation and long short-term memory, decision tree, and random forest). With the help of supervised learning, machine learning is able to generate the “predicted value of the next record” commensurate with the next states of the variables of interest pertaining to smart contracts. The rules engine compares the latest predicted values of the variables with the actual values received. If they are matching, there are no risks. The variance between the predicted and actual values of the variables may define the risks, causing a breach of trust in IoT devices.
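The 80%/20% split and the generation of the "predicted value of the next record" can be sketched as follows. Two assumptions are made here that the text does not specify: the split is chronological (the earliest 80% of records train the model), and a least-squares line mapping each value to its successor is substituted for the deep learning algorithms named above. The function names and the synthetic temperature series are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # (current value, next value)


def make_pairs(series: List[float]) -> List[Pair]:
    """Turn a time series into supervised (input, target) pairs."""
    return [(series[i], series[i + 1]) for i in range(len(series) - 1)]


def split_80_20(pairs: List[Pair]) -> Tuple[List[Pair], List[Pair]]:
    """Chronological split: first 80% for training, last 20% for testing."""
    cut = len(pairs) * 8 // 10
    return pairs[:cut], pairs[cut:]


def fit_line(pairs: List[Pair]) -> Tuple[float, float]:
    """Least-squares fit of next = slope * current + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    if var_x == 0:
        raise ValueError("training inputs are constant; cannot fit a line")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def mean_abs_error(pairs: List[Pair], slope: float, intercept: float) -> float:
    """Average gap between predicted and actual next values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in pairs) / len(pairs)


# Synthetic series: temperature rising in steady steps of 2 degrees.
series = [150.0 + 2.0 * i for i in range(11)]
train, test = split_80_20(make_pairs(series))
slope, intercept = fit_line(train)
test_error = mean_abs_error(test, slope, intercept)
```

The error measured on the held-out 20% indicates how far predictions may normally drift from actual values, which can inform the tolerance that the rules engine applies before logging a risk.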
As multiple variables are being monitored, multiple IoT devices collaborating to generate them may come under scrutiny. This process is slightly complex and needs to be understood with an example. The example is taken from an industrial process in which the boiler temperature, boiler pressure, and flow of steam are monitored. There are three IoT devices communicating these variables as continuous streams. With every latest value received, the machine learning makes a prediction. As this is an engineering system following some formula of operation, the predictions are expected to follow predictable time series patterns. For example, if the temperature is rising, it will happen in steps leading to a breach of thresholds, in which case, actuation controls may be applied. If there is a sudden increase in one of the variables (like temperature increasing suddenly from 150 to 300 degrees Celsius), machine learning will detect it immediately because it will vary from the predicted value. A risk will be logged for investigation because such jumps are not practical. If more than one variable returns sudden variations, the risk levels are higher, requiring immediate intervention as IoT tampering may be suspected. As the alarm system will operate in real time, timely interventions can be carried out. The system can be designed in such a way that all field actuations are blocked if the machine learning system logs risks. Once the variations enter the blockchain, all peers will know about the risks. This will ensure transparency to the other parties involved and the customer. Any immediate coordination required can be invoked without any dependence on the field engineers. In the absence of this system, only the field engineers will notice these variations. If they are the insider attackers, no one else will know about these changes, which may finally lead to an accident (like a boiler explosion).
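The escalation logic of the boiler example can be sketched as below. The three risk labels, the tolerance values, and the function names are hypothetical; the sketch only encodes the rules stated above, namely that one deviating variable is logged for investigation, more than one calls for immediate intervention, and field actuations are blocked whenever a risk is logged.

```python
from typing import Dict, List


def deviating_variables(
    predicted: Dict[str, float],
    actual: Dict[str, float],
    tolerance: Dict[str, float],
) -> List[str]:
    """Names of variables whose actual value strays from the prediction."""
    return sorted(
        name
        for name in predicted
        if abs(actual[name] - predicted[name]) > tolerance[name]
    )


def risk_level(deviations: List[str]) -> str:
    """Escalate with the number of variables that deviate together."""
    if not deviations:
        return "none"
    if len(deviations) == 1:
        return "investigate"
    return "immediate_intervention"  # IoT tampering may be suspected


def actuation_allowed(level: str) -> bool:
    """Block all field actuations while any risk is logged."""
    return level == "none"


# Hypothetical predictions and tolerances for the three boiler variables.
predicted = {"temperature_c": 152.0, "pressure_bar": 10.1, "steam_flow_kg_s": 20.0}
tolerance = {"temperature_c": 5.0, "pressure_bar": 0.5, "steam_flow_kg_s": 1.0}

one_jump = {"temperature_c": 300.0, "pressure_bar": 10.1, "steam_flow_kg_s": 20.0}
two_jumps = {"temperature_c": 300.0, "pressure_bar": 25.0, "steam_flow_kg_s": 20.0}

level_one = risk_level(deviating_variables(predicted, one_jump, tolerance))
level_two = risk_level(deviating_variables(predicted, two_jumps, tolerance))
```

Here `level_one` evaluates to `"investigate"` and `level_two` to `"immediate_intervention"`, and actuation is blocked in both cases.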
It may be noticed that the system does not treat mere identification as provenance data; rather, continuity of trust constitutes the provenance information. Hence, this system can also be used for the real-time monitoring of quality, performance, and sustainability remotely by cloud manufacturing applications. Such real-time monitoring and detection of risks can help in quick interventions to correct a course before significant damages occur. For example, if carbon emissions breach the permitted values, the blockchain peers can quickly intervene and make technical corrections as soon as possible in the running systems. Further, this solution can drive continuous improvements in the technical capabilities of the running systems.
4.2. Challenges and Limitations Associated with IoT Security in Cloud Manufacturing
The solution proposed is dependent upon the behavioral data collected from the IoT devices in cloud manufacturing. Data collection needs to be structured and organized appropriately to obtain accurate and credible predictions by AI analytics. Provenance blockchain can ensure that data collected from IoT devices are protected by making them tamper-free and immutable. However, all these controls can be implemented at the cloud computing end, where the required computing power for provenance blockchain and AI is available.
The challenges in implementing this solution occur when the canvas of implementing it is extended to edge computing and IoT devices [
56]. Edge computing may not have the desired computing power to handle big data generated by IoT devices and then run blockchain and AI software systems. Further, as highlighted by Reference [
56], there may be challenges related to the search and discovery of IoT devices; inconsistencies in IoT data, identity, and access control protocols; and resilience when edge computing is integrated with IoT devices through cloud computing. The Hyperledger Fabric framework itself is very resource-hungry. Further, IoT devices are mostly low-power computing systems running merely firmware. They may not support the client agents required for generating the continuous streams of sensor data that need to be stored on the provenance blockchains inside repositories of smart contracts, where they can be analyzed by AI algorithms. Thus, significant IT planning and investments are required for implementing the cloud security solution. Moreover, IoT devices require some kind of industry-wide standardization. Uncontrolled IoT devices may not be secured with the desired level of control effectiveness. As cloud computing may have to take most of the big data and AI analytics load, significant bandwidth will be needed for interactions between the edge and cloud computing.
The solution proposed is not suitable for all types of threats to IoT devices. Hence, the possible effectiveness of IoT security needs to be assessed theoretically. In Theme 3, the concerns related to IoT security were raised by reviewing several literature papers [
2,
3,
4,
5,
6,
57]. The proposed solution can address the concerns raised, at least partially. In the conclusion section, a justification of the effectiveness of the proposed solution is presented as follows:
- (a) Identity validation: All IoT devices will be registered in the system, either as on-chain or off-chain devices; any device can become on-chain once it is allocated to a smart contract;
- (b) Tracking of deployment and Internet enabling of millions of IoT devices: Unregistered devices cannot be plugged into the manufacturing system because it is driven by the provenance blockchain solution;
- (c) Traceability of IoT devices added, modified, and removed: Any change in IoT devices will be detected immediately by machine learning, making predictions and comparisons at every new data block received from every IoT device;
- (d) Validating the fidelity of IoT sensor data transmitted by the IoT devices: Machine learning will immediately detect data streams not following the predictions made and log risks. This may not be a complete solution for data fidelity because it needs to be determined by engineers. However, once decided, machine learning will ensure that it is followed and any deviations are detected promptly;
- (e) Tracking and establishing accountability and liability: As all IoT devices are registered, their owners will be held accountable for any mishaps happening through them;
- (f) Cybersecurity assurances when IoT devices are used in inter-cloud architectures: This is not covered in the current solution; however, it can be expanded to run in multi-cloud environments;
- (g) Accountability and transparency of AI algorithms: As machine learning is monitoring every state change of the variables very closely, variances caused by algorithmic defects can also be detected. However, this will be known only after conducting a causal analysis of the variances;
- (h) IoT devices indulging in malicious and erroneous processing of manufacturing events: Machine learning will not allow this to happen as it will detect immediately when an IoT device begins to deviate from the running course.
Finally, the possibility of the proposed system itself coming under attack should also be considered. The current blockchain solutions are protected by the blockchain peers overseeing the smart contract operations. The proposed solution generates risk logs from AI predictions, which will be entered into the provenance blockchain based on risk assessments by the blockchain peers. This functionality may be vulnerable to false positives and false negatives because AI predictions may not be 100% accurate. Blockchain peers may absorb some of these, but too many of them may cause complacency in the overall risk assessment process. To make AI predictions trustworthy, the training data need to be of very high quality. Hence, the data engineering operations of the manufacturing company should be of very high quality. Further, regular monitoring of the accuracy of AI predictions is needed. The system is vulnerable to data injections by malicious insiders, through which AI predictions may be corrupted to produce false negatives. Hence, the IT administration of the AI engine and the blockchain itself should be trustworthy.
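The regular monitoring of prediction accuracy mentioned above can be sketched as a comparison of logged alerts against incidents later confirmed by the blockchain peers. The function names and the sample audit data are hypothetical, and the sketch assumes that each monitored interval eventually receives a confirmed label.

```python
from typing import Dict, List


def confusion_counts(alerts: List[bool], confirmed: List[bool]) -> Dict[str, int]:
    """Count agreement between AI alerts and peer-confirmed incidents."""
    if len(alerts) != len(confirmed):
        raise ValueError("alerts and confirmed must have the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for alerted, incident in zip(alerts, confirmed):
        if alerted and incident:
            counts["tp"] += 1
        elif alerted and not incident:
            counts["fp"] += 1  # false positive: alert without an incident
        elif not alerted and incident:
            counts["fn"] += 1  # false negative: missed incident
        else:
            counts["tn"] += 1
    return counts


def error_rates(counts: Dict[str, int]) -> Dict[str, float]:
    """False positive and false negative rates; 0.0 when undefined."""
    negatives = counts["fp"] + counts["tn"]
    positives = counts["fn"] + counts["tp"]
    return {
        "false_positive_rate": counts["fp"] / negatives if negatives else 0.0,
        "false_negative_rate": counts["fn"] / positives if positives else 0.0,
    }


# Hypothetical audit window of eight monitored intervals.
alerts = [True, True, False, False, True, False, False, False]
confirmed = [True, False, False, True, True, False, False, False]
counts = confusion_counts(alerts, confirmed)
rates = error_rates(counts)
```

A rising false negative rate over successive audit windows would be one possible signal that the training data have been corrupted, for example by injected records.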