Blockchain and Random Subspace Learning-Based IDS for SDN-Enabled Industrial IoT Security

Industrial control systems face an increasing number of sophisticated cyber attacks that can have very dangerous consequences for humans and their environment. To deal with these issues, novel technologies and approaches should be adopted. In this paper, we focus on securing commands in industrial IoT against forgery and misrouting. To this end, we propose a security architecture that integrates Blockchain and software-defined networking (SDN) technologies. The proposed security architecture is composed of: (a) an intrusion detection system, namely RSL-KNN, which combines Random Subspace Learning (RSL) and K-Nearest Neighbor (KNN) to defend against forged commands that target the industrial control process, and (b) a Blockchain-based Integrity Checking System (BICS), which can prevent the misrouting attack that tampers with the OpenFlow rules of SDN-enabled industrial IoT systems. We test the proposed security solution on an Industrial Control System Cyber attack Dataset and on an experimental platform combining software-defined networking and blockchain technologies. The evaluation results demonstrate the effectiveness and efficiency of the proposed security solution.


Introduction
With the industrial revolution, we have witnessed rapid changes in factory automation, transportation security, and surveillance in large-scale industries. To this end, Industrial IoT (IIoT) [1] has drawn a significant interest by incorporating dense wireless devices such as Radio-Frequency IDentification (RFID) tags [2] for machine identification, sensors for large-scale equipment monitoring and fault diagnosis, production, manufacturing, asset monitoring and many applications for power plant, water supplies, oil, and gas refineries.
Industrial control system (ICS) is a term used to describe different systems such as Supervisory Control and Data Acquisition (SCADA) and Distributed Control System (DCS). SCADA collects and analyzes data from substations in real time. Each substation contains control devices, such as Programmable Logic Controllers (PLCs), Remote Terminal Units (RTUs), and Intelligent Electronic Devices (IEDs), which manage field devices, such as sensors, actuators, and meters. The collected field information is In fact, the autonomous exchange of data among devices and a server, or in a device-to-device manner, either directly or over a network, helps the industrial control system to control and monitor the industrial process locally or at a remote location. The integration of IoT solutions with ICS, also named fourth-generation ICS [3,4], allows collecting and analyzing large data sets over the whole industrial area. In this way, this integration is foreseen as a viable solution towards smart and efficient data gathering and aggregation frameworks for the entire automation industry [5].
Industrial Control Systems (ICS) are becoming primary targets of cyber attacks due to their increased interconnection with other corporate networks. Their exposure to private and public networks has increased the risk of such attacks in recent years [6]. These attacks cause a variety of damages and drastic consequences to humans and their environment. For instance, a power blackout in Ukraine's capital Kiev happened because a SCADA system, which was linked to the 330 kilovolt substation, was influenced by external sources outside normal parameters [7]. In addition, as ICS deploys a large number of network devices like routers and switches, they bring other security issues. As each device represents a possible entry point for the attacker, the more devices we have, the more risks ICS is exposed to. Besides, network devices require continuous management and configuration, which is costly and time-consuming. To deal with this issue, software-defined networking (SDN) [8,9] technology was proposed to facilitate software and hardware updates on the network devices. This is achieved by moving the control of lookup tables stored in the network devices to a central location that allows easy control and management. In this way, the risk of compromising the network devices could be significantly reduced. Software-defined wide-area network (SD-WAN) [10] is a specific application of the SDN technology to WAN connections. Similar to SDN, SD-WAN also decouples the networking hardware from its control mechanism. However, SD-WAN focuses more on cost savings by reducing the deployment and operational costs. Gartner [11] predicted in 2015 that 30% of enterprises would deploy SD-WAN technology in their branches by the end of 2019.
In this paper, we propose a security architecture for the industrial control system, which is integrated with the SD-WAN technology. The architecture considers the attacks that target the ICS commands and negatively affect the correct functionality of the ICS. The attacks are classified into two types: (a) forged ICS commands that target the industrial control process, and (b) misrouting of commands, which arises from the adoption of the software-defined technology, e.g., an adversary that injects fraudulent flow rules, which prevent correct routing of ICS commands and information. Thus, the proposed security architecture requires two main complementary components: (a) an intrusion detection system to defend against the forged commands, and (b) an SD-WAN-based security solution, which prevents the misrouting of commands and information through tampering with the flow rules. The main contributions of the paper are the following:

• We propose an SD-WAN architecture for industrial control systems.
• We define the attack model that can target the proposed architecture. The attack model comprises forged command attacks that target the industrial control process, and SDN-related attacks that misroute commands and information.
• We propose a security solution for the proposed SD-WAN architecture that includes two complementary components:
  - An intrusion detection system (IDS), named RSL-KNN, against forged command attacks that target the industrial control process, which leverages the random subspace learning approach and the K-Nearest Neighbor (KNN) classifier to outperform conventional machine learning classifiers.
  - A Blockchain-based Integrity Checking System (BICS), which can defend against the misrouting attack by detecting in a short time any tampering with the OpenFlow rules and preventing the execution of the rules. Unlike [12][13][14], which detect this attack by analyzing the flow rules, our system is lightweight in the sense that it only compares the flow rules that originate from the vSwitch with the ones sent by the SDN controller.
• We evaluate the effectiveness and efficiency of the proposed security solution. By applying the random subspace learning-based IDS on the Industrial Control System Cyber attack Dataset [15,16], promising accuracy results are achieved. In addition, the blockchain-based integrity checking system is able to detect all attacks against the flow rules with a very low detection time.
The remainder of the paper is organized as follows: Section 2 provides the related work. In Section 3, we present the SD-WAN architecture for the industrial control system along with the attack model. In Section 4, we describe the main components of the security solution for the SDN-based ICS. The implementation and evaluation of the proposed security solution are presented in Section 5 and Section 6, respectively. Finally, Section 7 concludes the paper.

Intrusion Detection Systems for ICS
In order to protect industrial control systems, several recommendations and good practices can be followed [17]. One important security component that can be used to protect these assets against new threats is an IDS. In particular, anomaly detection systems have shown great potential for identifying anomalous or unexpected behavior in ICS [18]. The IDS must be combined with active network monitoring mechanisms for collecting the necessary data, along with traditional defense mechanisms like firewalls and antiviruses. Recently, the authors in [19] proposed a new taxonomy of ICS IDS that takes into account the characteristics of industrial systems. Moving away from the standard taxonomy of rule-based, misuse detection, and mixed systems, the authors divided them into three new categories, i.e., protocol analysis-based, traffic mining-based, and control process analysis-based detection systems. There exist a number of state-of-the-art machine learning-based big data processing technologies for anomaly detection [20]. Although the IDSs that fall into the first two categories manage to detect standard cyber attacks by analyzing the protocols in use or the traffic data exchanged between different entities of the system, they fail to detect the so-called semantic attacks. Semantic attacks exploit knowledge, extracted from the normal operation of an ICS, about the control systems that are in place or the physical processes that take place, taking into account the close association between ICS and physical systems. Ignoring semantics gives attackers the chance to gain control of some industrial processes and launch attacks that may tamper with the normal operation of some physical devices or change the operating rules on field devices. These attacks go undetected by traditional IDSs since they do not violate the specifications of the protocols or create any abnormal network traffic in the system.
Narayanan and Bobba [21] moved one step further by proposing an application-level anomaly detection framework that detects attacks aiming to modify the products manufactured by the industry that uses the ICS.
Many researchers came up in recent years with proposals that took advantage of machine learning models for developing IDSs to protect the control process. Caselli et al. [22] discussed sequence attack scenarios within an industrial control system and developed a sequence-aware intrusion detection system. Khalili et al. [23] proposed a State-based Intrusion Detection System (SIDS) suitable for cyber physical systems. Contrary to other industrial IDSs that consider anomalous states as indications of the cyber attacks, SIDS manages to detect three types of attacks: anomalous states, anomalous transitions between the normal states, and anomalous time-intervals between the normal transitions. Zhang et al. [24] proposed a detection system for attacks against cyber physical systems. The proposed system studied four classical classification models, including k-nearest neighbor (KNN), decision tree, bootstrap aggregating (Bagging), and random forest. To strengthen early attack detection, the proposed system uses an auto-associative kernel regression model. Abokifa et al. [20] developed an algorithm for the detection of cyber-physical attacks and adjusted for smart water distribution systems. Ghaeini et al. [25] proposed a framework for state aware anomaly detection in industrial control systems that is proven to provide lower exposure time while offering better detection in terms of false alarm rate. Trying to cope with data injection attacks in ICS, Wang et al. [26] proposed a new method that uses a Long Short Term Memory Recurrent Neural Network (LSTM-RNN) as a temporal sequences predictor. Li et al. [27] proposed a SCADA firewall model, called SCADAWall, for SCADA security. The SCADAWall model is powered by a comprehensive packet inspection (CPI) technology. To extend capabilities to proprietary industrial protocol protection, the SCADAWall model uses a proprietary industrial protocols extension algorithm (PIPEA). 
Based on the out-of-sequence detection algorithm, the SCADAWall model can detect abnormality within industrial operations.
Finally, several recent review articles discuss issues related to IDS systems for ICS and SCADA [28,29], all of them showcasing the need for novel sophisticated solutions.

Risk and Threat Modelling for ICS
Falco et al. [30] presented a study on Industrial Internet of Things (IIoT) cybersecurity risk modeling for SCADA systems. Specifically, the study demonstrated that certain risk metrics are stronger indicators than others in evaluating the likelihood of exploits for SCADA systems. Wood et al. [31], after identifying current risks with SCADA and ICS systems, proposed a two-layered security architectural pattern to address them. On the other hand, Cook et al. [32] conclude that the parameters for defining risk are too many (consequences C, events A, background information K, measure of uncertainty Q, threat T, vulnerability V) and that no proven unified risk model for ICS incorporates all of them yet, leaving this important field open for future research.
Nourian and Madnick [33] presented a study of how vulnerabilities were exploited by Stuxnet, an attack designed to interrupt the Iranian nuclear program. Based on prior research on system safety, the study uses a system-theoretic approach to analyze the threats exploited by Stuxnet. This approach can identify cyber threats towards CPSs at the design level and provides practical recommendations for more secure SCADA and automation systems. Nasr and Varjani [34] proposed an alarm-and-trust-based access management system, named ATAMS, to reinforce the security of the SCADA system against deontological threats. The ATAMS system can reduce the deontological threats based on two levels: (i) the integrity level of the substations and (ii) the operator trust value.

Blockchain for IoT and SCADA Security
The blockchain technology can be effectively applied in almost all domains of the IoT, especially when the IoT applications demand a decentralized security framework, as presented in [35]. Košt'ál et al. [36] proposed an improved architecture for management and monitoring of IoT devices using a private blockchain. Agyekum et al. [37] proposed a proxy re-encryption scheme that incorporates an inner-product encryption scheme for IoT environments. In order to improve the routing security and efficiency for the internet of sensors, Yang et al. [38] proposed a trusted routing scheme using blockchain and reinforcement learning.
The SCADA can be more intelligent and smart using the IoT networks. However, distributed blockchain-based methods have been proposed recently to detect and defend against cyber-attacks for modern power systems. Liang et al. [39] proposed a blockchain-based framework based on three phases, namely, data transmission, verification, and storage. The data transmission phase uses two keys, including: (1) the public key available in the meter-node network, and (2) the private key containing the node's private information. The verification phase uses an address-based distributed voting mechanism in order to verify the data integrity. The block generation phase uses two strategies as solution methods, including: (1) generating block by a fixed time, and (2) generating block by a fixed size. Liang et al.'s framework can be considered as a promising solution but the privacy and anonymity are not considered, which can compromise the energy trading infrastructure (e.g., using attack tree). To trade energy in a peer-to-peer network without a central price signal, Aitzhan and Svetinovic [40] proposed a token-based energy trading system called PriWatt, for the smart grid using distributed smart contracts. Specifically, PriWatt system uses signing transactions and multi signatures. To validate the authenticity of a transaction, PriWatt system uses ECDSA asymmetric cryptography. Therefore, a transaction can be considered valid only when multiple independent parties sign this transaction. Remarkably, PriWatt system provides certain levels of privacy and security as well as combats double-spending attacks.

SDN-Based SCADA
Several works [41,42] studied the integration of SDN with SCADA systems to facilitate better management and configuration of the network devices. Due to the centralized controller with a comprehensive view, the network operation can be better optimized compared to the traditional management in SCADA systems. Other works focused on designing SDN-based resilient solution for smart grids [43][44][45][46][47]. This is achieved by deploying redundant communications. In case of failures, the communication and services are quickly restored. A Network-based Intrusion Detection System (NIDS) for SDN-based SCADA systems is proposed in [48] using One-Class Classification (OCC) algorithm on the gathered statistics from network devices. Basically, the SDN features were leveraged to frequently modify the multipath routing mechanism between the SCADA devices. One interesting advantage is that the IDS can detect possible malicious traffic with unknown attack signatures. Most recently, a framework with multiple SDN controllers and security controllers is suggested in [49] for SCADA. The local IDS in the substation collects the measurement data periodically and monitors the control-commands. Furthermore, the global IDS that resides in the control center collects the measurement data from the substations and detects any abnormal behavior of the control-commands issued by the SDN controller and SCADA. In addition, a light-weight identity based cryptography is suggested to protect the network from outsider attacks.

Fraudulent Rule Detection in SDN
In the literature, there are some surveys [50,51] that describe the attacks that can target the SDN along with their corresponding security methods. Some methods such as FortNOX [13], FlowChecker [12], and VeriFlow [14] are designed to detect fraudulent rules. FortNOX [13] developed an analysis algorithm to detect rule conflicts. A rule conflict occurs when a new OpenFlow rule contradicts the existing rules. FlowChecker [12] analyzes all switch configurations using binary decision diagrams and model checking to detect misconfigurations within the flow tables. VeriFlow [14] checks for any violation of network invariants (e.g., availability of a path towards the destination, absence of routing loops) when a new rule is inserted, deleted, or updated. In a multi-controller architecture, the detection of fraudulent rules and malicious controllers in [52,53] is achieved by replicating each rule setup request to multiple controllers simultaneously in order to check for rule consistency, which incurs a high overhead. To reduce this overhead, requests are randomly replicated in [54].

Comparison with Related Work
To the best of our knowledge, our work is the first that combines the SDN and blockchain technologies into one architecturally secure design for industrial control systems. In addition, it differs from the related work in the following points:

1. Random subspace learning has never been investigated as an IDS approach for industrial control systems to distinguish between normal events and cyber attacks.
2. We show that combining random subspace learning and K-Nearest Neighbors improves the IDS accuracy compared to the basic machine learning classifiers, such as SVM, decision tree, random forests, etc.
3. Unlike FortNOX [13], FlowChecker [12], and VeriFlow [14], which detect fraudulent rules based on a heavyweight analysis process, the proposed BICS architecture is lightweight in the sense that it only leverages the blockchain technology to compare the flow rules that originate from the vSwitch with the ones sent by the SDN controller.
4. BICS provides a scalable solution in a multi-controller environment, as there is no need to involve multiple controllers to check the rule correctness.

SD-WAN Architecture for Industrial Control Systems
We propose an SD-WAN architectural design for ICS that enables network virtualization by migrating the control layer to the cloud, which allows centralized management. As legacy WANs can be costly and complex, the SD-WAN architecture reduces the network cost by offering zero-touch deployment, i.e., a network device does not need to be configured manually once it is plugged in. Instead, the device is configured from the SDN controller. In terms of security, the architecture can provide a unified security policy across the network.
As shown in Figure 2, the proposed architecture is composed of the following components:
• Private cloud: It hosts all the components that offer a centralized control for ICS as virtual machines, such as SCADA server, DCS server, and SDN controller.
• IP network: Instead of using a dedicated WAN for ICS, we can use the public Internet connection between the SDN and the different substations. All devices are authenticated and end-to-end encryption is established across the network.
• SDN controller: It is an application that manages flow control by using protocols such as OpenFlow [55] that tell switches where to send data packets. The OpenFlow protocol is a southbound interface between the controller and the forwarding elements such as switches. The northbound interface considers the communication between the controller and the applications.
• Virtual Switch: It is an application that interconnects multiple virtual machines of the same or different hypervisors. Moreover, it also interconnects these virtual machines with other physical switches.
Based on the above architecture, we present the attack model that targets the security of ICS commands, and could adversely affect the correct functionality of ICS. In the following, we consider two types of attacks:

• Forged command: Attacks that issue forged commands to intelligent electronic devices, which trigger the execution of undesired operations, such as a blackout.
• Misrouting of commands: Attacks that prevent the correct routing of commands and other information between the SCADA server, DCS server, and the different devices of ICS. This attack can be achieved by modifying the flow rules.
Specifically, the above mentioned attacks can be further classified as follows: • We consider that the security of industrial communication protocols like OPC-UA, DNP3, and Modbus is not within the scope of this work, as they have already been analyzed [56][57][58][59].

Security Architecture Description
In this section, we propose two security components, as shown in Figure 3, to address the attack model defined in Section 3, which are:

• Intrusion detection system (IDS) to identify malicious commands issued to the control devices. In this work, we propose an IDS using Random Subspace Learning (RSL). Later, in Section 6, we show that RSL-KNN, which is the combination of RSL and KNN classifiers, gives better results than the conventional machine learning classifiers.
• Blockchain-based integrity checking system (BICS), which aims to detect any injection of fraudulent flow rules in the vSwitches.

Random Subspace Learning-Based IDS
The concept of random subspace learning was proposed by Barandiaran [60]. The Random Subspace Learning (RSL) method is an ensemble learning technique, which is also called feature bagging or attribute bagging [61]. It is used to improve prediction and classification tasks as: (1) it employs an ensemble of base classifiers instead of a single classifier, and (2) it takes random subsets of features instead of the entire set of features. In this way, the correlation between features among classifiers is reduced. This method has proved successful in many prediction and classification problems [62][63][64][65].
The random subspace learning process is illustrated in Figure 4, and consists of two phases: training and testing.
In the training phase, we randomly select S features from a set of F features such that S ≤ F. The selected features are fed to a machine learning algorithm to generate a classifier/learner. This operation is repeated B times, and at each time S features are picked at random with replacement to generate a different classifier.
In the testing phase, the outputs from all distinct learners are combined by majority voting to obtain the final prediction or classification result. The main advantage is that combining classifiers improves the accuracy, especially if the classifiers are independent, or not correlated with each other through features. In other words, the classifiers are fed with different sets of features from each other, which reduces the correlation between features among classifiers.
More specifically, we assume that the RSL model contains a number of individual classifiers, which are built from $S$ subspaces of features and defined as $\{C_i(\cdot)\}_{i=1,\cdots,L}$. The labels returned by the individual classifiers are given as $\{y_i\}_{i=1,\cdots,L}$, where the returned labels belong to the set of labels ($Y$) in the training dataset.
For unseen instances $x_{k=1,\cdots,X}$ of $F$ features, each classifier classifies them separately based on its feature subspace $S_{j=1,\cdots,f} \in F$. Then, the outputs from the separate classifiers are represented as:

$$y_i = C_i(x), \quad i = 1, \cdots, L$$

Finally, all outputs from the separate classifiers are combined using the majority voting algorithm [63] to obtain the final classification label $y$ as in the following equation:

$$y = \arg\max_{c \in Y} \sum_{i=1}^{L} \mathbb{1}(y_i = c)$$

More formally, Algorithm 1 shows the steps to generate the ensemble of random subspace classifiers, and the ones to compute the predicted labels of unseen instances. Let $T^F_N$ denote the original training dataset of $F$ features and $N$ instances, $T^S_N$ denote the partial training dataset of only $S$ features, which are randomly selected from the original training dataset, $Z^S_M$ denote the testing dataset of $M$ instances with the same $S$ features as the ones selected in the training phase, and $ML$ denote the machine learning algorithm. In the training phase, we take $ML$ and $T^S_N$ as input $B$ times to generate a classifier $CL_b$, $1 \leq b \leq B$. In the testing phase, we compute $P^M_b$, which represents the classification labels of the $M$ unseen instances using the base classifier $CL_b$. Then, we compute $P^M_l$, which is the final classification labels of the $M$ instances after majority voting of the base classifiers.
As will be seen in Section 5, RSL-KNN classifier is obtained by combining random subspace learning and KNN algorithm. In other words, we get RSL-KNN by setting ML (resp., replacing the Learning Algorithm component) to KNN (resp., with KNN) in Algorithm 1 (resp., Figure 4).
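The training and testing phases described above can be illustrated with a minimal, pure-Python sketch. It is not the Weka implementation used in this paper: the function names (`knn_predict`, `train_rsl`, `predict_rsl`), the toy data, and the parameter values are assumptions made for illustration. Each base learner draws its own subset of distinct feature indices, and the final label is obtained by majority voting.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Return the majority label among the k training points closest to x."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def train_rsl(train_X, train_y, n_learners, subspace_size, rng):
    """Training phase: one random feature subset (and projected data) per learner."""
    n_features = len(train_X[0])
    ensemble = []
    for _ in range(n_learners):
        # Assumption: features are distinct within a subset; subsets may repeat.
        feats = rng.sample(range(n_features), subspace_size)
        projected = [[row[f] for f in feats] for row in train_X]
        ensemble.append((feats, projected, train_y))
    return ensemble


def predict_rsl(ensemble, x, k):
    """Testing phase: each learner votes using only its own features."""
    votes = Counter()
    for feats, projected, labels in ensemble:
        votes[knn_predict(projected, labels, [x[f] for f in feats], k)] += 1
    return votes.most_common(1)[0][0]


# Toy data with F = 4 features (hypothetical values, not from the dataset).
train_X = [
    [0, 0, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1],
    [5, 5, 5, 5], [6, 5, 6, 5], [5, 6, 5, 6],
]
train_y = ["normal"] * 3 + ["attack"] * 3

rng = random.Random(0)
ensemble = train_rsl(train_X, train_y, n_learners=5, subspace_size=2, rng=rng)
print(predict_rsl(ensemble, [0.5, 0.5, 0.5, 0.5], k=3))
print(predict_rsl(ensemble, [5.5, 5.5, 5.5, 5.5], k=3))
```

On this toy data the two classes are well separated in every feature, so the first query is labeled "normal" and the second "attack" regardless of which subsets are drawn.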

Blockchain-Based Integrity Checking System
Before describing the security solution, we make the following assumptions:
• We assume that the SD-WAN ICS is not compromised (i.e., free from malicious code) before the installation of the Blockchain-based integrity checking system. Otherwise, forged rules could be considered legitimate.
• The Blockchain-based integrity checking system only focuses on southbound communication. We assume that the northbound communication between SCADA server, DCS server, and IDS from one side and the SDN controller from the other side, is secure.
• We assume that the SDN controller is located in a private cloud, and only accessible from a single host through an authentication and access control mechanism.
The Blockchain [35] is the key element in the design of our integrity checking system. The basic idea is to provide a solution where all flow rules that are generated by the controller are stored in a verifiable and immutable database. The blockchain is a sequence of blocks, which are linked together by their hash values. In the blockchain network, each user has two keys: one private key to sign the blockchain transaction and one public key that represents its unique address. The user signs a transaction using its private key and broadcasts it to its peers in the network for validation. After the broadcast block, which contains the transaction, is validated, it is appended to the blockchain. Once recorded, the data in any given block cannot be changed without altering all subsequent blocks. In addition, the data exists on multiple hosts at once, so any changes would be rejected by the peer hosts. In this work, we propose a private (or permissioned) blockchain. Unlike public blockchains, private ones determine who is allowed to participate in the network, and defined actions and permissions are assigned to identifiable participants. Hence, consensus mechanisms such as Proof of Work are not required. Our blockchain is composed of only two nodes: the SDN controller and the firewall. The SDN controller creates blocks and shares them with the firewall via the blockchain. The first node has all the permissions, i.e., read, write, and send, whereas the firewall can only read and receive. As shown in Figure 5, the blockchain-based integrity checking is carried out in the following steps:

• Upon receiving a request from the Northbound application, the SDN controller is designed to send the corresponding flow rules to the vSwitches. In our design, the SDN controller is also a member of a blockchain. It hashes the flow rules and puts them in a block that is distributed to the other nodes of the blockchain. The SDN controller is the only node in the blockchain that has the right to create blocks, whereas the rest of the nodes can only read the blockchain.
• When the flow rules reach the vSwitch node, the latter updates its flow table and saves the rules in the log file.
• The firewall collects the vSwitch logs and accesses the blockchain to obtain the flow rules sent by the controller.
• If the firewall finds that the two sets of rules, from the vSwitch and from the blockchain, do not match, it notifies the Administrator to take the appropriate countermeasures to fix this mismatch.
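The sequence above can be sketched in a few lines of Python using only the standard library. This is a simplified illustration under stated assumptions, not the MultiChain-based implementation: the ledger is an in-memory hash chain without signatures or permissions, the helper names (`digest`, `publish_rule`, `chain_is_intact`, `fraudulent_rules`) are hypothetical, and the flow-rule fields are illustrative rather than OpenFlow syntax.

```python
import hashlib
import json

GENESIS = "0" * 64


def digest(obj):
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def publish_rule(chain, rule):
    """Controller side: store the rule hash in a block linked to the previous one."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"prev": prev, "rule_hash": digest(rule)}
    chain.append({**body, "hash": digest(body)})


def chain_is_intact(chain):
    """Firewall side: recompute every link; an edited block breaks the chain."""
    prev = GENESIS
    for block in chain:
        body = {"prev": block["prev"], "rule_hash": block["rule_hash"]}
        if block["prev"] != prev or block["hash"] != digest(body):
            return False
        prev = block["hash"]
    return True


def fraudulent_rules(chain, vswitch_log):
    """Firewall side: logged rules whose hash the controller never published."""
    published = {block["rule_hash"] for block in chain}
    return [rule for rule in vswitch_log if digest(rule) not in published]


# Hypothetical flow rules sent by the controller.
rule_a = {"switch": "s1", "match": {"ipv4_dst": "10.0.0.2"}, "action": "output:2"}
rule_b = {"switch": "s1", "match": {"ipv4_dst": "10.0.0.3"}, "action": "output:3"}

chain = []
publish_rule(chain, rule_a)
publish_rule(chain, rule_b)

# An attacker rewrites rule_b on the vSwitch so that traffic leaves on another port.
tampered_b = dict(rule_b, action="output:9")
vswitch_log = [rule_a, tampered_b]

alerts = fraudulent_rules(chain, vswitch_log) if chain_is_intact(chain) else vswitch_log
print(alerts)
```

In this sketch the check costs one hash per logged rule plus a set lookup, which is in line with the lightweight comparison the design aims for. What the Administrator does with the returned mismatches is outside its scope.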

Random Subspace Learning-Based IDS
In this section, we evaluate the performance of the proposed IDS using a real case study scenario that is implemented in [16], and using the Power System Dataset, which is a part of the Industrial Control System Cyber Attack Dataset [15]. Figure 6 shows the industrial control power system architecture. It is composed of the following components:
• Two power generators: G1 and G2.
• Four breakers from BR1 to BR4.
• Two transmission lines: L1 between BR1 and BR2, and L2 between BR3 and BR4.
• R1 through R4 are intelligent electronic devices (IEDs) to switch the breakers on or off. The IEDs send information to the control room through a substation switch and a router.
As explained in [16], there are four synchrophasors, each of which measures 29 features, giving 116 phasor measurement unit (PMU) measurements in total. There are also 12 additional features from control panel logs, Snort logs, and relay logs. Thus, 128 features are used in this case study scenario. Examples of some features, which are extracted from each PMU, are as follows. The list of 128 features along with their descriptions is given in [15]. The dataset [15] considers the following two normal events and three attack events. The normal events are as follows:
• Short-circuit fault: It represents a short in a power line and can occur at different locations along the line.
• Line maintenance: Power system operators occasionally must take a transmission line out of service to allow maintenance.
The dataset also considers the following attack events:
• Remote tripping command injection: This attack sends a command to a relay and causes a breaker to open.
• Relay setting change: Relays are controlled via configurable settings. Certain settings exist to disable relay operations. This class of attacks alters relay settings to disable relay operation such that the relay will not trip for valid commands or faults.
• Data Injection: This attack aims to imitate a valid fault by altering system measurements followed by sending an illicit trip command from a compromised computer to relays at the ends of the transmission line. This attack aims to blind the operator and causes a blackout.
The dataset is composed of 15 sub-datasets, as shown in Table 1. The events in SCADA systems are used in the following two main classification tasks:
• Classification of multi-class events: This classification task contains 37 event scenarios, and includes normal events, natural events, and attack events with their own class labels.
• Classification of binary class events: This task also contains 37 event scenarios, which are divided into nine normal events and 28 attack events.
All 15 sub-datasets consist of thousands of distinct event types and are randomly sampled at 1%. On average, each sub-dataset contains 3711 attack instances, 294 no-event instances, and 1221 natural-event instances. Table 1 summarizes the distribution of instances in the 15 SCADA sub-datasets. The RSL method is also implemented using the Weka tool [66]. Table 2 summarizes the parameters that are considered in the implementation. A 10-fold cross-validation strategy is also adopted to apply the proposed method on the 15 SCADA sub-datasets.

Blockchain-Based Integrity Checking System
The Blockchain-based integrity checking system is implemented, as shown in Figure 7, using the following components:
• Private cloud: We use OpenStack [67] to implement the private cloud.
• Blockchain: We use MultiChain [68], which is derived from Bitcoin Core [69], to implement a private blockchain. It uses JSON [70] to create blocks. The role of this blockchain is to save all the operations transmitted from the SDN controller to the different switches. MultiChain ensures the following properties:
- The activity of the blockchain is only visible to the chosen participants.
- It provides read and write privileges on the transactions.
• SDN controller: We use ONOS [71] to implement the SDN controller. ONOS is an SDN controller platform that provides the control plane of the network. It manages network components such as switches, routers, and links, and it runs the software that provides communication services to end-users and neighboring networks.
• Mininet: Mininet [72] is a network emulator that creates a network of virtual hosts, switches, controllers, and links. It allows creating an SDN prototype to simulate a network topology using switches that support OpenFlow.
As shown in Figure 7, the Insert() program captures the traffic sent from the ONOS controller to the vSwitches in order to get the flow rules of the ONOS controller and save them on the blockchain.
In order to access the blockchain, write permissions to create blocks on the blockchain are assigned to this program. Each created block contains the following information: • PUBLISHER: the SDN controller identifier.

Performance Evaluation
In this section, we evaluate the performance of two components of our solution: Random Subspace Learning IDS, and Blockchain-based integrity checking system (BICS).

Random Subspace Learning IDS
In the following, we present the performance results of the Random Subspace Learning IDS with respect to effectiveness and efficiency.

Effectiveness Evaluation
In this section, we use a set of baseline machine learning classifiers to test their ability to detect SCADA attacks. The used baseline classifiers are as follows: Linear Support Vector Machine (LSVM), Bayes Network (BN), Naive Bayes with kernel estimator (NB-K), K-Nearest Neighbor (KNN), AdaBoostM1, Bagging, Decision Tree (DT), and Random Forests (RF). All these classifiers are applied on the 15 SCADA datasets and implemented using the open-source tool of machine learning, namely Weka [66]. In the implementation, we fix the number of iterations for ensemble classifiers to be 25, the block size to be 100, and the bag size to be 50. Other settings remain as default. A 10-fold cross validation strategy is used in the testing. This strategy randomly partitions the datasets into 10 sets of instances and selects one set for testing and the other nine sets for training. We repeat this strategy 10 times and take the average to summarize the results for each of the used 15 sub-datasets.
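The 10-fold cross-validation procedure described above can be sketched as follows. This is a minimal illustration in plain Python rather than the Weka implementation used in the experiments; the function names and the `train_and_score` callback are hypothetical and introduced only for this example.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Randomly partition instance indices into k nearly equal folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(instances, labels, train_and_score, k=10, seed=0):
    """Average a score over k rounds, each using one fold for testing.

    train_and_score is a hypothetical callback with the signature
    (train_X, train_y, test_X, test_y) -> float.
    """
    folds = k_fold_indices(len(instances), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # The remaining k - 1 folds form the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(
            [instances[j] for j in train_idx], [labels[j] for j in train_idx],
            [instances[j] for j in test_idx], [labels[j] for j in test_idx],
        ))
    return sum(scores) / k
```

Each instance is used for testing exactly once and for training in the other nine rounds, which is why the averaged score is less sensitive to a single lucky or unlucky split.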
The evaluation results are presented under the following metrics:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
False positive rate (FPR) = FP / (FP + TN)
where TP, TN, FP, and FN denote true positive, true negative, false positive, and false negative, respectively. Tables 3 and 4 show the accuracy results of the above-mentioned machine learning classifiers under binary classification (natural and attack) and multi-class classification (natural and different types of attacks), respectively.
Motivated by the intrusion detection accuracy of the KNN classifier relative to the other classifiers, we propose a method that combines the random subspace method with the KNN classifier, named Random Subspace Learning-based K-Nearest Neighbor (RSL-KNN). The basic idea behind RSL-KNN is to build an ensemble of KNN classifiers, each using a different random subset of the features. This idea improves the accuracy, especially when there is a large number of features. Table 5 shows the accuracy results of RSL-KNN under binary classification (natural and attack) for three different numbers of learners. Table 6 shows the accuracy results of RSL-KNN under multi-class classification (natural and different types of attacks) for three different numbers of learners. We can observe that RSL-KNN outperforms KNN under both classification tasks. As shown in Table 7, while the false positive rates of RSL-KNN under multi-class classification are between 0.3% and 0.4%, they are higher under binary classification.
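To make the random subspace idea concrete, the following is a minimal sketch in plain Python. It is not the Weka implementation used in the experiments: the class name, the `subspace_ratio` default of 0.5, the Euclidean distance, and the majority-vote combination are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, features, k=1):
    """Classify x with k-NN, measuring Euclidean distance on `features` only."""
    dists = sorted(
        ((math.sqrt(sum((row[f] - x[f]) ** 2 for f in features)), label)
         for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


class RandomSubspaceKNN:
    """Ensemble of KNN learners, each restricted to a random feature subset."""

    def __init__(self, n_learners=10, subspace_ratio=0.5, k=1, seed=0):
        self.n_learners = n_learners
        self.subspace_ratio = subspace_ratio  # assumed fraction of features per learner
        self.k = k
        self.seed = seed

    def fit(self, X, y):
        n_features = len(X[0])
        size = max(1, int(self.subspace_ratio * n_features))
        rng = random.Random(self.seed)
        # Each learner draws its own feature subset without replacement.
        self.subspaces = [rng.sample(range(n_features), size)
                          for _ in range(self.n_learners)]
        self.X, self.y = X, y  # KNN is lazy: training only stores the data
        return self

    def predict(self, x):
        # Combine the learners by majority vote (an assumption of this sketch).
        votes = Counter(knn_predict(self.X, self.y, x, s, self.k)
                        for s in self.subspaces)
        return votes.most_common(1)[0][0]
```

Because every learner sees only part of the feature vector, a few noisy or irrelevant features affect only some of the votes, which is one common explanation for why the ensemble helps when the feature count is large.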

Efficiency Evaluation
To evaluate the time cost of training and testing of RSL-KNN compared to the KNN classifier, we train and test both classifiers on sub-dataset 9, which contains 5340 instances. This dataset is divided into 3738 instances for training and 1602 for testing, and the time cost is measured for binary and multi-class classification. Table 8 shows the training and testing times in seconds for both classifiers. RSL-KNN incurs negligible additional time during training, whereas in the testing phase it takes longer than KNN.

Blockchain-Based Integrity Checking System

Security Analysis
We analyze the security of BICS and discuss its resilience against the following attacks:
• Unauthorized access to the SDN controller: We assume that the SDN controller is located in a private cloud and is only accessible from a single host. Under this assumption, an external adversary cannot gain access to the SDN controller. In addition, by applying an authentication and access control mechanism, we can prevent unauthorized hosts from accessing the network resources, as explained in [73]. Therefore, fraudulent flow rules cannot be generated from the SDN controller.

• Man-in-the-middle attack between switch and controller: Since fraudulent flow rules cannot be generated from the SDN controller, and since the latter is the only node that has the right to create entries in the blockchain, the blockchain only stores legitimate flow rules. If the flow table of the vSwitch is poisoned with tampered rules, the firewall will eventually detect this attack after comparing the vSwitch logs and the rules stored in the blockchain.
- External adversary: If the adversary tries to use the private key to generate fake blocks, this attempt will be detected because the operation comes from outside the network, whereas the SDN controller is located inside the network.
- Internal adversary: To prevent an internal adversary from using the private key and generating fake blocks, the SDN controller is only accessible from a single host and access control mechanisms are implemented.
In addition, to mitigate the risk of private key leakage, the network administrator needs to implement security controls related to key management.
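The comparison between the rules installed on a vSwitch and the rules recorded on the blockchain can be illustrated with the sketch below. It is a simplified stand-in for the firewall check, not the actual implementation: the rule fields (`switch`, `match`, `action`) and both function names are hypothetical, and the ledger is modeled as a plain list rather than a MultiChain stream.

```python
import hashlib
import json


def rule_digest(rule):
    """Return a SHA-256 digest of a flow rule in canonical JSON form."""
    canonical = json.dumps(rule, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_fraudulent_rules(ledger_rules, switch_rules):
    """List the rules found on a switch that were never recorded on the ledger."""
    legitimate = {rule_digest(r) for r in ledger_rules}
    return [r for r in switch_rules if rule_digest(r) not in legitimate]


# Hypothetical example: one rule published by the controller, one injected.
ledger = [{"switch": "s1", "match": "in_port=1", "action": "output:2"}]
installed = ledger + [{"switch": "s1", "match": "in_port=1", "action": "output:9"}]
fraudulent = find_fraudulent_rules(ledger, installed)
```

Serializing with sorted keys means that two rules with the same fields in a different order produce the same digest, so only a genuine difference in content is flagged.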

Performance Evaluation
We evaluate the performance of BICS by varying the number of false rules that are injected into the network. In order to perform this test, we disconnect the SDN controller and inject the rules at the switch level. Table 9 summarizes the detection time and the detection rate of BICS. We observe that BICS achieves a detection rate of 100% with a very low detection time. The full detection rate is explained by the fact that the blockchain is immutable, i.e., data once written to the blockchain cannot be altered. To ensure immutability, the blockchain is based on two main concepts, hashes and chains of blocks, which together ensure data integrity. If an adversary creates a fraudulent flow rule and wants to inject it into the vSwitch, it cannot alter an existing flow rule in the blockchain to make it match the fraudulent one. In addition, as shown in Section 6.2.1, any injection of a new forged flow rule into the flow table of the vSwitch is eventually detected. Moreover, it is important to mention that the detection time of BICS scales well with respect to the number of injected rules.
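The role of hashes and chained blocks in making tampering evident can be shown with a toy ledger. This is a generic sketch of a hash chain, not MultiChain's block format: the block layout, the all-zero genesis hash, and the `rule` payload field are assumptions, while `PUBLISHER` follows the block field named earlier.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def block_hash(index, prev_hash, payload):
    """Hash a block's contents together with the hash of its predecessor."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block whose hash commits to the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    })
    return chain


def verify_chain(chain):
    """Recompute every hash and link; any altered block makes this fail."""
    prev_hash = GENESIS_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True
```

Changing the payload of an earlier block invalidates its stored hash, and recomputing that hash breaks the link held by the next block, so an adversary would have to rewrite every later block to hide the change.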
To evaluate the execution time overhead (ETO) of BICS, we measure several timing metrics (ETO, LRT, BCT, RRT, and PT) while varying the number of vSwitches that are deployed in the network. Table 10 shows that ETO increases as the number of switches increases. This is because the firewall has to retrieve the log information from each switch, which affects LRT. On the other hand, we observe that BCT, RRT, and PT are low and are less affected by increasing the number of switches. We can also observe that BICS incurs a very low block creation time compared to public blockchain platforms, e.g., Bitcoin, which requires around 10 min to create one block [74]. This is because public blockchains run a consensus mechanism such as Proof of Work (PoW) or Proof of Stake (PoS) in order to mine, validate, and append a new block to the blockchain. In the case of BICS, no such consensus is required, and it is replaced with access rights that are assigned to known participants.

Conclusions
In this paper, we have proposed a security architecture for IoT-based industrial control systems, which integrates the Blockchain and the Software-defined network (SDN) technologies. The proposed security architecture is composed of an intrusion detection system, named RSL-KNN, and a Blockchain-based Integrity Checking System (BICS). The proposed security solution has been tested on an Industrial Control System Cyber attack Dataset and on an experimental platform combining software-defined networking and blockchain technologies. RSL-KNN has achieved an accuracy of 96.73% and 91.07% under binary and multi-class classification tasks, respectively. In addition, BICS detects fraudulent flow rules at a detection rate of 100%, and is scalable in terms of detection time. As a part of future work, we plan to test more Industrial Control System (ICS) cyber attack datasets, and to apply different deep learning techniques for better IDS accuracy. Moreover, it would be interesting to leverage the blockchain technology to prevent the injection of fraudulent flow rules into the flow tables, instead of only detecting them.