AICrit: A Design-Enhanced Anomaly Detector and Its Performance Assessment in a Water Treatment Plant

Raman, Gauthama; Mathur, Aditya

doi:10.3390/app132413124


by Gauthama Raman ¹,* and Aditya Mathur ¹,²

¹ iTrust, Centre for Research in Cyber Security, Singapore University of Technology and Design, Singapore 487372, Singapore
² Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(24), 13124; https://doi.org/10.3390/app132413124

Submission received: 22 September 2023 / Revised: 28 November 2023 / Accepted: 30 November 2023 / Published: 9 December 2023

(This article belongs to the Special Issue Advances in Attack Detection and Secure State Estimation for Cyber–Physical Systems (CPS))


Abstract:

Critical Infrastructure Security Showdown 2021—Online (CISS2021-OL) represented the fifth run of iTrust’s international technology assessment exercise. During this event, researchers and experts from the industry evaluated the performance of technologies designed to detect and mitigate real-time cyber-physical attacks launched against the operational iTrust testbeds and digital twins. Here, we summarize the performance of an anomaly detection mechanism, named AICrit, that was used during the exercise. AICrit utilizes the plant’s design to determine the models to be created using machine learning, and hence is referred to as a “design-enhanced” anomaly detector. The results of the validation in this large-scale exercise reveal that AICrit detected 23 of the 27 launched attacks, which corresponds to a detection rate of 95.83% over the attacks that fell within its scope. Our analysis offers valuable insights into AICrit’s efficiency in detecting process anomalies in a water treatment plant under a continuous barrage of cyber-physical attacks.

Keywords:

anomaly detection; critical infrastructure; industrial control systems; critical infrastructure security showdown; water treatment plant

1. Introduction

Industrial Control Systems (ICS) integrate computational and physical components to monitor and control critical infrastructure such as water and sewage systems, oil and gas distribution networks, and nuclear plants. However, the increasing connectivity to corporate internet technology and the use of proprietary software and devices make these systems vulnerable to both insider and outsider attacks. Traditional defense mechanisms are often bypassed by attackers who exploit these vulnerabilities, resulting in catastrophic failures with significant consequences. Recent forensic reports have highlighted successful security breaches in public infrastructure [1], further emphasizing the urgency of addressing these risks. Given the pivotal role of ICS, it is crucial to develop mechanisms that can accurately and promptly detect process anomalies, thus enabling reliable and uninterrupted operation of these systems. This paper focuses on AICrit [2], a dedicated anomaly detection mechanism designed to fulfill this critical requirement.

CISS2021-OL was the fifth run of iTrust’s international technology assessment exercise organized by a team of faculty, students, and staff at the Singapore University of Technology and Design. In this event, several defense mechanisms, namely DAD [3], AICrit [2], AEGIS, FLBI, AD_check, ATTESTER, and SECURE DIGITAL WATER TWINS (SDWT) were evaluated. This paper focuses on assessing the performance of AICrit in safeguarding the operational water treatment plant named SWaT (Secure Water Treatment plant) against attacks launched by multiple teams of attackers located in different parts of the world. Furthermore, we also analyzed the suitability of scaling AICrit to much larger city-scale water treatment plants in Singapore.

The following research questions were the focus of this event with respect to AICrit:

RQ 1: How effective is AICrit in detecting the cyber-physical attacks that have an adverse impact on the physical processes of the underlying SWaT?
RQ 2: How efficient is AICrit in supporting the plant operators toward active incident response and recovery?

Novelty: The architecture of AICrit, the use of plant design information, and its performance can be found in [2]. The key differences between the performance evaluation reported in this work and that in [2] are summarized in Table 10 in Section 5. From the data, we observe that the evaluation performed and reported in [2] was on a smaller scale than that performed during CISS2021-OL. We therefore believe that while the work in [2] is novel and worthy of sharing with the community at large, so is the large-scale experimentation reported here. The performance of AICrit in CISS2021-OL further attests to the strengths of the novel methods used in the design of AICrit.

Contributions: The contributions of this paper are listed below.

Provides detailed information on the attacks launched, and their impact, on an operational water treatment plant during a large cyber exercise.
The performance of AICrit is compared with that of DAD—a design-centric anomaly detection mechanism.
Lessons learned during the exercise are reported. We believe that these lessons are useful to researchers in the design of machine learning-based defense mechanisms for safeguarding CI and translating them to large-scale systems.

Organization: The remainder of this paper is organized as follows. Information regarding the SWaT testbed, along with a description of AICrit and CISS2021-OL, is in Section 2. Section 3 enumerates the attacks launched by the red teams during CISS2021-OL. The corresponding response of AICrit is in Section 4. Section 5 compares the performance of AICrit against DAD and answers the above-mentioned research questions. A summary of the recent literature on safeguarding ICS is in Section 6. Conclusions based on CISS2021-OL are in Section 7.

2. Preliminaries and Background

2.1. SWaT—Architecture

SWaT (Secure Water Treatment) is a testbed used for education, training, and research. It is one of the testbeds used during CISS2021-OL. Details of SWaT architecture and the treatment processes used are in [4]; a summary follows. SWaT consists of six interconnected stages labeled stage 1 through stage 6. Each stage consists of a set of sensors and actuators controlled by a Programmable Logic Controller (PLC). In stage 1, the incoming water is stored in a raw water tank T101. Water from T101 is transferred via a chemical dosing station in stage 2 to an Ultra-filtration (UF) unit in stage 3 for the removal of undesirable materials. Using UF feed pumps, the water is passed from stage 3 to stage 4 for the removal of excess chlorine through an Ultraviolet (UV) de-chlorination process. In the next filtration step, the inorganic impurities are removed from the dechlorinated water using a two-stage Reverse Osmosis (RO) process in stage 5. The treated water is stored in stage 6 for distribution, or it can be recycled.

2.2. AICrit

AICrit [2] is designed to provide real-time process monitoring for ICS with the goal of preserving control behavior integrity. It utilizes a combination of artificial intelligence and design knowledge to learn the normal spatio-temporal relationships among correlated components. AICrit is composed of two main modules: AiBox and RuleBox. AiBox uses deep learning algorithms, such as deep Multi-Layer Perceptron (MLP) and Long Short-Term Memory (LSTM) neural networks, to model sensor behavior. Similarly, RuleBox utilizes a decision tree algorithm to generate rules associated with relationships among sensors and actuators. By combining the models and rules generated by these modules, the functional dependencies of sensors and actuators are continuously monitored to detect and report anomalies in the ICS.

The operation of AiBox in modeling the behavior of the water level sensor LIT101 in SWaT, shown in Figure 1, can be demonstrated using the following strategies.

PbNN (Physics-based Neural Network) approach [5]: This method employs a physics-based deep learning algorithm to model the behavior of LIT101. It considers the spatio-temporal relationships between the interdependent components of LIT101, i.e., FIT101 and FIT201, as a non-linear function $f_{PbNN}()$ formulated as follows:

$x_{1}(t + 1) = f_{PbNN}(x_{1}(t), x_{2}(t), x_{3}(t))$ (1)

where $x_{1}(t)$, $x_{2}(t)$, and $x_{3}(t)$ denote measurements from the sensors LIT101, FIT101, and FIT201, respectively, at time t. With the availability of historical data, a physical model can be built using Equation (1) to predict $x_{1}(t + 1)$ with minimal error.
NN (Neural network) approach [6]: This method considers the prediction of LIT101 as a time series forecasting problem. The LSTM is utilized to learn and model the temporal dependencies, represented as $f_{NN}()$, present in the historical readings of LIT101.

$x_{1}(t + 1) = f_{NN}(x_{1}(t), x_{1}(t - 1), \ldots, x_{1}(t - i), \ldots, x_{1}(t - n)), \quad \forall i \in [1, n]$ (2)

The resulting model is then deployed to predict LIT101.
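To make the difference between the two strategies concrete, the following minimal sketch builds the input/target pairs implied by Equations (1) and (2). It is illustrative only: the function names `pbnn_samples` and `nn_samples` are hypothetical and not taken from [2], and the sketch covers only how the training samples are laid out, not the networks themselves.

```python
from typing import List, Sequence, Tuple


def pbnn_samples(
    lit101: Sequence[float], fit101: Sequence[float], fit201: Sequence[float]
) -> Tuple[List[Tuple[float, float, float]], List[float]]:
    """Equation (1): inputs (x1(t), x2(t), x3(t)) paired with target x1(t+1)."""
    if not (len(lit101) == len(fit101) == len(fit201)):
        raise ValueError("all three series must have the same length")
    last = len(lit101) - 1  # the final sample has no successor to predict
    inputs = [(lit101[t], fit101[t], fit201[t]) for t in range(last)]
    targets = [lit101[t + 1] for t in range(last)]
    return inputs, targets


def nn_samples(
    lit101: Sequence[float], n: int
) -> Tuple[List[Tuple[float, ...]], List[float]]:
    """Equation (2): inputs (x1(t), x1(t-1), ..., x1(t-n)) paired with x1(t+1)."""
    inputs: List[Tuple[float, ...]] = []
    targets: List[float] = []
    for t in range(n, len(lit101) - 1):
        inputs.append(tuple(lit101[t - i] for i in range(n + 1)))
        targets.append(lit101[t + 1])
    return inputs, targets


# Synthetic readings, for illustration only (not plant data).
level = [500.0, 501.0, 502.0, 503.0, 504.0]
inflow = [2.5] * 5
outflow = [2.4] * 5
pb_x, pb_y = pbnn_samples(level, inflow, outflow)
nn_x, nn_y = nn_samples(level, n=2)
```

The first layout exposes the correlated flow sensors to the model, whereas the second relies solely on the history of LIT101 itself.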

In both approaches, the hyperparameters of the deep learning algorithms were tuned during training to minimize the prediction error, as in Equation (3):

$\text{minimize} \ \epsilon = \frac{1}{n} \sum_{i = 1}^{n} {(x_{1}(t) - \hat{x}_{1}(t))}^{2} \quad \text{w.r.t.} \ H_{p}$ (3)

where n is the number of samples and $H_{p}$ is the set of hyperparameters. In this work, we adopt the scikit-optimize library (https://scikit-optimize.github.io/stable/, accessed on 29 November 2023) to automate the process of selecting the best hyperparameter values. Once the optimal model has been selected for each sensor, the predicted measurements are compared against the actual data from the plant to discover anomalies. The difference between the actual and predicted values, known as the forecasting error, will always be non-zero. To quantify it effectively, and avoid false positives, we use the Cumulative SUM (CUSUM). The CUSUM approach defines two parameters, namely, the Upper Control Limit (UCL) (Equation (5)) and Lower Control Limit (LCL) (Equation (7)), to determine the acceptable levels of positive and negative deviation in the forecasting error.

$P(t) = \max(0, r(t) - \tau - b) \quad \forall t, \ 1 \leq t \leq T$ (4)

$\mathrm{UCL} = \max(P(t))$ (5)

$N(t) = \min(0, r(t) - \tau + b) \quad \forall t, \ 1 \leq t \leq T$ (6)

$\mathrm{LCL} = \min(N(t))$ (7)

where $P(t)$ and $N(t)$ are, respectively, the allowable positive and negative side deviations, $\tau$ represents the safety limit, and b corresponds to the allowable slack. During the anomaly detection phase, $P(t) > \mathrm{UCL}$ or $N(t) < \mathrm{LCL}$ indicates an anomaly.
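The control-limit logic above can be sketched in a few lines. The sketch makes three assumptions that are not stated in the text: r(t) is taken to be the forecasting error at time t, the limits are computed from residuals collected under normal operation, and LCL is taken as the minimum of N(t), which is what the detection condition N(t) < LCL implies. The function names and the numeric values are hypothetical.

```python
from typing import List, Sequence, Tuple


def deviations(
    residuals: Sequence[float], tau: float, b: float
) -> Tuple[List[float], List[float]]:
    """Positive and negative side deviations, Equations (4) and (6)."""
    p = [max(0.0, r - tau - b) for r in residuals]
    n = [min(0.0, r - tau + b) for r in residuals]
    return p, n


def control_limits(
    normal_residuals: Sequence[float], tau: float, b: float
) -> Tuple[float, float]:
    """UCL and LCL from residuals of normal operation, Equations (5) and (7)."""
    p, n = deviations(normal_residuals, tau, b)
    return max(p), min(n)


def flag_anomalies(
    residuals: Sequence[float], tau: float, b: float, ucl: float, lcl: float
) -> List[bool]:
    """True wherever P(t) > UCL or N(t) < LCL."""
    p, n = deviations(residuals, tau, b)
    return [pt > ucl or nt < lcl for pt, nt in zip(p, n)]


# Synthetic residuals, for illustration only (assumed values, not plant data).
tau, b = 0.0, 0.05
ucl, lcl = control_limits([0.1, -0.2, 0.3, -0.1], tau, b)
flags = flag_anomalies([0.2, 0.9, -0.8], tau, b, ucl, lcl)
```

With these synthetic values, the first live residual stays inside the limits, while the second and third exceed the upper and lower limit, respectively.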

Similarly, to demonstrate the operation of RuleBox, consider the relationship between the valve MV101 and level sensor LIT101 (see Figure 1). RuleBox generates multiple conditions for MV101 to be opened and closed based on LIT101 measurements using a decision tree algorithm. Finally, the condition with the highest weight is selected as the best rule. This process is repeated for other dependencies between the actuators and sensors to monitor the process flow across the distributed system.
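As a rough illustration of this rule selection, the sketch below searches candidate conditions of the form "MV101 is open when LIT101 is at or below a threshold" and keeps the one with the highest weight. It is a deliberately simplified stand-in: a single-split threshold search replaces the decision tree algorithm, the weight is assumed to be the fraction of samples consistent with the rule, and the function name and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def best_threshold_rule(
    levels: Sequence[float], valve_open: Sequence[bool]
) -> Tuple[float, float]:
    """Return (threshold, weight) for the rule 'valve open iff level <= threshold'.

    The weight is the fraction of samples that agree with the rule
    (an assumption made for this sketch).
    """
    if len(levels) != len(valve_open) or not levels:
        raise ValueError("inputs must be non-empty and of equal length")
    best_thr, best_weight = levels[0], -1.0
    for thr in sorted(set(levels)):
        agree = sum(
            (lvl <= thr) == is_open for lvl, is_open in zip(levels, valve_open)
        )
        weight = agree / len(levels)
        if weight > best_weight:  # keep the first threshold with the top weight
            best_thr, best_weight = thr, weight
    return best_thr, best_weight


# Synthetic observations (level in mm, valve state), for illustration only.
lit101: List[float] = [480.0, 495.0, 510.0, 700.0, 790.0, 805.0]
mv101_open: List[bool] = [True, True, False, False, False, False]
threshold, weight = best_threshold_rule(lit101, mv101_open)
```

A real decision tree would consider several sensors and nested conditions; the single split is used here only to show what "the condition with the highest weight" means in practice.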

2.3. CISS2021-OL

CISS2021-OL was conducted over two weeks from the 6th to 17th September 2021 at SUTD. This was an online exercise where the participants remotely launched attacks online and monitored SWaT status. The goal of this event was to enable researchers to (i) validate and assess the effectiveness of their defense mechanisms tailored for iTrust testbeds, (ii) develop capabilities for safeguarding CI against cyber-attacks, (iii) understand the composite Tactics, Techniques, and Procedures (TTP) for enhanced Operation Security, and (iv) practice the approaches for compromising and defending CII. The participants in this event were categorized as follows.

Red team: Up to 10 local and international teams from government organizations, the private sector, and academia.
Blue team: Commercial vendors were invited based on their past performance in similar events and nominations by Singapore Government agencies.
IHL Anomaly Detection teams: Anomaly detectors from iTrust.
CII Blue Teams: CII operators and regulators.
Observers: Singapore Government agencies and their invitees.

Two months prior to the event, each red team was provided with the technical details of iTrust testbeds such as network architecture, communication protocols, and devices used. The teams were informed of attack targets (Table 1). Similarly, for blue teams, the IT and OT data collected during the normal operation of SWaT were provided and their defense mechanisms were integrated with the testbeds. The event was spread over two weeks to meet the following objectives.

Red Week—September 6 to 11: Each red team was assigned a 5-h slot during which they were able to launch attacks. Points earned by each team were weighted based on the impact of the attack and the number of defense mechanisms successfully bypassed. Blue team members were commercial vendors who deployed their technologies to defend against the red team attacks.
Blue Week—September 13 to 17: A composite red team from multiple organizations was formed to launch attacks defended by the blue team consisting of CI operators and regulators. An 8-h slot was available to the blue team for responding to the attacks launched.

Details of red and blue teams, the online exercise platform, and the attack launch procedures are in [7].

3. CISS2021-OL Attacks

The two Red Team sessions mentioned in Section 2.3 launched a total of 27 successful attacks, listed in Tables 2–8. Attacks listed in the tables are typed as follows: Single Stage Single Point (SSSP), Single Stage Multi Point (SSMP), and Multi Stage Multi Point (MSMP). Here, “stage” refers to one of the six stages of SWaT and “point” to a target. Each device in the plant is associated with a unique identifier tag. The tags are AIT: chemical sensor, FIT: flow rate sensor, LIT: level sensor, MV: motorized valve, P: pump, and T: tank. A sample of these attacks is described below.

A4: Drain water from the UF process: In this attack, the attacker intended to manipulate the status of the actuators in stage 3 to drain water during the ultrafiltration process. This attack was executed in two steps. First, PLC3 was set to manual mode wherein the operator could directly control the actuator status. In the second step, the measurement from level sensor LIT301 was set to 1100 mm, and valves MV301 and MV302 were closed. Subsequently, the attackers opened valve MV303 and switched on the pumps P301 and P302.
A7: Control the chemical dosing system: The attacker intended to degrade the water quality by disrupting the chemical dosing system. The attackers changed the commands sent, and measurements received, by PLC2. First, all pumps in stage 2 were started, and valves MV201 and MV301 opened. Subsequently, measurements from chemical sensors AIT201 (conductivity), AIT202 (pH), and AIT203 (ORP) received by the PLC2 were set to 69, 11, and 111, respectively.
A16: Disrupt the RO process: The attacker intended to exploit the vulnerabilities in the SCADA workstation to redirect water from the RO process to the UF backwash tank. A cyber-criminal attacker model was designed to launch this attack. Using Metasploit [8], the attacker exploited Windows 7 vulnerabilities to retrieve various login credentials. Using administrative credentials and RDP [9], the attacker gained access to the SCADA workstation to control the entire plant. Using SCADA’s user interface, the attacker manually closed all valves in stage 5, except MV503 and MV504, thereby redirecting water from the RO process to the UF backwash tank.
A22: Manipulating the valves to disturb the raw water tank filling process: The objective of this attack was to stop the inflow of raw water into tank T101 for 2 min. To realize the objective, MV101 was set to manual mode and closed.
A25: Damage the water pumps: The objective of this attack was to damage the water pumps that transfer water from the raw water tank to stage 2. The attackers repeatedly switched pumps P101 and P102 on and off.
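The attack typing used in this section can be sketched as a small classifier over the list of targeted tags. The sketch assumes that the first digit of a tag number identifies the SWaT stage (for example, LIT301 in stage 3 and MV201 in stage 2, as in the examples in the text); this convention is inferred rather than stated, and the function names are hypothetical. The labels in the tables come from the exercise itself, so the sketch will not necessarily reproduce every one of them.

```python
import re
from typing import Sequence


def stage_of(tag: str) -> int:
    """Stage number inferred from a tag such as 'MV101' or 'LIT301' (assumed convention)."""
    match = re.fullmatch(r"[A-Z]+(\d)\d{2}", tag)
    if match is None:
        raise ValueError(f"unrecognized tag format: {tag!r}")
    return int(match.group(1))


def attack_type(targets: Sequence[str]) -> str:
    """Classify an attack by the number of stages and points it touches."""
    if not targets:
        raise ValueError("an attack needs at least one target")
    stages = {stage_of(tag) for tag in targets}
    stage_part = "MS" if len(stages) > 1 else "SS"
    point_part = "MP" if len(targets) > 1 else "SP"
    return stage_part + point_part


single = attack_type(["MV101"])            # one target in one stage
same_stage = attack_type(["P101", "P102"])  # two targets in stage 1
cross_stage = attack_type(["P101", "MV201"])  # targets in stages 1 and 2
```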

Attacks and anomaly: Note that AICrit is designed to detect process anomalies. Thus, any attack that causes an abnormal change in the water treatment process ought to be detected by AICrit. However, there could be attacks wherein the physical process might not deviate from its normal operation. An example of such an attack is one intended to crash the historian server. Such an attack will not be detected by AICrit because, as in this case, the detector stops receiving plant state. This example is illustrative of the importance of the location, and the limitations, of an anomaly detector in a plant.

Table 2. Attacks launched by red team 1.

ID	Type	Target(s)	Intention	Description
A1	SSSP	MV101	Stop raw water filling	Close MV101
A2	MSMP	All pumps in stage 2, FIT201, AIT202, and AIT203	Disrupt the chemical dosing process	Switch ON all pumps in stage 2; spoof FIT201; open the valves MV201 and MV301; set AIT201, AIT202, and AIT203 to 69, 11, and 111, respectively
A3	MSMP	MV301, MV303, and P602	Stop the backwash process	Close the valves MV301 and MV303; switch OFF the pump P602
A4	MSMP	LIT301, MV301, MV302, MV303, P301, and P302	Redirect water in T301 into UF drain	Set LIT301 to 1100 mm; close the valves MV301 and MV302; open the valve MV303; switch ON the pumps P301 and P302
A5	MSMP	MV501, MV504, MV502, MV503, and P601	Redirect all water from RO process to RO reject tank	Close the valves MV501 and MV504; open the valves MV502 and MV503; switch ON the pump P601
A6	MSMP	All actuators in stage 5, P601, P602, and P603	Redirect all water from RO process to RO reject tank	Close the valves MV501 and MV504; open the valves MV502 and MV503; switch ON the pumps P501, P502, P601, P602, and P603
A7	MSMP	All pumps in stage 2, FIT201, AIT202, and AIT203	Disrupt the chemical dosing process	Switch ON all pumps in stage 2; spoof FIT201; open the valves MV201 and MV301; set AIT201, AIT202, and AIT203 to 69, 11, and 111, respectively
A8	MSMP	LIT301, MV301, MV302, MV303, P301, and P302	Redirect all water in T301 into UF drain	Set LIT301 to 1100 mm; close the valves MV301 and MV302; open the valve MV303; switch ON the pumps P301 and P302
A9	MSMP	All actuators in stage 5, P601, P602, and P603	Redirect all water from RO process to RO reject tank	Close the valves MV501 and MV504; open the valves MV502 and MV503; switch ON the pumps P501, P502, P601, P602, and P603

Table 3. Attacks launched by red team 2 during CISS2021-OL.

ID	Type	Target(s)	Intention	Description
A10	SSSP	MV101	Stop raw water filling	Close MV101 indefinitely when LIT101 drops below 500 mm
A11	MSMP	MV501, MV504, MV502, MV503, and P601	Redirect all water from RO process to RO reject tank	Close the valves MV501 and MV504; open the valves MV502 and MV503; switch ON the pump P601

Table 4. Attacks launched by red team 3.

ID	Type	Target(s)	Intention	Description
A12	SSSP	LIT101	Disturb the water filling process in T101	Set LIT101 to 900
A13	SSSP	P602	Suspend the backwash process	Switch OFF the pump P602
A14	MSMP	P101 and MV201	Degrade the water quality	Switch ON P101 and open MV201
A15	MSMP	MV201, MV301, MV302, and MV304	Redirect all water in T301 into the UF drain	Close MV201; close MV301 and MV302; open MV304
A16	SSMP	All valves in stage 5	Redirect all water from the RO process to tank T602 and empty tank T601	Open MV503 and MV504; close MV501 and MV502
A17	NA	PLC3	Stop the UF process	Comment out the entire UF process in PLC3 code

Table 5. Attacks launched by red team 6.

ID	Type	Target(s)	Intention	Description
A18	SSSP	FIT101	Disturb the water filling process in T101	Set FIT101 to 0
A19	SSSP	P602	Stop the backwash process	Switch OFF the pump P602
A20	SSSP	P602	Stop the backwash process	Switch OFF the pump P602
A21	MSMP	MV301, MV502, MV503, and P602	Suspend the backwash process	Close the valves MV301, MV502, and MV503; turn OFF the pump P602

Table 6. Attacks launched by red team 8.

ID	Type	Target(s)	Intention	Description
A22	SSSP	MV201	Disturb the water filling process in T101	Close MV201 for 2 min
A23	MSMP	All pumps in stage 2 and UV401	Degrade water quality	Turn ON the pumps P201, P202, P203, and P204; turn OFF the pumps P205, P206, P207, and P208; turn OFF UV401

Table 7. Attacks launched by red team 9.

ID	Type	Target(s)	Intention	Description
A24	MSMP	P203, P204, P403, and P404	Degrade water quality	Turn ON the pumps P203 and P204; turn ON the pumps P403 and P404
A25	SSMP	P101 and P102	Destroy the pumps	Continuously turn P101 and P102 ON and OFF
A26	MSMP	All valves in stages 1, 2, and 3	Redirect all water in T301 into the UF drain	Open valves MV303 and MV304; close valves MV301 and MV302; close valves MV101 and MV201

Table 8. Attacks launched by red team 10.

ID	Type	Target(s)	Intention	Description
A27	SSMP	MV301, MV302, MV303, P301, P302, and LIT301	Redirect all water in T301 into UF drain	Close the valves MV301 and MV302; open the valve MV303; turn OFF the pumps P301 and P302; set LIT301 to 600 mm

4. Results

As part of CISS2021 preparation, several upgrades were carried out to SWaT. The chemical tanks were refilled, existing sensors re-calibrated, and additional components, e.g., valves and sensors, deployed. As these upgrades alter the characteristics of the data generated, the existing AICrit was retrained to avoid false positives. During this process, the representative models and explainable rules of AICrit discussed in [2] were fine-tuned using data collected from SWaT while it was run under various operating conditions.

Table 9 summarizes the response of AICrit to the attacks launched during the CISS2021 event. Anomalies due to twenty-three of the twenty-seven attacks were detected soon after launch, while those from the remaining four were not detected. Although all four undetected attacks were intended to alter the physical processes of SWaT, two of them did not cause any process anomalies. Consider attack A1 in Table 2. In this attack, the adversary intended to stop the raw water filling process in tank T101 by closing the inlet valve MV101. Under normal circumstances, MV101 is opened only when level sensor LIT101 is below its preset low marker. However, as LIT101 was above the low marker at the time of the attack, MV101 was already closed and hence the attack did not affect the physical process. Similar logic applies to attack A10 in Table 3. The adversary changed the control of MV101 to manual mode and attempted to close it when the level sensor LIT101 measurement dropped below 500 mm. However, due to water inflow to tank T101 from the recycling process in stage 6, LIT101 did not fall below 500 mm; hence, there were no process anomalies.

Anomalies due to attacks A14 and A18 were not detected by AICrit. In A18, the adversary set FIT101 to zero. Although this change was easy enough to detect, the adversaries brought down the historian prior to the attack. Hence, the detectors did not receive the plant state. Attack A14 was a Multi-Stage Multi-Point (MSMP) attack that targeted pump P101 in stage 1 and valve MV201 in stage 2. Under normal SWaT operation, P101 is started only when LIT101 is above the low marker and LIT301 is below its high marker. Similarly, MV201 is opened if either P101 or P102 is running. Prior to the launch of A14, LIT101 and LIT301 were above the preset low marker, and P101 was not running; hence, MV201 was closed. Although pump P101 was started by the adversary before LIT301 dropped below its low marker, this attack was not detected because MV201 was opened and none of the control rules in AICrit was violated.

From the above discussion, we note that of the 27 attacks, two (A1 and A10) did not lead to any process anomalies, and in the case of attack A18, AICrit did not receive the plant data. Of the remaining 24 attacks, which fall within the scope of AICrit, 23 were detected. Thus, the overall detection rate of AICrit was 95.83%.
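The arithmetic behind this figure is easy to check. The short sketch below uses only the counts and attack identifiers given in this section; the variable names are illustrative.

```python
launched = 27
detected = 23
no_process_anomaly = {"A1", "A10"}  # attacks that left the process unchanged
no_data_received = {"A18"}          # historian was down, so no plant state arrived

# Attacks that AICrit could, in principle, have detected.
in_scope = launched - len(no_process_anomaly) - len(no_data_received)

raw_rate = 100 * detected / launched
scoped_rate = 100 * detected / in_scope
print(f"{raw_rate:.2f}% of all launched attacks, {scoped_rate:.2f}% of in-scope attacks")
```

The reported 95.83% is therefore the rate over the 24 in-scope attacks; over all 27 launched attacks the rate is 85.19%.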

5. Discussion

5.1. Impact of CISS2021-OL on AICrit

The design and performance of AICrit have been published in [2]. The evaluation reported here is a significantly larger-scale study than that reported in [2]. As indicated in Table 10, while attacks in the prior study were designed and launched by the authors, in CISS2021-OL this task was delegated to independent international teams. Several attacks launched during CISS2021-OL were different from those reported in [2]. Such attacks are valuable for researchers during the evaluation of anomaly detection methods on publicly available data from the SWaT testbed [10,11]. The work reported here offers additional evidence in support of the effectiveness of AICrit when deployed in a CI, as well as information of value to the community that has not been provided in [2].

Table 10. Evaluation of AICrit in [2] and this work.

Differentiators *	AICrit in [2]	AICrit in CISS2021-OL
Type	Insider	Insider and external
Launchers	Authors	Hackers from outside
Entry	Network	Network and physical
Launch point	SCADA	Network and inside SWaT
Targets †	SSSP	SSSP, SSMP, MSSP, and MSMP
Launched	13	27
Detected	13	23
False positives	0	0
Process impact	Minor	Major **

* Differentiators are based on cyber and physical attacks. ** Some attacks led to catastrophic failures, such as tank overflow and abnormality in the chemical properties of water, resulting in service disruption. † See Section 3.

5.2. AICrit Vs. DAD

DAD is a design-based anomaly detection system created manually for SWaT. The invariants used in DAD, discussed in [3], are extracted using the design knowledge obtained from the PLC code, operator manuals, design documents, and other vendor-provided manuals. Because it is built on a deep understanding of SWaT’s physical processes, DAD outperformed all other detectors in several cybersecurity exercises reported in [5,12,13,14]. Thus, DAD is considered a gold-standard benchmark anomaly detection system for SWaT, and the performance of the other detectors was compared against it.

AICrit uses a design-enhanced data-centric approach based on machine learning algorithms and computational intelligence techniques. As discussed in Section 4, AICrit takes P&ID and operational data as input to model the higher-order and non-linear dependencies among the highly correlated components of a plant. As reported in Table 9, we experienced similar performance for both AICrit and DAD. Particularly, attacks A1 and A10 were not detected by DAD since they did not raise any process anomalies. Similarly, A18 was not detected due to a lack of data from the historian. To summarize, the overall detection rate of AICrit was found to be the same as that of DAD.

5.3. Suitability Assessment for Detectors to Large-Scale Plants

Theoretically, two major factors that impact the performance of an anomaly detector are its rates of detection and false alarms. Several existing methodologies claim that their proposed approaches outperform others in these metrics based on experiments conducted on benchmark datasets or in a simulated environment. However, there is no guarantee that similar performance will be experienced when deployed in large-scale operational systems.

For example, consider the work on probabilistic neural networks (PNN) [15]. From the experimental results using the benchmark data, it was found that the average attack detection and false alarm rates were, respectively, 99.29% and 0.002%. However, when the same PNN model is deployed in SWaT, the detection rate is reduced to 1.3%, and the false alarm rate increases to 87.14%. A similar problem was encountered while altering the water level markers of tank T101, which is reported in [2]. Reasons underlying the drastic change in the performance are (i) the dynamic nature of ICS, (ii) the aging factor of components, and (iii) temporal glitches. Thus, one should consider these practical issues while designing and deploying an anomaly detector on large-scale plants.

Unlike traditional methodologies, AICrit operates over the spatio-temporal dependencies among the components. The interactions across the highly correlated components are accurately learned through the application of machine learning algorithms coupled with design knowledge. Thus, the dynamic nature of ICS has minimal impact on the performance of AICrit. Further, the intrinsic parameters of AICrit, namely UCL, LCL, window size, and count size (refer to Equations (6) and (8)–(10) in [2]), prevent the generation of false positives for short-term deviations experienced due to noise and temporal glitches. Thus, the adaptability of AICrit to the above-mentioned factors makes it suitable for large plants.

The research questions formulated in Section 1 are re-visited below in the context of the experimental results presented above.

RQ 1:

How effective is AICrit in detecting the cyber-physical attacks that have an adverse impact on the physical processes of underlying SWaT?

As reported in Table 9, AICrit detected all but one attack (A18) that led to an abnormal change in the physical processes of SWaT. Further, the false-positive rate of AICrit during CISS2021-OL was zero. The experimental results discussed in Section 4 indicate that AICrit accurately learned the interactions across SWaT components for the effective detection of process anomalies.

RQ 2:

How efficient is AICrit in supporting the plant operators toward active incident response and recovery?

AICrit can accurately model the normal behavior of each component based on the given state and compare it with the measurements retrieved from the operational SWaT. By doing so, AICrit not only detects the process anomalies accurately but also possesses the ability to report the semantics of the detected anomalies. This is evident from Table 9. Comparing the response of AICrit against the attacks reported in Tables 2–8 makes it clear that AICrit can localize the components under threat. Further, it can recommend recovery actions to the plant operator to restore the normal operation of the plant prior to the occurrence of the actual damage.

5.4. Lessons from the CISS2021-OL

Automated re-training/re-tuning of AICrit: The representative models, along with the intrinsic parameters of AICrit discussed in [2], were built and fine-tuned using the operational data. Thus, any change in the data characteristics due to process upgrades or aging of components leads to an unacceptable number of false alarms. For example, prior to CISS2021-OL, the UF membranes in stage 3 were replaced. This caused a large variation in the data generated by the differential pressure indicator (DPIT-301) sensor (Figure 2). Thus, AICrit was retrained to achieve zero false positives. However, for large-scale plants, an automated mechanism is needed to initiate the retraining process in response to a significant change in the data characteristics, such as mean and standard deviation. This introduces a new research question: how frequently should this process be conducted to consistently achieve zero false alarms, given the highly dynamic nature of large-scale plant operations?
Detector placement: The placement of detectors is another important research area as it plays a crucial role in their performance. In CISS2021-OL, all detectors received the data from the clone historian, which is a copy of the primary historian. As discussed in Section 4, the adversaries were successful in disrupting the primary historian, and none of the detectors received process data. Thus, it is advised to deploy the detectors in a distributed manner so that compromising all data sources at the same time becomes harder for an attacker.
Restoring SWaT operation: One of the major challenges faced during the CISS2021-OL was to bring back SWaT to its normal state after a successful attack. It is necessary to avoid the cascading effect of an attack. This involves several processes including (i) resetting all network communications, (ii) restoring the control codes of all PLCs, (iii) bringing back all sensors and actuators to auto mode in case they are set to manual mode by the adversaries, and (iv) reverting all devices such as the SCADA, HMI, and historian to their respective states that existed prior to the attack launch. Performing the above tasks manually is a time-consuming and error-prone task and thus needs automation.
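As an illustration of the first lesson, the sketch below flags a sensor for retraining when the mean or standard deviation of recent data departs from the values seen in the training data. This is one possible trigger under stated assumptions, not a mechanism described in the paper: the function name `needs_retraining` and both thresholds are hypothetical and would need tuning per sensor.

```python
import statistics
from typing import Sequence


def needs_retraining(reference: Sequence[float],
                     recent: Sequence[float],
                     mean_tol: float = 3.0,
                     std_ratio_tol: float = 2.0) -> bool:
    """Flag a significant change in mean or standard deviation.

    mean_tol: allowed shift of the recent mean, in reference standard deviations.
    std_ratio_tol: allowed factor by which the spread may grow or shrink.
    Both thresholds are illustrative assumptions.
    """
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.pstdev(reference)
    new_mean = statistics.fmean(recent)
    new_std = statistics.pstdev(recent)

    if ref_std == 0.0:
        # Constant reference signal: any change at all counts as drift.
        return new_mean != ref_mean or new_std != 0.0
    if new_std == 0.0:
        # The spread collapsed to a constant value.
        return True

    mean_shift = abs(new_mean - ref_mean) / ref_std
    std_ratio = max(new_std / ref_std, ref_std / new_std)
    return mean_shift > mean_tol or std_ratio > std_ratio_tol


if __name__ == "__main__":
    # Synthetic values, not plant data.
    training_window = [10.0, 12.0, 8.0, 11.0, 9.0] * 20
    after_upgrade = [0.02, 0.03, 0.01, 0.02] * 25
    print(needs_retraining(training_window, after_upgrade))
```

Such a check only decides when to retrain. How often it should run, and how to keep an attacker from exploiting the retraining step, remain part of the open question raised above.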

6. Related Work

There exist several techniques for anomaly detection in ICS. This section summarizes recent work in data-driven anomaly detection.

The authors of [17] proposed a distributed anomaly detection architecture based on autoencoders, a Transformer, and a Fourier mixing sublayer. The proposed unsupervised approach was evaluated on several benchmark datasets, such as SWaT, HAI, Gas pipeline, and power demand, and was shown to achieve better detection precision and F1 score.

STGNN [18], a spatio-temporal graph neural network, was proposed to monitor ICS processes. First, the dimensionality of the incoming ICS data is reduced using variational mode decomposition, and the result is fed into a graph neural network for anomaly detection. The approach outperformed eight existing state-of-the-art algorithms in terms of recall, precision, and F1 score. In [19], the authors proposed a self-supervised contrastive learning methodology for anomaly detection in large-scale industrial data, in which a deep neural network is employed to detect anomaly patterns. In [20], the authors presented an integrated framework based on a Kalman filter and permutation entropy for detecting stealthy attacks against an ICS. As an extension, Hu et al. [21] integrated a Kalman filter with residual skewness for detecting process anomalies.

Tang et al. [22] proposed a multivariate time series prediction approach to ensure reliable operation of ICS. In this work, graph neural networks and gated recurrent units are integrated to model the dependencies among the sensors for anomaly detection. The approach was evaluated on the SWaT and WADI datasets and was shown to achieve a better detection rate than nine state-of-the-art algorithms. The authors of [23] proposed a bi-anomaly-based IDS for industrial cyber-physical systems, which integrates an IDS based on neural networks with decision-making systems. It was evaluated on both benchmark and non-industrial datasets and found to achieve a higher detection rate with minimal false alarms. Das et al. [24] proposed a supervised learning approach named Logical Analysis of Data (LAD) to extract rules from historical data for several operating conditions of the plant. Its efficiency was evaluated on sensor measurements from the SWaT data and compared with other anomaly detection approaches. Although the approach can effectively localize anomalies, it detects only anomalies that target the sensors.

The authors of [25] proposed a physics-informed gated recurrent graph neural network for anomaly detection in industrial cyber-physical systems. First, the dependencies among the variables are modeled using directed graphs, and their interactions are then learned using the recurrent graph neural network. The approach was evaluated on the SWaT and WADI datasets and was shown to have better detection ability than ten state-of-the-art methods.

To summarize, the techniques mentioned above were evaluated on benchmark datasets or in simulated environments. Such evaluation does not ensure that the techniques will produce similar results when deployed in an operational environment. We believe that, when feasible, testing an anomaly detector as described in this work leads to more repeatable results than testing on synthetic or even benchmark datasets such as those offered by iTrust [11].

7. Conclusions

Several tools are available to detect and/or prevent cyber-attacks, including firewalls and Intrusion Detection Systems (IDSs). In practice, such tools cannot be directly applied to protect a CI, since the attack surface of an ICS differs from that of classical IT, i.e., network-based, systems. Thus, designing a generic defense mechanism against most, if not all, ICS attacks is a challenging task. AICrit is intended to detect process anomalies. Unlike other available solutions that attempt to detect network traffic anomalies, AICrit monitors the physical processes controlled by an ICS and checks the correctness of the interactions among the devices. In the study reported here, AICrit was found to be effective in detecting cyber-attacks that lead to process anomalies. Table 9 also shows that AICrit supports the plant operator in (i) localizing the anomalies, (ii) identifying the area of impact, and (iii) suggesting recovery actions to restore normal plant operation.

In CISS2021-OL, a total of ten red teams and six blue teams participated, and six anomaly detectors were deployed. The event gave researchers an opportunity to learn how an adversary designs and launches attacks against an operational CI, and it also enabled the evaluation of defense mechanisms. Details of all attacks launched, together with the dataset, are publicly available in [7,11]. In addition, it is worth noting that differentiating the causes of anomalies, i.e., whether they result from cyber-attacks or from malfunctioning plant components, remains an open research problem that needs further investigation.

Author Contributions

Conceptualization, G.R.; Methodology, G.R. and A.M.; Validation, G.R.; Investigation, A.M.; Writing—original draft, G.R.; Writing—review & editing, A.M.; Funding acquisition, A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported in part by the National Research Foundation, Singapore, under its National Satellite of Excellence Programme “Design Science and Technology for Secure Critical Infrastructure: Phase II” (Award No: NRF-NCR25-NSOE05-0001). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

AITX	Chemical sensor in stage X
CII	Critical Information Infrastructure
CI	Critical Infrastructure
CISS	Critical Infrastructure Security Showdown
DAD	Distributed Attack Detection
DPITX	Pressure sensor in stage X
FITX	Flow meter sensor in stage X
HMI	Human Machine Interface
ICS	Industrial Control Systems
LCL	Lower Control Limit
LITX	Water level sensor in stage X
MSMP	Multi Stage Multi Point attack
MVX	Motorized Valve stage X
PNN	Probabilistic Neural Networks
PLC	Programmable Logic Controller
PLCX	Programmable Logic Controller for stage X
PX	Pump in stage X
RIO	Remote Input Output
SCADA	Supervisory Control And Data Acquisition
SSMP	Single Stage Multi Point attack
SSSP	Single Stage Single Point attack
SWaT	Secure Water Treatment plant
TTP	Tactics, Techniques, and Procedures
TX	Water tank in stage X
UCL	Upper Control Limit
UF	Ultra-filtration
UV	Ultraviolet

References

1. Hassanzadeh, A.; Rasekh, A.; Galelli, S.; Aghashahi, M.; Taormina, R.; Ostfeld, A.; Banks, M.K. A review of cybersecurity incidents in the water sector. J. Environ. Eng. 2020, 146, 03120003.
2. Raman, G.M.R.; Mathur, A.P. AICrit: A unified framework for real-time anomaly detection in water treatment plants. J. Inf. Secur. Appl. 2022, 64, 103046.
3. Adepu, S.; Mathur, A. Distributed Attack Detection in a Water Treatment Plant: Method and Case Study. IEEE Trans. Dependable Secur. Comput. 2021, 18, 86–99.
4. Mathur, A.P.; Tippenhauer, N.O. SWaT: A water treatment testbed for research and training on ICS security. In Proceedings of the 2016 International Workshop on Cyber-physical Systems for Smart Water Networks (CySWater), Vienna, Austria, 11 April 2016; pp. 31–36.
5. Raman, M.G.; Mathur, A.P. A Hybrid Physics-Based Data-Driven Framework for Anomaly Detection in Industrial Control Systems. IEEE Trans. Syst. Man, Cybern. Syst. 2021, 52, 6003–6014.
6. Raman, M.R.G.; Somu, N.; Mathur, A. A multilayer perceptron model for anomaly detection in water treatment plants. Int. J. Crit. Infrastruct. Prot. 2020, 31, 100393.
7. CISS2022-OL. Critical Infrastructure Security Showdown 2021—Online (CISS2021-OL). Technical Report. 2022. Available online: https://itrust.sutd.edu.sg/ciss/ciss-2021-ol/ (accessed on 22 May 2022).
8. Metasploit. Available online: https://www.metasploit.com/ (accessed on 7 November 2023).
9. Remote Desktop Protocol. Available online: https://en.wikipedia.org/wiki/Remote_Desktop_Protocol (accessed on 7 November 2023).
10. Goh, J.; Adepu, S.; Junejo, K.N.; Mathur, A. A dataset to support research in the design of secure water treatment systems. In Proceedings of the International Conference on Critical Information Infrastructures Security, Paris, France, 10–12 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 88–99.
11. iTrust SUTD. Dataset from iTrust Testbeds. 2022. Available online: https://itrust.sutd.edu.sg/itrust-labs_datasets/ (accessed on 22 May 2022).
12. Adepu, S.; Mathur, A. Assessing the effectiveness of attack detection at a hackfest on industrial control systems. IEEE Trans. Sustain. Comput. 2018, 6, 231–244.
13. Adepu, S.; Mathur, A.P. Detecting Multi-Point Attacks in a Water Treatment System Using Intermittent Control Actions; SG-CRC: Singapore, 2016; pp. 59–74.
14. Adepu, S.; Mathur, A. Generalized attacker and attack models for cyber physical systems. In Proceedings of the 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), Atlanta, GA, USA, 10–14 June 2016; Volume 1, pp. 283–292.
15. Gauthama Raman, M.R.; Somu, N.; Mathur, A.P. Anomaly detection in critical infrastructure using probabilistic neural network. In Proceedings of the Applications and Techniques in Information Security; Shankar Sriram, V.S., Subramaniyaswamy, V., Sasikaladevi, N., Zhang, L., Batten, L., Li, G., Eds.; Springer: Singapore, 2019; pp. 129–141.
16. Chakravarty, I.M.; Roy, J.; Laha, R.G. Handbook of Methods of Applied Statistics; McGraw-Hill: New York, NY, USA, 1967.
17. Truong, H.T.; Ta, B.P.; Le, Q.A.; Nguyen, D.M.; Le, C.T.; Nguyen, H.X.; Do, H.T.; Nguyen, H.T.; Tran, K.P. Light-weight federated learning-based anomaly detection for time-series data in industrial control systems. Comput. Ind. 2022, 140, 103692.
18. Wang, Y.; Peng, H.; Wang, G.; Tang, X.; Wang, X.; Liu, C. Monitoring industrial control systems via spatio-temporal graph neural networks. Eng. Appl. Artif. Intell. 2023, 122, 106144.
19. Tang, X.; Zeng, S.; Yu, F.; Yu, W.; Sheng, Z.; Kang, Z. Self-supervised anomaly pattern detection for large scale industrial data. Neurocomputing 2023, 515, 1–12.
20. Hu, Y.; Li, H.; Luan, T.H.; Yang, A.; Sun, L.; Wang, Z.; Wang, R. Detecting stealthy attacks on industrial control systems using a permutation entropy-based method. Future Gener. Comput. Syst. 2020, 108, 1230–1240.
21. Hu, Y.; Li, H.; Yang, H.; Sun, Y.; Sun, L.; Wang, Z. Detecting stealthy attacks against industrial control systems based on residual skewness analysis. EURASIP J. Wirel. Commun. Netw. 2019, 2019, 74.
22. Tang, C.; Xu, L.; Yang, B.; Tang, Y.; Zhao, D. GRU-Based Interpretable Multivariate Time Series Anomaly Detection in Industrial Control System. Comput. Secur. 2023, 127, 103094.
23. Alem, S.; Espes, D.; Nana, L.; Martin, E.; De Lamotte, F. A novel bi-anomaly-based intrusion detection system approach for industry 4.0. Future Gener. Comput. Syst. 2023, 145, 267–283.
24. Das, T.K.; Adepu, S.; Zhou, J. Anomaly detection in industrial control systems using logical analysis of data. Comput. Secur. 2020, 96, 101935.
25. Wu, W.; Song, C.; Zhao, J.; Xu, Z. Physics-informed gated recurrent graph attention unit network for anomaly detection in industrial cyber-physical systems. Inf. Sci. 2023, 629, 618–633.

Figure 1. Stage 1 of SWaT water treatment plant; MV101—Motorised valve, LIT101—Water level sensor, P101—Pump, FIT101 and FIT201—Flow meters.

Figure 2. Comparison of feature statistics: (a) behaviour of DPIT-301 in 2021 (minimum = 0.02, maximum = 20.00, std = 6.87); (b) histogram plot of (a); (c) behaviour of DPIT-301 in 2022 (minimum = 0.01, maximum = 0.0608, std = 0.0027); (d) histogram plot of (c). The p-value of the K-S test [16] is 0; hence, the null hypothesis is rejected, i.e., the two distributions are not identical.
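The K-S comparison summarized in the Figure 2 caption can be illustrated with a small two-sample test. The sketch below computes the K-S statistic (the largest gap between the two empirical distribution functions) and compares it with the standard large-sample critical value. The function names, the significance level, and the sample values are assumptions for illustration. The samples are synthetic and only loosely mimic the two ranges in the caption, and the paper does not state how its test was computed.

```python
import bisect
import math
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of samples a and b."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        # bisect_right counts the observations <= x in each sorted sample.
        cdf_a = bisect.bisect_right(sa, x) / len(sa)
        cdf_b = bisect.bisect_right(sb, x) / len(sb)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def distributions_differ(a: Sequence[float], b: Sequence[float],
                         alpha: float = 0.05) -> bool:
    """Reject 'same distribution' when D exceeds the large-sample critical value."""
    n, m = len(a), len(b)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) > critical


if __name__ == "__main__":
    # Synthetic stand-ins for the two periods; not the plant's measurements.
    before = [0.02 + 0.2 * i for i in range(100)]     # spans roughly 0.02 to 19.8
    after = [0.01 + 0.0005 * i for i in range(100)]   # spans roughly 0.01 to 0.06
    print(ks_statistic(before, after), distributions_differ(before, after))
```

When the two samples barely overlap, as in Figure 2, the statistic approaches 1 and the null hypothesis of identical distributions is rejected at any common significance level.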

Table 1. Attack targets and attacks during CISS2021-OL.

Target	Description
Chemical dosing	Alter the amount of chemical used for dosing
Historian	Alter data; launch a DoS attack
HMI/SCADA	Alter the sensor measurements and actuator states; launch a DoS attack
PLC	Alter the control code; launch a DoS attack; alter the commands and measurements sent or received
Pump	Switch on or off the pumps
Pressure	Change the pressure values
RIO/Display	Control the RIO by disconnecting an analog Input/Output pin
Valve	Open or close the motorized valves
Water level	Spoof the water level in a tank

Table 9. Performance comparison of AICrit and DAD.

Team ID	Attack ID	AICrit Response
RT1	A1	Not detected *
	A2	Abnormal change in FIT201
	A3	MV301 is closed but P602 is running
	A4	Abnormal change in LIT301; both P301 and P302 are running
	A5	Both MV502 and MV503 are open
	A6	Both MV502 and MV503 are open; both P501 and P502 are running
	A7	Both P201 and P202 are running; abnormal change in FIT201
	A8	Abnormal change in LIT301; both P301 and P302 are running
	A9	Both MV502 and MV503 are open; both P501 and P502 are running
RT2	A10	Not detected *
	A11	Both MV502 and MV503 are open
RT3	A12	Abnormal change in LIT101
	A13	MV301 is open but P602 is not running
	A14	Not detected *
	A15	P201 is running but MV201 is closed; P301 is running but MV302 is closed
	A16	Both MV503 and MV504 are open
	A17	P301 is running but FIT301 is below 1 m³/hr
RT6	A18	Not detected *
	A19	P602 is OFF but MV301 is open
	A20	P602 is OFF but MV301 is open
	A21	P602 is OFF but MV301 is closed
RT8	A22	LIT101 is below 500 mm but MV101 is closed
	A23	Both P201 and P202 are running; P401 is running but UV401 is OFF
RT9	A24	AIT202 is below 5.9 but P203 is running; both P203 and P204 are running
	A25	Abnormal behaviour in P101; abnormal behaviour in P102; both P101 and P102 are running
	A26	MV101 is closed when LIT101 is increasing from 500 mm to 800 mm

* Attacks detected by neither AICrit nor DAD.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
