Security Assessment of Industrial Control System Applying Reinforcement Learning
Abstract
1. Introduction
1.1. Related Work
- Applying SARSA-based optimal pathfinding on the attack graph of the ICS and using the CVSS scoring system to determine the worst-case attack in the graph.
- Identifying the weakest component in the ICS and determining the best place to deploy security measures like intrusion prevention (IP) and intrusion detection (ID).
2. Preliminaries
2.1. Industrial Automation System Topology
- Physical Level (P)
- PLC: Programmable Logic Controllers (PLCs) are used to replace Remote Terminal Units (RTUs) in industrial systems. In general, PLCs are solid-state devices meant to replace hard-wired relay panels. PLCs are favored because they are economical and time efficient. A PLC is generally made up of a processor (CPU), memory, input/output interfaces, a power supply, a communication interface, and a programming interface. The programming interface transfers programs to the PLC, whereas the communication interface communicates via the network (including with other PLCs and a Master Terminal Unit (MTU)) [14].
- Controller (MC): The robot’s operations are controlled by a control system implemented by a controller, ensuring the safety and security of vital components. It provides the logic and services required for monitoring, communicating with the environment, and controlling the mechanical parts of the robot [15].
- Industrial Robots (IR): Industrial robots are mechanical, multi-axis “arms” that are commonly employed for automation, particularly in the manufacturing industry. They are complex automated manipulators that interact extensively with the physical environment. As a result, they are cyber-physical systems (CPSs), which include a variety of hardware and software components such as mechanics, controllers, human-interaction devices, firmware, operating systems, actuators, control logic, and sensors. Furthermore, the growing integration of digital monitoring into physical manufacturing activities connects robots both internally and to external services [16].
- SCADA Level (S)
- Human–Machine Interface (HMI): the component of the SCADA system responsible for tracking the system’s condition, adjusting control parameters, and letting the operator view reports and historical data in addition to real-time monitoring [17].
- Data Historian (DH): a database containing network-related information (for example, device IP addresses, hardware model, firmware, and historical measurements) [18].
- Engineering Workstation (EW): used by engineering professionals to diagnose and alter control network connections and logic. It also includes tools for downloading new code to controllers [19].
- Master Terminal Unit (MTU): monitors and manages SCADA system communication, acquiring data from RTUs to provide current status information [19].
- Corporate Level (C)
- Process Control Network (PB): it connects S with P.
- Corporate Network (CB): it connects C with S.
2.2. SCADA Communication Systems/Infrastructures
2.3. Attack Graph
- Firmware vulnerability: Counterfeit firmware updates are a frequently reported attack on PLCs. For example, reverse engineering techniques can be used to analyze the PLC and work out its firmware update validation procedure. That procedure is then examined for weaknesses that might allow firmware to be counterfeited and modified. After that, the flaws are used to create a forged firmware sample, which is uploaded to and run by a PLC [24].
- Commercial-off-the-shelf (COTS) vulnerability: Organizations all over the world, in every sector and market, are developing networks based on Internet protocols. Furthermore, third-party commercial and open-source software is an important component for these enterprises and the utilities, network infrastructure, and services on which they rely. This means that software issues in COTS software products may quickly generate significant problems for any business [25].
- Social Engineering (SE): This is the term for attacks that mostly target businesses and organizations. The goal of these attacks is to obtain sensitive information by psychologically manipulating or misleading a person [26].
- Malware Injection (MI): This technique, which may be quite effective, involves tricking the victim into installing a file, or clicking a link to download one, that appears safe but contains a hidden malware installer. The attacker can then take control of the machine and use it to sniff data, or take full advantage of every computer connected to the victim’s network [27].
- Pivoting: The attack normally consists of two parts, phishing and malware insertion. Because the corporate network has the most exposure to the Internet, this attack is typically utilized against it. An attacker can gain further access to the system and even the process control domain by breaching this network [28].
- Sniffing (S): In terms of network security, sniffing is the act of stealing or intercepting data by using a sniffer to record network traffic. For instance, the attacker sniffs the traffic surrounding the Historian to obtain login credentials (for example, credentials that are only Base64-encoded). After extracting them, the attacker can use specially crafted packets to extend the attack and create a backdoor on the Historian [29].
- Buffer Overflow (BO): This form of attack exploits a defect known as a “buffer overflow”, in which software overwrites memory adjacent to a buffer that should not have been altered, whether purposefully or accidentally. Buffer overflows are typically associated with C-based programming languages, which lack bounds checking. As a result, actions like copying a string from one buffer to another may overwrite data in the memory that borders the new, shorter buffer [30]. In the SCADA case, the exploit uses a heap buffer overflow to fuzz and crash the SCADA HMI. Since the operation must be manually repeated to return the system to normal, the SCADA system is greatly impacted by the attack [31].
- Privilege Escalation (PE): This attack uses bugs in programming or design to grant the attacker further access to the network, the data connected to it, and the programs that run on it [32].
- Authentication Bypass (B): A bypass is a security system vulnerability that enables an attacker to circumvent security measures and gain access to a system or network. The actual point of entry is software, a hardware device, or even just a small piece of code that grants the user access to the system despite security clearance processes (like authentication). A bypass may be a mechanism placed by an attacker, a design flaw, or an alternate access method left by a developer [33].
- Alteration of Data (AoD): If attackers have access to software, they can carry out this attack. It attacks the memory of the device in an attempt to cause data corruption (such as changing the set point) and slowness in data processing [34].
- Man in the Middle (MITM): An attacker intercepts, disrupts, or impersonates the communication between two systems. For example, by taking control of a smart actuator, an attacker may force an industrial robot out of its designated lane and speed limit, which might damage the assembly area or injure operators [35].
- Denial of Service (DoS): The purpose of a DoS attack is either to prevent resources from being used as intended or to block access to the system. Since the PLC is the primary hardware component that directly controls the operation, exploiting it may be extremely harmful. For example, since the EWs are the primary location where PLC logic is stored, the attacker can use them as an entry point to launch a denial-of-service attack [36].
2.4. SARSA
- $A$ is the agent’s action set, $a \in A$;
- $S$ is the agent’s state set, $s \in S$;
- $Q(s, a)$ is the Q-value for the state–action pair $(s, a)$;
- $r$ is the reward signal;
- $\pi$ is the control policy in the learning process;
- $\gamma$ is the discount factor;
- $t$ is a time step.
- $P(s' \mid s, a)$ is the probability that the agent moves between two consecutive states after performing a specific action;
- $s'$ is the updated state;
- $a'$ is the updated action.
- $\alpha$ is the learning rate;
- $r$ is the reward gained on leaving state $s$ by performing action $a$;
- $Q(s', a')$ is the Q-value for $(s', a')$.
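For orientation, the textbook form of the SARSA update (as introduced by Rummery and Niranjan, listed in the references) is sketched below. The symbols are the conventional ones and are assumed, not confirmed, to match the quantities listed above: $s, a$ are the current state and action, $s', a'$ the next state and the action actually chosen there, $r$ the reward, $\alpha$ the learning rate, and $\gamma$ the discount factor.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \, Q(s', a') - Q(s, a) \right]
```

As a worked instance with illustrative numbers: for $\alpha = 0.5$, $\gamma = 0.9$, $r = 1$, $Q(s, a) = 0$ and $Q(s', a') = 2$, the update gives $Q(s, a) \leftarrow 0 + 0.5\,(1 + 0.9 \cdot 2 - 0) = 1.4$. Because $a'$ is the action the current policy actually takes in $s'$, the method is on-policy, in contrast to Q-learning, which uses $\max_{a'} Q(s', a')$.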
2.5. Common Vulnerability Scoring System (CVSS)
- Attack Vector: Network;
- Attack Complexity: High;
- Privileges Required: High;
- User Interaction: None;
- Scope: Changed;
- Confidentiality Impact: High;
- Integrity Impact: High;
- Availability Impact: High;
- Exploit Code Maturity: Unproven that exploit exists;
- Remediation Level: Unavailable;
- Report Confidence: Unknown;
- Modified Attack Vector (MAV): Network;
- Modified Attack Complexity (MAC): High;
- Modified Privileges Required (MPR): High;
- Modified User Interaction (MUI): None;
- Modified Scope (MS): Changed;
- Modified Confidentiality (MC): High;
- Modified Integrity (MI): High;
- Modified Availability (MA): High;
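The metric values listed above can be turned into scores with the public CVSS equations. The following is a minimal sketch that assumes the CVSS v3.1 base and temporal formulas and weights published by FIRST; the text does not state which minor version was used, and the function and variable names (`base_score`, `temporal_score`, `roundup`) are hypothetical. The environmental score is omitted for brevity.

```python
import math

# Metric weights transcribed from the CVSS v3.1 specification (assumed version).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(x):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = int(round(x * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, scope_changed, conf, integ, avail):
    """CVSS v3.1 base score from single-letter metric values."""
    pr_weight = (PR_CHANGED if scope_changed else PR_UNCHANGED)[pr]
    iss = 1 - (1 - CIA[conf]) * (1 - CIA[integ]) * (1 - CIA[avail])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    total = impact + exploitability
    if scope_changed:
        total *= 1.08
    return roundup(min(total, 10))


def temporal_score(base, exploit_maturity=0.91, remediation=1.0, confidence=0.92):
    """Temporal score; defaults are Unproven, Unavailable, Unknown."""
    return roundup(base * exploit_maturity * remediation * confidence)


# Vector from the list above: AV:N / AC:H / PR:H / UI:N / S:C / C:H / I:H / A:H
base = base_score("N", "H", "H", "N", True, "H", "H", "H")
temporal = temporal_score(base)
print(base, temporal)
```

Under these assumed equations the listed vector evaluates to a base score of 8.0 and a temporal score of 6.7. The NVD calculator cited in the references can be used to cross-check any vector.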
3. Methodology
Algorithm 1: Predict the optimal route.
Input: Initial state; Output: Optimal route.
4. Results and Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Bhamare, D.; Zolanvari, M.; Erbad, A.; Jain, R.; Khan, K.; Meskin, N. Cybersecurity for industrial control systems: A survey. Comput. Secur. 2020, 89, 101677.
2. Eden, P.; Blyth, A.; Jones, K.; Soulsby, H.; Burnap, P.; Cherdantseva, Y.; Stoddart, K. SCADA system forensic analysis within IIoT. In Cybersecurity for Industry 4.0: Analysis for Design and Manufacturing; Springer: Cham, Switzerland, 2017; pp. 73–101.
3. Umer, M.A.; Junejo, K.N.; Jilani, M.T.; Mathur, A.P. Machine learning for intrusion detection in industrial control systems: Applications, challenges, and recommendations. Int. J. Crit. Infrastruct. Prot. 2022, 38, 100516.
4. Ibrahim, M.; Al-Hindawi, Q.; Elhafiz, R.; Alsheikh, A.; Alquq, O. Attack graph implementation and visualization for cyber physical systems. Processes 2019, 8, 12.
5. Rigas, E.S.; Ramchurn, S.D.; Bassiliades, N. Managing electric vehicles in the smart grid using artificial intelligence: A survey. IEEE Trans. Intell. Transp. Syst. 2014, 16, 1619–1635.
6. Orseau, L.; Armstrong, M. Safely interruptible agents. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, Association for Uncertainty in Artificial Intelligence, Jersey City, NJ, USA, 25–29 June 2016.
7. Jin, Z.; Ma, M.; Zhang, S.; Hu, Y.; Zhang, Y.; Sun, C. Secure state estimation of cyber-physical system under cyber attacks: Q-learning vs. SARSA. Electronics 2022, 11, 3161.
8. Yan, X.; Yan, K.; Rehman, M.U.; Ullah, S. Impersonation attack detection in mobile edge computing by levering sarsa technique in physical layer security. Appl. Sci. 2022, 12, 10225.
9. Hu, Z.; Beuran, R.; Tan, Y. Automated penetration testing using deep reinforcement learning. In Proceedings of the IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), Genoa, Italy, 7–11 September 2020; pp. 2–10.
10. Cengiz, E.; Murat, G.Ö.K. Reinforcement Learning Applications in Cyber Security: A Review. Sak. Univ. J. Sci. 2023, 27, 481–503.
11. Mohan, P.; Sharma, L.; Narayan, P. Optimal path finding using iterative Sarsa. In Proceedings of the 5th IEEE International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 6–8 May 2021; pp. 811–817.
12. Wen, S.; Jiang, Y.; Cui, B.; Gao, K.; Wang, F. A hierarchical path planning approach with Multi-SARSA based on topological map. Sensors 2022, 22, 2367.
13. Ibrahim, M.; Elhafiz, R. Security Analysis of Cyber-Physical Systems Using Reinforcement Learning. Sensors 2023, 23, 1634.
14. Wang, Z.; Zhang, Y.; Chen, Y.; Liu, H.; Wang, B.; Wang, C. A Survey on Programmable Logic Controller Vulnerabilities, Attacks, Detections, and Forensics. Processes 2023, 11, 918.
15. Balduzzi, M.; Sortino, F.; Castello, F.; Pierguidi, L. A Security Analysis of CNC Machines in Industry 4.0. In Detection of Intrusions and Malware, and Vulnerability Assessment; Gruss, D., Maggi, F., Fischer, M., Carminati, M., Eds.; DIMVA. Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2023; Volume 13959, pp. 132–152.
16. Yankson, B.; Loucks, T.; Sampson, A.; Lojano, C. Robots Security Assessment and Analysis Using Open-Source Tools. In Proceedings of the International Conference on Cyber Warfare and Security, Baltimore County, MD, USA, 9–10 March 2023; Volume 18, pp. 449–456.
17. Shahzad, A.; Musa, S.; Aborujilah, A.; Irfan, M. The SCADA review: System components, architecture, protocols and future security trends. Am. J. Appl. Sci. 2014, 11, 1418.
18. Green, B.; Krotofil, M.; Abbasi, A. On the significance of process comprehension for conducting targeted ICS attacks. In Proceedings of the Workshop on Cyber-Physical Systems Security and PrivaCy, New York, NY, USA, 3 November 2017; pp. 57–67.
19. Andress, J.; Winterfeld, S. Cyber Warfare: Techniques, Tactics and Tools for Security Practitioners, 2nd ed.; Syngress: Oxford, UK, 2013.
20. Khelil, A.; Germanus, D.; Suri, N. Protection of SCADA communication channels. In Critical Infrastructure Protection: Information Infrastructure Models, Analysis, and Defense; Lopez, J., Setola, R., Wolthusen, S.D., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7130, pp. 177–196.
21. Abbas, H.A. Future SCADA challenges and the promising solution: The agent-based SCADA. Int. J. Crit. Infrastruct. 2014, 10, 307–333.
22. Ammann, P.; Wijesekera, D.; Kaushik, S. Scalable, graph-based network vulnerability analysis. In Proceedings of the 9th ACM Conference on Computer and Communications Security, Washington, DC, USA, 18–22 November 2002; pp. 217–224.
23. Wang, L.; Islam, T.; Long, T.; Singhal, A.; Jajodia, S. An attack graph-based probabilistic security metric. In Proceedings of the Data and Applications Security XXII: 22nd Annual IFIP WG 11.3 Working Conference on Data and Applications Security, London, UK, 13–16 July 2008; pp. 283–296.
24. Basnight, Z.; Butts, J.; Lopez, J., Jr.; Dube, T. Firmware modification attacks on programmable logic controllers. Int. J. Crit. Infrastruct. Prot. 2013, 6, 76–84.
25. Martin, R.A. Managing vulnerabilities in your commercial-off-the shelf (COTS) systems using an industry standards effort. In Proceedings of the 21st IEEE Digital Avionics Systems Conference, Irvine, CA, USA, 27–31 October 2002; Volume 1, p. 4A1.
26. Hinson, G. Social engineering techniques, risks, and controls. EDPAC EDP Audit. Control. Secur. Newsl. 2008, 37, 32–46.
27. Sood, A.; Enbody, R. Targeted Cyber Attacks: Multi-Staged Attacks Driven by Exploits and Malware, 1st ed.; Syngress: Oxford, UK, 2014.
28. Zimba, A. A Bayesian attack-network modeling approach to mitigating malware-based banking cyberattacks. Int. J. Comput. Netw. Inf. Secur. 2022, 14, 25–39.
29. Verma, P. Wireshark Network Security, 1st ed.; Packt Publishing Ltd.: Birmingham, UK, 2015.
30. Gupta, S. Buffer overflow attack. IOSR J. Comput. Eng. 2012, 1, 10–23.
31. Sayegh, N.; Chehab, A.; Elhajj, I.H.; Kayssi, A. Internal security attacks on SCADA systems. In Proceedings of the Third IEEE International Conference on Communications and Information Technology (ICCIT), Beirut, Lebanon, 19–21 June 2013; pp. 22–27.
32. Yamauchi, T.; Akao, Y.; Yoshitani, R.; Nakamura, Y.; Hashimoto, M. Additional kernel observer: Privilege escalation attack prevention mechanism focusing on system call privilege changes. Int. J. Inf. Secur. 2021, 20, 461–473.
33. Siu, J.Y.; Kumar, N.; Panda, S.K. Command authentication using multiagent system for attacks on the economic dispatch problem. IEEE Trans. Ind. Appl. 2022, 58, 4381–4393.
34. Ibrahim, M.; Alsheikh, A.; Al-Hindawi, Q. Automatic attack graph generation for industrial controlled systems. In Recent Developments on Industrial Control Systems Resilience, 1st ed.; Pricop, E., Fattahi, J., Dutta, N., Ibrahim, M., Eds.; Springer: Cham, Switzerland; Berlin/Heidelberg, Germany, 2020; Volume 255, pp. 99–116.
35. Bin Muzammil, M.; Bilal, M.; Ajmal, S.; Shongwe, S.C.; Ghadi, Y.Y. Unveiling Vulnerabilities of Web Attacks Considering Man in the Middle Attack and Session Hijacking. IEEE Access 2024, 12, 6365–6375.
36. Horak, T.; Strelec, P.; Huraj, L.; Tanuska, P.; Vaclavova, A.; Kebisek, M. The vulnerability of the production line using industrial IoT systems under ddos attack. Electronics 2021, 10, 381.
37. Van Otterlo, M. Markov Decision Processes: Concepts and Algorithms. Compiled for the SIKS Course on “Learning and Reasoning”. 2009, pp. 1–23. Available online: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=968bab782e52faf0f7957ca0f38b9e9078454afe (accessed on 4 April 2024).
38. Rummery, G.A.; Niranjan, M. On-Line Q-Learning Using Connectionist Systems; Department of Engineering, University of Cambridge: Cambridge, UK, 1994; Volume 37, p. 14.
39. Scheftelowitsch, D. Markov Decision Processes with Uncertain Parameters. Ph.D. Thesis, der Technischen Universität Dortmund an der Fakultät für Informatik, Dortmund, Germany, May 2018.
40. Sombolestan, S.M.; Rasooli, A.; Khodaygan, S. Optimal path-planning for mobile robots to find a hidden target in an unknown environment based on machine learning. J. Ambient Intell. Humaniz. Comput. 2019, 10, 1841–1850.
41. Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. IEEE Trans. Cybern. 2020, 50, 3826–3839.
42. Barto, A.G. Reinforcement learning: Connections, surprises, and challenge. AI Mag. 2019, 40, 3–15.
43. Knox, W.B.; Stone, P. Combining manual feedback with subsequent MDP reward signals for reinforcement learning. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, Toronto, ON, Canada, 10–14 May 2010; pp. 5–12.
44. Aljohani, T.M.; Mohammed, O. A Real-Time Energy Consumption Minimization Framework for Electric Vehicles Routing Optimization Based on SARSA Reinforcement Learning. Vehicles 2022, 4, 1176–1194.
45. Mell, P.; Scarfone, K.; Romanosky, S. A Complete Guide to the Common Vulnerability Scoring System Version 2.0; FIRST-Forum of Incident Response and Security Teams: Cary, NC, USA, 2007; Volume 1, p. 23.
46. Singh, U.K.; Joshi, C. Quantitative security risk evaluation using CVSS metrics by estimation of frequency and maturity of exploit. In Proceedings of the World Congress on Engineering and Computer Science, San Francisco, CA, USA, 19–21 October 2016; Volume 1, pp. 19–21.
47. Mell, P.; Scarfone, K.; Romanosky, S. Common vulnerability scoring system. IEEE Secur. Priv. 2006, 4, 85–89.
48. Cheng, Y.; Deng, J.; Li, J.; DeLoach, S.A.; Singhal, A.; Ou, X. Metrics of security. In Cyber Defense and Situational Awareness, 1st ed.; Kott, A., Wang, C., Erbacher, R., Eds.; Springer: Cham, Switzerland; Berlin/Heidelberg, Germany, 2014; Volume 62, pp. 263–295.
49. National Vulnerability Database. Common Vulnerability Scoring System Calculator. Available online: https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator (accessed on 12 January 2024).
50. Aloul, F.; Al-Ali, A.R.; Al-Dalky, R.; Al-Mardini, M.; El-Hajj, W. Smart grid security: Threats, vulnerabilities and solutions. Int. J. Smart Grid Clean Energy 2012, 1, 1–6.
51. Chung, J.J.; Lawrance, N.R.; Sukkarieh, S. Learning to soar: Resource-constrained exploration in reinforcement learning. Int. J. Robot. Res. 2015, 34, 158–172.
52. Wang, Y.-H.; Li, T.-H.S.; Lin, C.-J. Backward Q-learning: The combination of Sarsa algorithm and Q-learning. Eng. Appl. Artif. Intell. 2013, 26, 2184–2193.
Attack Name | Base Score | Temporal Score | Environmental Score | Overall Score |
---|---|---|---|---|
SE-APE | 3.8 | 3.4 | 3.2 | 3.2 |
P-APE | 4.0 | 3.8 | 3.4 | 3.4 |
BO-EHMI | 7.8 | 7.2 | 7.0 | 7.0 |
S-EDH | 2.6 | 2.4 | 2.0 | 2.0 |
B-HMIPLC | 8.5 | 8.2 | 8.0 | 8.0 |
PE-HMIEW | 7.2 | 7.2 | 7.0 | 7.0 |
MI-DHMTU | 5.0 | 4.6 | 4.2 | 4.2 |
DoS-EWPLC | 9.2 | 9.0 | 8.7 | 8.7 |
DoS-MTUPLC | 9.2 | 9.0 | 8.7 | 8.7 |
AoD-PLCMC | 8.0 | 6.7 | 6.8 | 6.8 |
MITM-MCIR | 7.5 | 8.0 | 10.0 | 10.0 |
R | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | −1 | 3.4 | 3.2 | −1 | −1 | −1 | −1 | −1 | −1 | −1 | −1 |
2 | 0 | −1 | −1 | 6.2 | 2.0 | −1 | −1 | −1 | −1 | −1 | −1 |
3 | 0 | −1 | −1 | 7.0 | 2.0 | −1 | −1 | −1 | −1 | −1 | −1 |
4 | −1 | 0 | 0 | −1 | −1 | 7.0 | −1 | 8.0 | −1 | −1 | −1 |
5 | −1 | 0 | 0 | −1 | −1 | −1 | 4.2 | −1 | −1 | −1 | −1 |
6 | −1 | −1 | −1 | 0 | −1 | −1 | −1 | −1 | 8.7 | −1 | −1 |
7 | −1 | −1 | −1 | −1 | 0 | −1 | −1 | −1 | 8.7 | −1 | −1 |
8 | −1 | −1 | −1 | 0 | −1 | −1 | −1 | −1 | −1 | 6.8 | −1 |
9 | −1 | −1 | −1 | −1 | −1 | 0 | 0 | −1 | −1 | 6.8 | −1 |
10 | −1 | −1 | −1 | −1 | −1 | −1 | −1 | −1 | 0 | 0 | 10 |
11 | −1 | −1 | −1 | −1 | −1 | −1 | −1 | −1 | −1 | 0 | −1 |
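How a SARSA agent might learn a route through the reward matrix R above can be sketched as follows. This is an illustrative sketch, not the authors' Algorithm 1: the hyperparameters (`alpha`, `gamma`, `epsilon`, `episodes`), the fixed random seed, and all function names are assumptions. It also assumes that state 1 is the entry point, state 11 is the target, and that the agent may only follow transitions with a positive reward; the 0 entries, which look like reverse links, are ignored so that every episode terminates.

```python
import random

# Reward matrix R copied from the table above (row = current state,
# column = next state; indices are 0-based here, 1-based in the table).
R = [
    [-1, 3.4, 3.2, -1, -1, -1, -1, -1, -1, -1, -1],
    [0, -1, -1, 6.2, 2.0, -1, -1, -1, -1, -1, -1],
    [0, -1, -1, 7.0, 2.0, -1, -1, -1, -1, -1, -1],
    [-1, 0, 0, -1, -1, 7.0, -1, 8.0, -1, -1, -1],
    [-1, 0, 0, -1, -1, -1, 4.2, -1, -1, -1, -1],
    [-1, -1, -1, 0, -1, -1, -1, -1, 8.7, -1, -1],
    [-1, -1, -1, -1, 0, -1, -1, -1, 8.7, -1, -1],
    [-1, -1, -1, 0, -1, -1, -1, -1, -1, 6.8, -1],
    [-1, -1, -1, -1, -1, 0, 0, -1, -1, 6.8, -1],
    [-1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 10],
    [-1, -1, -1, -1, -1, -1, -1, -1, -1, 0, -1],
]


def forward_actions(state):
    """Next states reachable with a positive reward (assumed forward attack steps)."""
    return [j for j, reward in enumerate(R[state]) if reward > 0]


def choose(Q, state, epsilon, rng):
    """Epsilon-greedy choice among the allowed next states."""
    actions = forward_actions(state)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[state][a])


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, start=0, goal=10, seed=0):
    """Tabular SARSA; an action is identified with the index of the next state."""
    rng = random.Random(seed)
    n = len(R)
    Q = [[0.0] * n for _ in range(n)]
    for _ in range(episodes):
        s = start
        a = choose(Q, s, epsilon, rng)
        while s != goal:
            s_next, reward = a, R[s][a]
            if s_next == goal:
                a_next, target = None, reward
            else:
                # On-policy: bootstrap from the action actually selected next.
                a_next = choose(Q, s_next, epsilon, rng)
                target = reward + gamma * Q[s_next][a_next]
            Q[s][a] += alpha * (target - Q[s][a])
            s, a = s_next, a_next
    return Q


def greedy_route(Q, start=0, goal=10):
    """Follow the highest-valued allowed action from start to goal."""
    route = [start]
    while route[-1] != goal:
        s = route[-1]
        route.append(max(forward_actions(s), key=lambda a: Q[s][a]))
    return [s + 1 for s in route]  # convert back to the 1-based labels


Q = train()
route = greedy_route(Q)
print(route)
```

Restricting the agent to positive-reward transitions is a design choice made for this sketch. If the 0-reward links were also allowed, a discounted agent could prefer cycling between two states over reaching state 11, so a step limit or a smaller discount factor would be needed.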
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ibrahim, M.; Elhafiz, R. Security Assessment of Industrial Control System Applying Reinforcement Learning. Processes 2024, 12, 801. https://doi.org/10.3390/pr12040801