Optimizing the Effectiveness of Moving Target Defense in a Probabilistic Attack Graph: A Deep Reinforcement Learning Approach
Abstract
1. Introduction
- (a) We formulate the MTD deployment optimization (MTD-DO) problem mathematically as a non-linear optimization problem. Unlike existing works, we consider a network scenario in which every node can apply an MTD strategy as long as the budget is large enough. Our formulation can also be applied directly to a scenario where some nodes cannot use MTD, by setting the value of the related node to zero, as defined in Section 3. To the best of our knowledge, we are the first to optimize MTD deployment in a scenario with more than one vulnerable node.
- (b) We propose two RL-based MTD-DO algorithms, a PPO-based algorithm and a DQN-based algorithm, which are detailed in Section 4. For reinforcement learning algorithms, the sizes of the action space and the state space are the main factors determining how efficiently a problem can be solved. For a graph with N nodes, the action space of the constructed MDP has size N + 1 and the state space has dimension 3N + 1. Since both grow only linearly, i.e., as O(N), the DRL approach scales well and can handle larger attack graphs with more nodes and connections.
- (c) We propose two metrics (see Equations (12) and (13)) that can effectively evaluate MTD-DO algorithms across varying network scales and budgets.
2. Background and Related Work
2.1. Background
2.1.1. Attack Graph
2.1.2. Moving Target Defense
- (1) Time-based MTD: the MTD operation is triggered periodically. The interval between two consecutive MTD operations is a controllable/adjustable parameter decided by the security administrator.
- (2) Event-based MTD: when an attack is successfully detected, the MTD operation is triggered to protect the target from subsequent potential attacks.
- (3) Hybrid MTD: a combination of the time-based and event-based strategies that trades off the security level, service performance, incurred overhead, and other factors.
2.2. Related Work
3. MTD Deployment Optimization Problem
3.1. Problem Description
3.2. Markov Decision Process
- State space: the state at time step t includes V, the set of node values; the set of actual probabilities that each node is successfully exploited (compromised); the set of backup deployment costs of all nodes; and the residual budget.
- Action space: at time step t, an action selects a node at which to deploy a backup component for MTD. The action is not executed when the remaining budget is insufficient to support the deployment.
- State transition: after a backup component is selected and deployed for MTD, the probability that the system is penetrated, as well as the remaining budget, changes accordingly.
- Reward function: to minimize the possible security loss over all nodes, the reward at time step t is defined in terms of the possible security loss value of all nodes at that step, so that a smaller loss yields a larger reward. If the agent chooses an action that violates the constraints, i.e., exceeds the remaining budget, it receives a large penalty. A minimal environment sketch based on this MDP is given below.
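To make the MDP concrete, the following Python sketch shows one possible environment implementation. The class name `MTDDeployEnv`, the assumption that each additional backup component halves a node's exploitation probability, and the loss formula (sum of node value times exploitation probability) are illustrative assumptions introduced here, not the paper's exact definitions.

```python
import numpy as np

class MTDDeployEnv:
    """Toy MTD deployment environment (illustrative; dynamics and loss are assumed)."""

    def __init__(self, values, probs, costs, budget, penalty=-100.0):
        self.values = np.asarray(values, dtype=float)     # node values (V)
        self.base_probs = np.asarray(probs, dtype=float)  # given exploitation probabilities
        self.costs = np.asarray(costs, dtype=float)       # backup deployment cost per node
        self.budget0 = float(budget)                      # total budget
        self.penalty = penalty                            # penalty for violating the budget
        self.reset()

    def reset(self):
        self.backups = np.zeros_like(self.values)  # backup components deployed per node
        self.budget = self.budget0
        return self._state()

    def _state(self):
        # 3N + 1 dimensions: node values, current probabilities, costs, residual budget.
        return np.concatenate([self.values, self._probs(), self.costs, [self.budget]])

    def _probs(self):
        # Assumption: each extra backup halves the node's exploitation probability.
        return self.base_probs * (0.5 ** self.backups)

    def _loss(self):
        # Assumption: possible security loss = sum of node value * exploitation probability.
        return float(np.dot(self.values, self._probs()))

    def step(self, action):
        # action in [0, N]: index of the node to reinforce, or N for "do nothing".
        n = len(self.values)
        if action < n:
            if self.costs[action] > self.budget:
                return self._state(), self.penalty, True, {}  # budget constraint violated
            self.budget -= self.costs[action]
            self.backups[action] += 1
        reward = -self._loss()                      # smaller loss -> larger reward
        done = self.budget < self.costs.min()       # no affordable deployment remains
        return self._state(), reward, done, {}
```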
4. RL-Based MTD-DO Approach
4.1. DQN-Based Algorithm
Algorithm 1 DQN-Based Algorithm
Input: node information (node values, exploitation probabilities, deployment costs) and the total budget
Output: the optimal backup deployment policy
Initialize the experience pool; initialize the parameters of the Q-network and the target network, respectively
1: For each training episode do:
2:  For each time step do:
3:   With probability ε choose a random action
4:   Otherwise choose the action with the largest Q-value in the current state
5:   Execute the action; observe the reward, the next state, and whether the episode is done
6:   Add the transition to the experience pool
7:   Sample a random minibatch of transitions from the experience pool
8:   Set the learning target from the reward and the target network's estimate of the next state
9:   Perform a gradient descent step on the loss function
10:  Update the target network
11: End for
12: End for
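For readers who want to reproduce the structure of Algorithm 1, the sketch below shows a minimal DQN training loop in PyTorch. The two hidden layers of size 64, the learning rate of 0.001, the batch size of 64, and the replay memory size of 250,000 follow the parameter tables in Section 5; the ε value, the target-network synchronization interval, and the `MTDDeployEnv` class from the earlier sketch are assumptions rather than the authors' exact implementation.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

def build_q_net(state_dim, n_actions, hidden=64):
    # Two hidden layers of size 64, matching the shared training parameters.
    return nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, n_actions))

def train_dqn(env, state_dim, n_actions, episodes=200, steps=500,
              gamma=0.99, lr=1e-3, batch=64, eps=0.1, target_sync=100):
    q_net = build_q_net(state_dim, n_actions)
    target_net = build_q_net(state_dim, n_actions)
    target_net.load_state_dict(q_net.state_dict())
    opt = torch.optim.Adam(q_net.parameters(), lr=lr)
    memory = deque(maxlen=250_000)  # experience pool
    updates = 0

    for _ in range(episodes):
        state = env.reset()
        for _ in range(steps):
            # Epsilon-greedy action selection (Algorithm 1, lines 3-4).
            if random.random() < eps:
                action = random.randrange(n_actions)
            else:
                with torch.no_grad():
                    action = int(q_net(torch.tensor(state, dtype=torch.float32)).argmax())
            next_state, reward, done, _ = env.step(action)
            memory.append((state, action, reward, next_state, done))
            state = next_state

            if len(memory) >= batch:
                s, a, r, s2, d = map(np.array, zip(*random.sample(memory, batch)))
                s = torch.tensor(s, dtype=torch.float32)
                a = torch.tensor(a, dtype=torch.int64)
                r = torch.tensor(r, dtype=torch.float32)
                s2 = torch.tensor(s2, dtype=torch.float32)
                d = torch.tensor(d, dtype=torch.float32)
                with torch.no_grad():
                    # TD target from the target network (Algorithm 1, line 8).
                    y = r + gamma * (1 - d) * target_net(s2).max(dim=1).values
                q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
                loss = nn.functional.mse_loss(q, y)       # gradient descent step (line 9)
                opt.zero_grad(); loss.backward(); opt.step()
                updates += 1
                if updates % target_sync == 0:             # periodic target update (line 10)
                    target_net.load_state_dict(q_net.state_dict())
            if done:
                break
    return q_net
```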
4.2. PPO-Based Algorithm
Algorithm 2 PPO-Based Algorithm
Input: node information (node values, exploitation probabilities, deployment costs) and the total budget
Output: the optimal backup deployment policy
1: For each training iteration do:
2:  Reset the environment
3:  For each time step do:
4:   Sample an action from the current policy and execute it
5:   Observe the reward, the next state, and whether the episode is done
6:   If done, reset the environment and continue
7:   Compute the advantage estimate according to Equation (10)
8:  End for
9:  For k = 1, 2, …, K do
10:  Update the actor and critic networks separately according to Equations (9) and (11)
11: End for
12: End for
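As a companion to Algorithm 2, the following fragment sketches the clipped policy-gradient objective that PPO typically uses for the actor update and the mean-squared error commonly used for the critic update. The clip factor of 0.2 matches the parameter table in Section 5, but the functions below are a generic PPO formulation assumed to correspond to Equations (9) and (11), not the authors' code.

```python
import torch

def ppo_actor_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate objective (generic PPO form; assumed to correspond to Equation (9))."""
    ratio = torch.exp(new_logp - old_logp)                       # pi_theta(a|s) / pi_theta_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                 # maximize surrogate -> minimize its negative

def ppo_critic_loss(values, returns):
    """Mean-squared error between value estimates and returns (assumed to correspond to Equation (11))."""
    return torch.nn.functional.mse_loss(values, returns)
```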
5. Experiment Evaluation
5.1. Experiment Setting
5.2. Training Details
5.3. Performance Comparison
- (1) Q-learning algorithm. Q-learning is a classic model-free reinforcement learning algorithm that learns a policy by maintaining a Q-table indexed by state–action pairs (a tabular sketch is given after this list).
- (2) Random algorithm. This is a simple heuristic that randomly selects a node for deployment, excluding nodes whose deployment cost exceeds the remaining budget.
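The tabular Q-learning baseline can be sketched as follows. The learning rate of 0.0001 and the discount factor of 0.99 follow the parameter tables in Section 5, while the state key (per-node backup counts plus the rounded residual budget) and the reuse of the hypothetical `MTDDeployEnv` class are assumptions introduced here, since the paper does not spell out how states are indexed.

```python
import random
from collections import defaultdict

def q_learning(env, n_actions, episodes=5000, steps=500,
               alpha=1e-4, gamma=0.99, eps=0.1):
    # Q-table: state key -> list of Q-values, one per action.
    q_table = defaultdict(lambda: [0.0] * n_actions)

    for _ in range(episodes):
        env.reset()
        # Assumed state key: backup counts per node plus the rounded residual budget.
        key = tuple(env.backups.astype(int)) + (round(env.budget, 2),)
        for _ in range(steps):
            if random.random() < eps:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q_table[key][a])
            _, reward, done, _ = env.step(action)
            next_key = tuple(env.backups.astype(int)) + (round(env.budget, 2),)
            # Standard Q-learning update rule.
            best_next = max(q_table[next_key])
            q_table[key][action] += alpha * (reward + gamma * best_next - q_table[key][action])
            key = next_key
            if done:
                break
    return q_table
```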
5.4. Discussion
- (1)
- (2) The budget should be appropriately increased when the number of nodes increases. As the number of nodes in the attack graph grows, the impact of a fixed number of backup components is diluted; see Figure 4. When the number of nodes increases from 8 to 9, there is a large security gain, but from 15 to 16 the gain is small.
- (3) The DRL-based algorithms work well across various scenarios. Scalability is an important criterion for evaluating an algorithm. In the scenario investigated in this study, both the attack graph size and the budget affect the problem size: the larger the budget, the more backup components can be deployed. Figure 4 examines the effect of the attack graph size and Figure 5 the effect of the budget; the scalability results are illustrated in Figure 4g,h.
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Pagnotta, G.; De Gaspari, F.; Hitaj, D.; Andreolini, M.; Colajanni, M.; Mancini, L.V. DOLOS: A Novel Architecture for Moving Target Defense. IEEE Trans. Inf. Forensics Secur. 2023, 18, 5890–5905.
- Rehman, Z.; Gondal, I.; Ge, M.; Dong, H.; Gregory, M.A.; Tari, Z. Proactive defense mechanism: Enhancing IoT security through diversity-based moving target defense and cyber deception. Comput. Secur. 2024, 139, 103685.
- Tech, G.E. Security—Tech Innovators in Automated Moving Target Defense; Pohto, M., Manion, C., Eds.; Gartner: Singapore, 2023.
- Ma, H.; Han, S.; Kamhoua, C.A.; Fu, J. Optimizing Sensor Allocation Against Attackers with Uncertain Intentions: A Worst-Case Regret Minimization Approach. IEEE Control Syst. Lett. 2023, 7, 2863–2868.
- Yoon, S.; Cho, J.-H.; Kim, D.S.; Moore, T.J.; Free-Nelson, F.; Lim, H. Attack Graph-Based Moving Target Defense in Software-Defined Networks. IEEE Trans. Netw. Serv. Manag. 2020, 17, 1653–1668.
- Javadpour, A.; Ja'fari, F.; Taleb, T.; Shojafar, M.; Yang, B. SCEMA: An SDN-Oriented Cost-Effective Edge-Based MTD Approach. IEEE Trans. Inf. Forensics Secur. 2023, 18, 667–682.
- Sun, F.; Zhang, Z.; Chang, X.; Zhu, K. Toward Heterogeneous Environment: Lyapunov-Orientated ImpHetero Reinforcement Learning for Task Offloading. IEEE Trans. Netw. Serv. Manag. 2023, 20, 1572–1586.
- Zhang, T.; Xu, C.; Lian, Y.; Tian, H.; Kang, J.; Kuang, X.; Niyato, D. When Moving Target Defense Meets Attack Prediction in Digital Twins: A Convolutional and Hierarchical Reinforcement Learning Approach. IEEE J. Sel. Areas Commun. 2023, 41, 3293–3305.
- Ribeiro, M.A.; Fonseca, M.S.P.; de Santi, J. Detecting and mitigating DDoS attacks with moving target defense approach based on automated flow classification in SDN networks. Comput. Secur. 2023, 134, 103462.
- Celdrán, A.H.; Sánchez, P.M.S.; von der Assen, J.; Schenk, T.; Bovet, G.; Pérez, G.M.; Stiller, B. RL and Fingerprinting to Select Moving Target Defense Mechanisms for Zero-Day Attacks in IoT. IEEE Trans. Inf. Forensics Secur. 2024, 19, 5520–5529.
- Zhou, Y.; Cheng, G.; Ouyang, Z.; Chen, Z. Resource-Efficient Low-Rate DDoS Mitigation with Moving Target Defense in Edge Clouds. IEEE Trans. Inf. Forensics Secur. 2024, 19, 6377–6392.
- Li, L.; Ma, H.; Han, S.; Fu, J. Synthesis of Proactive Sensor Placement in Probabilistic Attack Graphs. In Proceedings of the 2023 American Control Conference (ACC), San Diego, CA, USA, 31 May–2 June 2023; pp. 3415–3421.
- Ghourab, E.M.; Naser, S.; Muhaidat, S.; Bariah, L.; Al-Qutayri, M.; Damiani, E.; Sofotasios, P.C. Moving Target Defense Approach for Secure Relay Selection in Vehicular Networks. Veh. Commun. 2024, 47, 100774.
- Mnih, V. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
- Kang, H.; Chang, X.; Misic, J.V.; Misic, V.B.; Fan, J.; Liu, Y. Cooperative UAV Resource Allocation and Task Offloading in Hierarchical Aerial Computing Systems: A MAPPO-Based Approach. IEEE Internet Things J. 2023, 10, 10497–10509.
- Zenitani, K. Attack graph analysis: An explanatory guide. Comput. Secur. 2023, 126, 103081.
- Cho, J.-H.; Sharma, D.P.; Alavizadeh, H.; Yoon, S.; Ben-Asher, N.; Moore, T.J.; Kim, D.S.; Lim, H.; Nelson, F.F. Toward proactive, adaptive defense: A survey on moving target defense. IEEE Commun. Surveys Tuts. 2020, 22, 709–745.
- Chang, X.; Shi, Y.; Zhang, Z.; Xu, Z.; Trivedi, K.S. Job Completion Time Under Migration-Based Dynamic Platform Technique. IEEE Trans. Serv. Comput. 2022, 15, 1345–1357.
- Chen, Z.; Chang, X.; Han, Z.; Yang, Y. Numerical Evaluation of Job Finish Time Under MTD Environment. IEEE Access 2020, 8, 11437–11446.
- Santos, L.; Brito, C.; Fé, I.; Carvalho, J.; Torquato, M.; Choi, E.; Lee, J.-W.; Nguyen, T.A.; Silva, F.A. Event-Based Moving Target Defense in Cloud Computing with VM Migration: A Performance Modeling Approach. IEEE Access 2024.
- Nguyen, M.; Samanta, P.; Debroy, S. Analyzing Moving Target Defense for Resilient Campus Private Cloud. In Proceedings of the 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), San Francisco, CA, USA, 2–7 July 2018; pp. 114–121.
- Tan, J.; Jin, H.; Hu, H.; Hu, R.; Zhang, H.; Zhang, H. WF-MTD: Evolutionary Decision Method for Moving Target Defense Based on Wright-Fisher Process. IEEE Trans. Dependable Secur. Comput. 2023, 20, 4719–4732.
- Umsonst, D.; Saritas, S.; Dán, G.; Sandberg, H. A Bayesian Nash Equilibrium-Based Moving Target Defense Against Stealthy Sensor Attacks. IEEE Trans. Autom. Control 2024, 69, 1659–1674.
- Singhal, A.; Ou, X. Security Risk Analysis of Enterprise Networks Using Probabilistic Attack Graphs. In Network Security Metrics; Springer: Gaithersburg, MD, USA, 2017.
- Haque, M.A.; Shetty, S.; Kamhoua, C.A.; Gold, K. Integrating Mission-Centric Impact Assessment to Operational Resiliency in Cyber-Physical Systems. In Proceedings of the GLOBECOM 2020—2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020; pp. 1–7.
| Notation | Definition |
|---|---|
| n | Number of nodes |
|  | Value of a node |
|  | Cost of deploying one backup component at a node |
|  | Total budget |
| V | The set of node values |
|  | The set of node backup deployment costs |
|  | The actual probability of successful exploitation of a node |
|  | The given probability of successful exploitation of a node |
|  | The number of backup components at a node |
|  | Nodes without antecedent nodes |
|  | The set of antecedent nodes of a node |
| Target Type | Number of Nodes | Antecedent Node Set |
|---|---|---|
| Single | 8 | [[0], [0], [1, 2], [3], [0], [0], [4, 5, 6], [7]] |
| Single | 15 | [[0], [0], [1, 2], [3], [0], [0], [4, 5, 6], [7], [0], [8, 9], [10], [0], [0], [11, 12, 13], [14]] |
| Multiple | 9 | [[0], [0], [1, 2], [3], [0], [0], [4, 5, 6], [7], [4, 7]] |
| Multiple | 16 | [[0], [0], [1, 2], [3], [0], [0], [4, 5, 6], [7], [0], [8, 9], [10], [0], [0], [11, 12, 13], [14], [9, 11]] |
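Read row by row, each antecedent node set in the table describes the incoming edges of one node. The helper below (a hypothetical utility, not from the paper) converts such a list into (antecedent, node) edge pairs, assuming nodes are numbered from 1 and that 0 marks a node without antecedent nodes, as in the notation table.

```python
def antecedent_sets_to_edges(antecedents):
    """Convert antecedent sets (nodes numbered from 1, 0 = no antecedent) into edges."""
    edges = []
    for node, parents in enumerate(antecedents, start=1):
        for parent in parents:
            if parent != 0:  # 0 marks an entry node with no antecedent
                edges.append((parent, node))
    return edges

# Example: the 8-node, single-target graph from the table above.
edges_8 = antecedent_sets_to_edges(
    [[0], [0], [1, 2], [3], [0], [0], [4, 5, 6], [7]])
# -> [(1, 3), (2, 3), (3, 4), (4, 7), (5, 7), (6, 7), (7, 8)]
```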
| Shared training parameters | Value |
|---|---|
| Maximum time steps per episode | 500 |
| Total training steps | 250,000 |
| Discount factor | 0.99 |
| Number of hidden layers of DNNs | 2 |
| Hidden layer size | 64 |

| Parameters of PPO | Value |
|---|---|
| Learning rates of actor network and critic network | 0.0001 |
| Clip factor | 0.2 |
| Number of epochs | 20 |

| Parameters of DQN | Value |
|---|---|
| Learning rate | 0.001 |
| Batch size | 64 |
| Replay memory size | 250,000 |

| Parameters of Q-learning | Value |
|---|---|
| Learning rate | 0.0001 |
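Collected into a single configuration object, the training settings above might look like the following Python dictionary; the key names and grouping are illustrative, while the values are taken directly from the tables.

```python
# Illustrative configuration mirroring the parameter tables above.
TRAINING_CONFIG = {
    "shared": {
        "max_steps_per_episode": 500,
        "total_training_steps": 250_000,
        "discount_factor": 0.99,
        "hidden_layers": 2,
        "hidden_layer_size": 64,
    },
    "ppo": {"actor_lr": 1e-4, "critic_lr": 1e-4, "clip_factor": 0.2, "epochs": 20},
    "dqn": {"learning_rate": 1e-3, "batch_size": 64, "replay_memory_size": 250_000},
    "q_learning": {"learning_rate": 1e-4},
}
```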