Resisting Memorization-Based APT Attacks Under Incomplete Information in DDHR Architecture: An Entropy-Heterogeneity-Aware RL-Based Scheduling Approach
Abstract
1. Introduction
- (i)
- Existing methods remain inadequate for detecting and identifying APT attacks. While current detection technologies have made significant strides in accuracy, speed, and reduced sample dependency, they still largely rely on prior knowledge. This makes accurate detection of sophisticated attacks, such as memory-resident APT attacks that exploit zero-day vulnerabilities, particularly challenging. Furthermore, attack strategies can adapt in real time to the defense landscape, making post-detection countermeasures highly difficult. Another critical issue is the lack of redundancy in defense strategies: a single missed detection can compromise the system and subsequently disrupt operations at passenger stations.
- (ii)
- Traditional defense strategies centered on attack identification incur high costs throughout their deployment lifecycle, making large-scale investment in a single system economically challenging. The inherent stealth of APT attacks means that available attack samples targeting railway internal networks are scarce, while creating such samples is itself resource-intensive. Moreover, the operational constraints of industrial control network deployments further raise the maintenance costs of keeping attack sample databases updated in real time. As a result, substantial investment in a tailored defense system for an individual platform is impractical.
- (1)
- We establish a FlipIt-game-based formulation of optimizing the DDHR defender’s payoff under incomplete information. In the DDHR architecture model, all redundant executors performing computations can be regarded as public resources. The formulation applies the information entropy of DDHR redundant executors to reflect attacking and defending behaviors, as detailed in Section 3. By transforming the defense against memorization-based attacks into a problem of allocating time on public resources and tracking changes in information entropy metrics, it quantifies the payoff of redundant executor scheduling strategies against memorization-based attacks.
- (2)
- We propose a method for estimating attacking time to overcome the difficulty of determining scheduling time under incomplete information, detailed in Section 3.2. The method first designates certain special executors (denoted as anchors in the rest of the paper) to introduce heterogeneity into the DDHR redundant executors. By employing a predefined scheduling strategy for these specially heterogeneous redundant executors and then estimating the attack time of the anchor executors, the difficulty of estimating executor scheduling time is overcome. That is, the method addresses the challenge of setting redundant executor scheduling times when memorization-based attacks remain undetected or detection is intermittent.
- (3)
- We propose the PPO_EH approach: a Proximal Policy Optimization (PPO) algorithm enhanced with the quantifiable information Entropy and Heterogeneity of DDHR redundant executors. We detail PPO_EH, including the Markov decision process of the scheduling problem and the scheduling steps, in Sections 4.2 and 4.3, and present the DDHR scheduling framework deploying PPO_EH in Section 4.1.
2. Related Work
2.1. Research on APT Attack Defense Strategies for Integrated AI Technology
- (1)
- Reconnaissance stage. Attackers use social engineering techniques and open-source intelligence tools to gather as much information as possible about the organization’s technical environment (such as routers, firewalls, etc.) and the background of key personnel (such as social activities, frequently visited websites, etc.) to identify potential attack vectors. Once intelligence gathering reaches the desired level, attackers begin planning attack strategies while preparing the necessary tools.
- (2)
- Establishing a foothold. Attackers bypass system defenses using various technical means such as malware, web application vulnerabilities, and spear phishing, while establishing command-and-control (C&C) channels to support subsequent attack operations.
- (3)
- Lateral movement. Attackers hide within legitimate traffic and normal behaviors to avoid detection while further expanding their control privileges.
- (4)
- Data exfiltration. Attackers seize opportunities to steal confidential information or disrupt target systems. Once the target attack is successful, attackers export data to remote servers via C&C channels.
- (5)
- Disappearance. When attackers obtain the desired data, they may choose to terminate the attack or continue to lie dormant. Furthermore, recent studies indicate that attackers may attempt to erase attack traces after a successful attack, such as deleting log records and clearing temporary files, to prevent their actions from being tracked and traced.
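The five stages above can be modeled as a simple state machine when simulating attacker behavior. The sketch below is illustrative only: the stage names follow the list above, but the enum and the transition rule (each stage advances to the next, with the final stage absorbing) are our assumptions, not part of the surveyed work.

```python
from enum import Enum

class APTStage(Enum):
    """Kill-chain stages of an APT attack, as listed in Section 2.1."""
    RECONNAISSANCE = 1
    FOOTHOLD = 2
    LATERAL_MOVEMENT = 3
    DATA_EXFILTRATION = 4
    DISAPPEARANCE = 5

def next_stage(stage: APTStage) -> APTStage:
    """Advance to the next kill-chain stage; the final stage is absorbing."""
    if stage is APTStage.DISAPPEARANCE:
        return stage
    return APTStage(stage.value + 1)
```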
2.2. Research on DHR Architecture Deployment and Scheduling Strategies in Cloud Environments
- (1)
- Existing research lacks studies on security defense architectures under conditions of incomplete attack information. Currently, research on active security defense architectures such as the DHR architecture typically assumes real-time detection of attack information when developing response strategies. However, in practical engineering scenarios, even with AI-assisted detection of APT attacks, numerous attack behaviors remain untraceable in real time. This prevents security defense architectures from effectively implementing their response strategies.
- (2)
- Research on developing intelligent defense strategies under unified, quantifiable environmental conditions remains scarce. Currently, studies on active security defense architectures, such as DHR frameworks assisted by machine learning algorithms, often lack a standardized, quantifiable environment, resulting in methodological shortcomings in both training processes and performance evaluation.
3. Problem Formulation of Optimizing Defender’s Payoff Under Incomplete Information
3.1. Attack–Defense Payoff Model Under Incomplete Information
- (1)
- includes the double-layer redundant executor set .
- (2)
- represents the set of time intervals when both the attacking and defending sides occupy public resources. represents the set of attacking times, and represents the time when the -th attack succeeds and the attacking side occupies public resources. Similarly, represents the set of defending times, and represents the time when the -th defense succeeds and the defending side occupies public resources.
- (3)
- represents the game strategy space, with the attacker’s strategy being and the defender’s strategy being . represents an attack on redundant executor , and represents the defender actively scheduling a defense against redundant executor .
- (4)
- represents the payoff of the attacker and the defender, respectively.
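Under these definitions, a basic FlipIt payoff can be computed as each side’s time in control of the public resource minus a per-move cost. The sketch below is a minimal illustration, assuming the defender holds the resource at time zero and control switches at each successful attack or defense; the function name and interface are hypothetical, and the paper’s entropy-based payoff terms are omitted.

```python
def flipit_payoffs(attack_times, defend_times, horizon,
                   attack_cost=1.0, defend_cost=1.0):
    """Fraction-of-control payoffs for a basic FlipIt game.

    attack_times / defend_times: sorted times at which each side takes
    over the public resource (the defender is assumed to start in control).
    Payoff = time in control - per-move cost * number of moves.
    """
    events = sorted([(t, "A") for t in attack_times] +
                    [(t, "D") for t in defend_times])
    holder, last_t = "D", 0.0               # defender controls at t = 0
    control = {"A": 0.0, "D": 0.0}
    for t, side in events:
        control[holder] += t - last_t       # credit time to current holder
        holder, last_t = side, t            # control flips at each event
    control[holder] += horizon - last_t     # remainder of the horizon
    payoff_a = control["A"] - attack_cost * len(attack_times)
    payoff_d = control["D"] - defend_cost * len(defend_times)
    return payoff_a, payoff_d
```

With one attack at t = 2 and one defense at t = 5 over a horizon of 10, the attacker controls the resource for 3 time units and the defender for 7, before move costs are subtracted.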
3.2. Method for Estimating Scheduling Time
3.2.1. Attacking Strategy
- Step 1: Randomly select a node from the lower-layer redundant executor node set to launch an attack.
- Step 2: After the node is successfully attacked, probe its reachable neighboring nodes to obtain the edge weight information .
- Step 3: Attack the target nodes along edges sorted in ascending order of weight, repeating Step 2 until an attack fails.
- Step 4: If all target nodes have been successfully attacked, stop the attack.
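Steps 1–4 can be sketched as a greedy probe-and-attack walk over a weighted executor graph. The adjacency structure and the `success_prob` interface below are assumptions for illustration; in the paper, attack success depends on executor similarity.

```python
import random

def memorization_attack(adj, start=None, success_prob=None, rng=None):
    """Sketch of the Section 3.2.1 attacker: random entry node, then
    greedy expansion along lowest-weight edges until an attack fails.

    adj: {node: {neighbor: edge_weight}}; success_prob(node) -> float
    gives the chance an attack on `node` succeeds (assumed interface).
    Returns the set of compromised nodes.
    """
    rng = rng or random.Random()
    success_prob = success_prob or (lambda n: 1.0)
    start = start if start is not None else rng.choice(list(adj))
    if rng.random() >= success_prob(start):        # Step 1 failed
        return set()
    compromised = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        # Steps 2-3: probe neighbors, try them in ascending weight order
        for nbr, _w in sorted(adj[node].items(), key=lambda kv: kv[1]):
            if nbr in compromised:
                continue
            if rng.random() < success_prob(nbr):   # attack succeeds
                compromised.add(nbr)
                frontier.append(nbr)
            else:                                  # Step 3: stop on failure
                return compromised
    return compromised                             # Step 4: all nodes taken
```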
3.2.2. Heterogeneous Redundant Executor Scheduling Strategy
| Algorithm 1 Heterogeneous Redundant Executor Scheduling |
| INPUT: Redundancy resource pool ; redundancy ; redundant resource pool similarity matrix Smatrix[No]n×n; total redundancy of redundant executors in operation ; upper-layer redundant executor set redundancy ; lower-layer redundant executor set redundancy . The upper-layer execution redundancy executor set is , the lower-layer execution redundancy executor set is , and the num() function returns the current set count.
OUTPUT: Total set of redundant executors Vr.
while i in do: // Initialization scheduling
|
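Since the body of Algorithm 1 is not reproduced here, the following sketch shows one plausible reading: greedily select executors from the resource pool so that pairwise similarity stays low (maximizing heterogeneity), then split the selection into upper- and lower-layer sets. The greedy rule, names, and interface are our assumptions, not the paper’s exact procedure.

```python
def schedule_heterogeneous(pool, similarity, m_upper, m_lower):
    """Pick m_upper + m_lower executors from the pool, greedily minimizing
    average pairwise similarity, then split into upper/lower layers.

    pool: list of executor ids; similarity: dict[(i, j)] -> value in [0, 1].
    """
    def sim(a, b):
        # similarity is symmetric but may be stored under either key order
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    selected = [pool[0]]                   # seed with the first executor
    candidates = list(pool[1:])
    while len(selected) < m_upper + m_lower and candidates:
        # choose the candidate least similar (on average) to those chosen
        best = min(candidates,
                   key=lambda c: sum(sim(c, s) for s in selected) / len(selected))
        selected.append(best)
        candidates.remove(best)
    return selected[:m_upper], selected[m_upper:]
```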
3.2.3. Estimation of Redundant Executor Scheduling Time
| Algorithm 2 Attack Time Estimation Method |
| INPUT: The lower anchor point was first detected to be attacked at time , and the upper anchor point was first detected to be attacked at time ; the redundancy degree of the redundant executor set is ; the fixed scheduling time threshold is ; the distribution function of the attack on time T.
OUTPUT: Set T′ of approximate estimates of redundant executor scheduling time T.
|
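A minimal sketch of the idea behind Algorithm 2: given the times at which the lower and upper anchors were first observed under attack, interpolate an estimated attack time for each of the 2m executors and cap it at the fixed scheduling threshold. The linear interpolation is an assumption made for illustration; the paper estimates scheduling time via the distribution of the attack time T.

```python
def estimate_scheduling_times(t_lower, t_upper, m, t_max):
    """Estimate a scheduling time for each of the 2m redundant executors.

    t_lower / t_upper: first detected attack times of the lower and upper
    anchor executors; t_max: fixed scheduling time threshold (cap).
    Returns a list of 2m estimated times (linear interpolation, assumed).
    """
    n = 2 * m
    step = (t_upper - t_lower) / (n - 1)   # spread estimates between anchors
    return [min(t_lower + i * step, t_max) for i in range(n)]
```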
3.3. Optimal Defender’s Payoff Model Under Approximately Complete Information
- (1)
- As the initiator of the game, the attacker’s payoff cannot be negative to ensure the existence of an equilibrium solution, i.e., it must satisfy Equation (13).
- (2)
- The defender’s payoff must not fall below the minimum payoff required for the DDHR architecture to take effect, meaning the defender’s payoff must be larger than the sum of the defense payoff of any redundant executors, satisfying Equation (14).
- (3)
- We set the number of heterogeneous redundant anchor points to . The entropy of the heterogeneous anchor points must be larger than zero to ensure that the scheduling time is accurately estimated during each scheduling round, thereby satisfying Equation (15).
4. PPO-Based Approach for Optimizing DDHR Redundant Executor Scheduling
4.1. DDHR Scheduling Framework
- (1)
- State Collection Module: Real-time connection to the redundant executor scheduling strategy gateway, primarily responsible for adjusting scheduling strategies, receiving information entropy changes in redundant executors under the current strategy, and calculating the current strategy stage payoff function, then storing the results in the experience pool.
- (2)
- Strategy Network Module: Real-time connection to the state collection module, receiving state changes and outputting corresponding actions. Simultaneously, connected to the optimizer to update strategy parameter .
- (3)
- Evaluation Network Module: Real-time connection to the state collection module, receiving complete trajectories from multiple stages. Then, it calculates the advantage function to update the strategy network. Simultaneously, it connects to the optimizer and updates the evaluation network parameters .
- (4)
- Optimizer Module: Real-time connection to the strategy network module and evaluation network module, updating strategy parameters and evaluation network parameters .
4.2. Modeling Markov Decision Processes for FlipIt-DDHR Scheduling Decision Problem
4.2.1. State Space
4.2.2. Action Space
4.2.3. Reward Function
4.3. PPO_EH Approach
| Algorithm 3 PPO_EH Approach |
| INPUT: According to the attacker strategy and defender scheduling strategy environment in Section 3.2, set hyperparameters and .
OUTPUT: Updated network parameters , .
|
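At the core of PPO_EH is PPO’s clipped surrogate objective, against which Sections 4.3.4 and 4.3.5 update the policy and critic networks. The standalone sketch below shows the per-sample clipped loss; the entropy-heterogeneity reward shaping of PPO_EH enters through the advantage, which is assumed given here.

```python
def ppo_clip_loss(ratio, advantage, clip_ratio=0.2):
    """Per-sample PPO clipped surrogate loss (to minimize).

    ratio = pi_theta(a|s) / pi_theta_old(a|s); the advantage comes from
    the critic (in PPO_EH it reflects the entropy-heterogeneity reward).
    """
    clipped = max(min(ratio, 1.0 + clip_ratio), 1.0 - clip_ratio)
    return -min(ratio * advantage, clipped * advantage)
```

For a positive advantage, the loss stops improving once the ratio exceeds 1 + clip_ratio, which bounds the size of each policy update.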
4.3.1. Initialize the Information Entropy Environment
4.3.2. Action Sampling
4.3.3. Calculate the Advantage Function
4.3.4. Update the Policy Network
4.3.5. Update the Critic Network
5. Experiment Evaluation
5.1. Initialize Environment Settings
5.1.1. Initialization of Information Entropy of Redundant Executor Set
- (1)
- Initialization of Redundancy Resource Pool
- (2)
- Initialization of Redundant Executor Similarity Weight
- (3)
- Initialization of Heterogeneous Redundant Executor Anchor Points
- (4)
- Initialization of Redundant Executor Scheduling Time
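The initialization steps above center on the information entropy of the redundant executor set. As a minimal sketch of how such a metric can be computed, the function below takes an unnormalized positive weight per executor (for example, derived from the similarity matrix) and returns the Shannon entropy in bits; the exact weighting used in the paper may differ.

```python
import math

def executor_entropy(weights):
    """Shannon entropy (in bits) of a redundant-executor set.

    weights: unnormalized positive score per executor. A higher entropy
    indicates a more heterogeneous online executor set.
    """
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)
```

For example, four executors with equal weights give the maximum entropy of 2 bits, while a set dominated by one executor yields an entropy near zero.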
5.1.2. Initializing Experimental Parameters
5.2. Training Experiment Analysis
5.2.1. Average Rewards for PPO_EH Approach Under Different Learning Rates
5.2.2. Average Rewards for PPO_EH Approach Under Different Clip Ratios
5.2.3. Average Rewards for PPO_EH Approach Under Different Discount Factors
5.2.4. Average Rewards for PPO_EH Approach Under Different Weight Parameters of the Reward Function
5.2.5. Comparison of Different Approaches
5.3. Analysis of the Capability of PPO_EH Approach
5.3.1. Analysis in Dynamic Experiment
5.3.2. Analysis of Online Redundant Executors Set Average Scheduling Efficiency
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- QiAnXin. Cybersecurity Threat Report 2024. Available online: https://www.qianxin.com/threat/reportdetail?report_id=335 (accessed on 14 October 2025).
- Global Advanced Persistent Threat (APT) Situation Report. Available online: https://www.secrss.com/articles/54098 (accessed on 14 October 2025).
- QiAnXin. 2024 Artificial Intelligence Security Report. Available online: https://www.xdyanbao.com/doc/aec8ito71y (accessed on 14 October 2025).
- Ahmed, Y.; Asyhari, A.; Rahman, M.A. A Cyber Kill Chain Approach for Detecting Advanced Persistent Threats. Comput. Mater. Contin. 2021, 67, 2497–2513. [Google Scholar] [CrossRef]
- Nur Ilzam, C.; Norziana, J.; Yunus, Y.; Miss Laiha, M. A systematic literature review on advanced persistent threat behaviors and its detection strategy. J. Cybersecur. 2024, 10, tyad023. [Google Scholar] [CrossRef]
- Mutalib, N.H.A.; Sabri, A.Q.M.; Wahab, A.W.A.; Abdullah, E.R.M.F.; AlDahoul, N. Explainable deep learning approach for advanced persistent threats (APTs) detection in cybersecurity: A review. Artif. Intell. Rev. 2024, 57, 297. [Google Scholar] [CrossRef]
- Kumaresan, S.J.; Senthilkumar, C.; Kongkham, D.; Beenarani, B.B.; Nirmala, P. Investigating the Effectiveness of Recurrent Neural Networks for Network Anomaly Detection. In Proceedings of the International Conference on Intelligent and Innovative Technologies in Computing, Electrical and Electronics (IITCEE), Bangalore, India, 24–25 January 2024. [Google Scholar] [CrossRef]
- Aly, A.; Iqbal, S.; Youssef, A.; Mansour, E. MEGR-APT: A Memory-Efficient APT Hunting System Based on Attack Representation Learning. IEEE Trans. Inf. Forensics Secur. 2024, 19, 5257–5271. [Google Scholar] [CrossRef]
- Zheng, Y.; Li, Z.; Xu, X.; Zhao, Q. Dynamic defenses in cyber security: Techniques, methods and challenges. Digit. Commun. Netw. 2022, 8, 422–435. [Google Scholar] [CrossRef]
- Wu, J. DHR Architecture. In Cyberspace Mimic Defense; Springer: Berlin/Heidelberg, Germany, 2020; Volume 2, pp. 273–337. [Google Scholar]
- Chen, Y.; Li, M.; Zhu, X.; Fang, K.; Ren, Q.; Guo, T.; Chen, X.; Li, C.; Zou, Z.; Deng, Y. An improved algorithm for practical byzantine fault tolerance to large-scale consortium chain. Inf. Process. Manag. 2022, 59, 102884. [Google Scholar] [CrossRef]
- Zhan, Y.; Wang, B.; Lu, R.; Yu, Y. DRBFT: Delegated randomization Byzantine fault tolerance consensus protocol for blockchains. Inf. Sci. 2021, 559, 8–21. [Google Scholar] [CrossRef]
- Zhang, J.; Rong, Y.; Cao, J.; Wu, W. DBFT: A Byzantine Fault Tolerance Protocol With Graceful Performance Degradation. IEEE Trans. Dependable Secur. Comput. 2022, 19, 3387–3400. [Google Scholar] [CrossRef]
- Cai, M.; He, X.; Zhou, D. Self-Healing Fault-Tolerant Control for High-Order Fully Actuated Systems Against Sensor Faults: A Redundancy Framework. IEEE Trans. Cybern. 2024, 54, 2628–2640. [Google Scholar] [CrossRef]
- Reghenzani, F.; Guo, Z.; Fornaciari, W. Software Fault Tolerance in Real-Time Systems: Identifying the Future Research Questions. ACM Comput. Surv. 2023, 55, 306. [Google Scholar] [CrossRef]
- Lakshminarayana, S.; Yau, D. Cost-benefit analysis of moving target defense in power grids. IEEE Trans. Power Syst. 2020, 36, 1152–1163. [Google Scholar] [CrossRef]
- Navas, R.E.; Cuppens, F.; Boulahia Cuppens, N.; Toutain, L.; Papadopoulos, G.Z. MTD, Where Art Thou? A Systematic Review of Moving Target Defense Techniques for IoT. IEEE Internet Things J. 2021, 8, 7818–7832. [Google Scholar] [CrossRef]
- Wu, X.; Wang, M.; Shen, J.; Gong, Y. Towards Double-Layer Dynamic Heterogeneous Redundancy Architecture for Reliable Railway Passenger Service System. Electronics 2024, 13, 3592. [Google Scholar] [CrossRef]
- Alshamrani, A.; Myneni, S.; Chowdhary, A.; Huang, D. A Survey on Advanced Persistent Threats: Techniques, Solutions, Challenges, and Research Opportunities. IEEE Commun. Surv. Tutor. 2019, 21, 1851–1877. [Google Scholar] [CrossRef]
- Yang, M.; Wang, J.; Liu, B. APT Defense in the New Era: Challenges and Strategies Research. Commun. Technol. 2025, 58, 448–456. [Google Scholar] [CrossRef]
- Patel, D.; Rajesh, T.; Balamurugan, G. Enhancing Cybersecurity Vigilance with Deep Learning for Malware Detection. In Proceedings of the 10th International Conference on Communication and Signal Processing (ICCSP), Melmaruvathur, India, 12–14 April 2024. [Google Scholar] [CrossRef]
- Tadesse, Y.E.; Choi, Y.-J. Pattern Augmented Lightweight Convolutional Neural Network for Intrusion Detection System. Electronics 2024, 13, 932. [Google Scholar] [CrossRef]
- Xu, C.; Shen, J.; Du, X. A Method of Few-Shot Network Intrusion Detection Based on Meta-Learning Framework. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3540–3552. [Google Scholar] [CrossRef]
- Yin, X.; Fang, W.; Liu, Z.; Liu, D. A novel multi-scale CNN and Bi-LSTM arbitration dense network model for low-rate DDoS attack detection. Sci. Rep. 2024, 14, 5111. [Google Scholar] [CrossRef]
- Xu, B.; Sun, L.; Mao, X.; Liu, C.; Ding, Z. Strengthening Network Security: Deep Learning Models for Intrusion Detection with Optimized Feature Subset and Effective Imbalance Handling. Comput. Mater. Contin. 2024, 78, 1995–2022. [Google Scholar] [CrossRef]
- Yang, Z.; Ma, Z.; Zhao, W.; Li, L.; Gu, F. HRNN: Hypergraph Recurrent Neural Network for Network Intrusion Detection. J. Grid Comput. 2024, 22, 2. [Google Scholar] [CrossRef]
- Hasan, M.M.; Islam, M.U.; Uddin, J. Advanced Persistent Threat Identification with Boosting and Explainable AI. SN Comput. Sci. 2023, 4, 271. [Google Scholar] [CrossRef]
- Li, H.; Zhu, T.; Ying, J.; Chen, T.; Lv, M.; Mei, J.; Weng, Z.; Shi, L. MIRDETECTOR: Applying malicious intent representation for enhanced APT anomaly detection. Comput. Secur. 2025, 157, 104588. [Google Scholar] [CrossRef]
- Chen, J.; Lan, X.; Zhang, Q.; Ma, W.; Fang, W.; He, J. Defending Against APT Attacks in Cloud Computing Environments Using Grouped Multiagent Deep Reinforcement Learning. IEEE Internet Things J. 2025, 12, 19459–19470. [Google Scholar] [CrossRef]
- Huda, S.; Miah, S.; Hassan, M.M.; Islam, R.; Yearwood, J.; Alrubaian, M.; Almogren, A. Defending unknown attacks on cyber-physical systems by semi-supervised approach and available unlabeled data. Inf. Sci. 2017, 379, 211–228. [Google Scholar] [CrossRef]
- Zhou, M.; Han, L.; Che, X. Strengthening edge defense: A differential game-based edge intelligence strategy against APT attacks. Comput. Secur. 2025, 157, 104580. [Google Scholar] [CrossRef]
- Phan, T.V.; Bauschert, T. DeepAir: Deep Reinforcement Learning for Adaptive Intrusion Response in Software-Defined Networks. IEEE Trans. Netw. Serv. Manag. 2022, 19, 2207–2218. [Google Scholar] [CrossRef]
- Cao, Y.; Liu, K.; Lin, Y.; Wang, L.; Xia, Y. Deep-Reinforcement-Learning-Based Self-Evolving Moving Target Defense Approach Against Unknown Attacks. IEEE Internet Things J. 2024, 11, 33027–33039. [Google Scholar] [CrossRef]
- Yoon, S.; Cho, J.-H.; Kim, D.S.; Moore, T.J.; Free-Nelson, F.; Lim, H. DESOLATER: Deep Reinforcement Learning-Based Resource Allocation and Moving Target Defense Deployment Framework. IEEE Access 2021, 9, 70700–70714. [Google Scholar] [CrossRef]
- Sepczuk, M. Dynamic Web Application Firewall Detection supported by Cyber Mimic. J. Netw. Comput. Appl. 2023, 213, 103596. [Google Scholar] [CrossRef]
- Wu, Q.; Wu, C.; Yan, X.; Cheng, Q. Intrinsic Security and Self-Adaptive Cooperative Protection Enabling Cloud Native Network Slicing. IEEE Trans. Netw. Serv. Manag. 2021, 18, 1287–1304. [Google Scholar] [CrossRef]
- Wang, Z.; Jiang, D.; Lv, Z. AI-Assisted Trustworthy Architecture for Industrial IoT Based on Dynamic Heterogeneous Redundancy. IEEE Trans. Ind. Inform. 2023, 19, 2019–2027. [Google Scholar] [CrossRef]
- Li, Y.; Liu, Q.; Zhuang, W.; Zhou, Y.; Cao, C.; Wu, J. Dynamic Heterogeneous Redundancy-Based Joint Safety and Security for Connected Automated Vehicles. IEEE Veh. Technol. Mag. 2023, 18, 89–97. [Google Scholar] [CrossRef]
- Chen, Z.; Cui, G.; Zhang, L.; Yang, X.; Li, H.; Zhao, Y.; Ma, C.; Sun, T. Optimal Strategy for Cyberspace Mimic Defense Based on Game Theory. IEEE Access 2021, 9, 68376–68386. [Google Scholar] [CrossRef]
- Shi, L.; Miao, Y.; Ren, J.; Liu, R. Game Analysis and Optimization for Evolutionary Dynamic Heterogeneous Redundancy. IEEE Trans. Netw. Serv. Manag. 2023, 20, 4186–4197. [Google Scholar] [CrossRef]
- Hu, J.; Li, Y.; Li, Z.; Liu, Q.; Wu, J. Unveiling the Strategic Defense Mechanisms in Dynamic Heterogeneous Redundancy Architecture. IEEE Trans. Netw. Serv. Manag. 2024, 21, 4912–4926. [Google Scholar] [CrossRef]
- Kang, Y.; Zhang, Q.; Jiang, B.; Bu, Y. A Differentially Private Framework for the Dynamic Heterogeneous Redundant Architecture System in Cyberspace. Electronics 2024, 13, 1805. [Google Scholar] [CrossRef]
- Wu, X.; Wang, M.; Cai, Y.; Chang, X.; Liu, Y. Improving the CRCC-DHR Reliability: An Entropy-Based Mimic-Defense Resource Scheduling Algorithm. Entropy 2025, 27, 208. [Google Scholar] [CrossRef] [PubMed]
| Architecture Name | Redundancy Capacity | Fault Tolerance | Dynamic | Applicable Environment |
|---|---|---|---|---|
| DHR | High | High | √ | Centralized |
| BFT | Mid | High | √ | Distributed |
| FTR | Mid | Low | √ | Centralized or Distributed |
| Symbol | Definition |
|---|---|
| | The game strategy |
| | Advantage function |
| | Total revenue from the occupation of public resources |
| | Attack cost ≥ 0 |
| | Set of node attack states, the -th node in attack state |
| | Defend cost ≥ 0 |
| | Weight set of connecting edges between redundant executors, the -th and -th redundant executor connectivity weights |
| | Attacker’s base gain |
| | The -th information entropy metric of public resources in time period |
| | Total information entropy metric of public resources in the -th stage of time period |
| | Decay of the information entropy metric of public resources in the -th stage of time period |
| | Decay rate of the information entropy metric of public resources in the -th stage of time period |
| | Function of redundant executor average similarity affecting attack success probability in the DHR architecture |
| | DDHR architecture attack graph network model |
| | Loss function of the policy network when using the clip method |
| | Loss function of the critic network |
| | The -th anchor redundant executor |
| | Total number of upper-/lower-layer redundant executors |
| , , , | Denote all participants in the game, the attackers, the defenders, and neutral resources, respectively |
| | Total number of redundant executors in the resource pool |
| | Probability of a successful attack on redundant executor during time period |
| | State transition probability function for transitioning to the next stage |
| | Optimal payoff function for the defending side |
| | State of equilibrium |
| | Denote the executor scheduling time and attack time, respectively |
| | Scheduling cycle for redundant executors |
| | The time period required to occupy resources, |
| | Trajectory of executing the scheduling strategy |
| , | Represent the attacker’s and defender’s payoff, respectively |
| | Redundant executor node set, the -th redundant executor node |
| Z | FlipIt game model |
| | Coefficient of the proportion of total revenue occupied by the attacker during time period |
| | Correlation function between revenue and information entropy metrics |
| | Correlation coefficient between revenue and information entropy metrics |
| | Number of samples for discretization of information entropy metrics |
| | Attack distribution function on time interval |
| | Discount factor for state transition |
| | Weight adjustment function for redundant executor adjudication |
| | Redundant executor scheduling function |
| | Action function for setting heterogeneous redundant executor anchor similarity |
| , | Parameters of the policy network and critic network |
| | Scheduling policy |
| Component | Specification |
|---|---|
| CPU | Intel Core i7 Ultra, Intel Corporation, Penang, Malaysia |
| Memory | 64 GB DDR5, frequency 4800 MHz, timing CL40, SK Hynix, Wuxi, China |
| Motherboard | Alienware Model 0P4K4P (Chipset: Intel HM770), Foxconn, Shenzhen, China |
| BIOS version | 1.8.0 |
| Storage | 2 TB Samsung PM9A1 NVMe M.2 SSD (Interface: PCIe 4.0 x4), Samsung, Hwaseong, South Korea |
| Operating system | Microsoft Windows 11 Pro, Version 23H2 (OS Build 22631.3447) |
| Software environment | Python 3.13 + PyTorch 2.0.1 |
| Parameters | Value |
|---|---|
| Total number of episodes | 1.2 × 10⁴, 1.4 × 10⁴ |
| Number of slots in each episode | 1000 |
| Actor and critic learning rates | 1 × 10⁻³, 5 × 10⁻³, 1 × 10⁻⁴, 5 × 10⁻⁵ |
| Actor network | 12 × 64 × 64 × 2 |
| Critic network | 12 × 64 × 64 × 1 |
| Discount factor | 0.9, 0.99, 0.999 |
| Clip ratio | 0.1, 0.2, 0.3 |
| PPO epoch | 10 |
| Learning optimizer | Adam |
| Probability of failure of redundant executors | 0.1 |
| Information entropy decay rate during training phase | 0.01 |
| Information entropy decay rate in the application phase of algorithm simulation | 0.05 |
| Attacker’s base reward | −0.5 |
| Learning Rate | 1 × 10⁻³ | 5 × 10⁻³ | 1 × 10⁻⁴ | 5 × 10⁻⁵ |
|---|---|---|---|---|
| 2m = 6(AR) | 3.340 | 5.338 | 6.242 | 3.107 |
| 2m = 8(AR) | 6.668 | 7.169 | 7.875 | 5.791 |
| Clip Ratio | 0.1 | 0.2 | 0.3 |
|---|---|---|---|
| 2m = 6(AR) | 6.389 | 6.401 | 6.477 |
| 2m = 8(AR) | 6.092 | 6.603 | 6.607 |
| Discount Factor | 0.9 | 0.99 | 0.999 |
|---|---|---|---|
| 2m = 6(AR) | 6.362 | 6.502 | 6.383 |
| 2m = 8(AR) | 6.347 | 6.633 | 6.210 |
| β | 0 | 0.2 | 0.4 | 0.6 | 0.8 | 1 |
|---|---|---|---|---|---|---|
| 2m = 6(AR) | 4.415 | 4.882 | 6.056 | 3.487 | 1.490 | 0.601 |
| 2m = 8(AR) | 6.657 | 7.103 | 8.324 | 4.551 | 2.209 | 0.655 |
| | PPO_EH | PPO_E | PPO |
|---|---|---|---|
| 2m = 6(AR) | 6.245 | 6.530 | 2.872 |
| 2m = 8(AR) | 8.261 | 8.655 | 4.002 |
| | Redundancy 2m = 6 | | Redundancy 2m = 8 | |
|---|---|---|---|---|
| | Average Scheduling Cycle | Experimental Scheduling Variance | Average Scheduling Cycle | Experimental Scheduling Variance |
| Random | 13.43 | 32.85 | 12.52 | 19.71 |
| REWS | 95.84 | 106.87 | 86.73 | 52.29 |
| PPO | 61.21 | 25.24 | 30.72 | 14.05 |
| PPO_E | 161.07 | 6.46 | 134.69 | 2.98 |
| PPO_EH | 121.65 | 7.29 | 103.73 | 8.85 |
| | Redundancy 2m = 6 | | Redundancy 2m = 8 | |
|---|---|---|---|---|
| | Information Entropy Decay Amount | Information Entropy Decay Rate | Information Entropy Decay Amount | Information Entropy Decay Rate |
| Random | 12.50 | 0.93 | 12.10 | 0.97 |
| REWS | 41.22 | 0.43 | 45.06 | 0.52 |
| PPO | 43.21 | 0.71 | 24.27 | 0.79 |
| PPO_E | 35.91 | 0.22 | 29.64 | 0.22 |
| PPO_EH | 19.08 | 0.16 | 18.04 | 0.17 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wu, X.; Wang, M.; Chang, X.; Li, C.; Wang, Y.; Liang, B.; Deng, S. Resisting Memorization-Based APT Attacks Under Incomplete Information in DDHR Architecture: An Entropy-Heterogeneity-Aware RL-Based Scheduling Approach. Entropy 2025, 27, 1238. https://doi.org/10.3390/e27121238