Model-Based Reinforcement Learning for Containing Malware Propagation in Wireless Radar Sensor Networks
Abstract
1. Introduction
- (1)
- Building on previous research [5], this paper constructs a fractional-order VCISQ propagation model that describes the spread of malware in WRSNs more accurately. A fractional-order optimal control method is then derived for this model, yielding the theoretically optimal control effect under treatment and quarantine measures; the resulting control cost serves as the benchmark against which the control performance of the RL methods is judged.
- (2)
- This study proposes the Model-Based Soft Actor–Critic (MBSAC) MBRL algorithm. It builds on the general RL method SAC and adds a predictive network that uses existing interaction data to predict future environment transitions. MBSAC optimizes the agent networks with both interaction data and predicted data simultaneously, reducing the number of environment interactions required and improving learning efficiency. Compared with current state-of-the-art general RL methods, MBSAC achieves lower control costs, faster convergence, and a more stable control process.
2. Related Work
3. Fractional-Order Model and Optimal Control Method
3.1. Fractional-Order Malware Propagation Model
- (1)
- Susceptible node: a sensor node that has vulnerabilities but is not yet infected with the malware. It may pick up the malware and become a carrier node by interacting with an infected node, or it may become a quarantined node when quarantine measures are applied.
- (2)
- Carrier node: a node that carries the malware but has not yet activated it. After the malware installation delay, the malware is activated and the carrier node becomes an infected node.
- (3)
- Infected node: a node in which the malware has been activated and is spreading. It becomes a secured node after the patch installation delay once treatment measures, such as installing a patch, are taken.
- (4)
- Secured node: a node that has been patched and restored and is temporarily immune to the malware. The patch effect may weaken after the patch failure delay H, after which the node may revert to a susceptible node.
- (5)
- Quarantined node: a node that has been quarantined and no longer participates in network interactions. It may revert to a susceptible node after the quarantine failure delay h if the quarantine fails, or it may become a secured node if treatment measures such as installing a patch are applied. (A minimal simulation sketch of these transitions follows this list.)
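Since the full dynamics of Eq. (1) are not reproduced here, the sketch below only illustrates how a fractional-order compartmental model of this kind can be simulated with a standard Grünwald–Letnikov discretization; it is not the authors' solver. The right-hand side, the rate parameters `beta`, `tau_c`, `tau_t`, and the constant controls `u_treat`, `u_quar` are illustrative placeholders, and the ordering of the compartments as V, C, I, S, Q (with V standing for the susceptible/vulnerable nodes) is an assumption based on the model's name. The initial node counts and the delays H, h, the quarantine rate k, and the order q echo Table 2.

```python
import numpy as np

def gl_coeffs(q, n):
    """Grünwald–Letnikov coefficients c_j = (-1)^j * C(q, j), computed recursively."""
    c = np.zeros(n + 1)
    c[0] = 1.0
    for j in range(1, n + 1):
        c[j] = (1.0 - (1.0 + q) / j) * c[j - 1]
    return c

def simulate_vcisq(q=0.9, steps=300, dt=0.1,
                   beta=0.002,   # hypothetical infection rate on V-I contact
                   tau_c=1.0,    # malware installation delay (C -> I)
                   tau_t=0.5,    # patch installation delay under treatment (I -> S, Q -> S)
                   H=2.0,        # patch failure delay (S -> V)
                   h=1.0,        # quarantine failure delay (Q -> V)
                   k=0.5,        # quarantine rate (V -> Q)
                   u_treat=0.5, u_quar=0.5):  # constant controls, for illustration only
    """Explicit Grünwald–Letnikov simulation of a VCISQ-style compartmental model.

    The right-hand side is a plausible stand-in for Eq. (1), not the paper's exact dynamics."""
    x = np.zeros((steps + 1, 5))          # columns: V, C, I, S, Q
    x[0] = [270.0, 0.0, 30.0, 0.0, 0.0]   # initial node counts from Table 2
    c = gl_coeffs(q, steps)

    def rhs(V, C, I, S, Q):
        dV = S / H + Q / h - beta * V * I - k * u_quar * V
        dC = beta * V * I - C / tau_c
        dI = C / tau_c - u_treat * I / tau_t
        dS = u_treat * I / tau_t + u_treat * Q / tau_t - S / H
        dQ = k * u_quar * V - Q / h - u_treat * Q / tau_t
        return np.array([dV, dC, dI, dS, dQ])

    for n in range(1, steps + 1):
        # fractional "memory" term: all past states weighted by the GL coefficients
        memory = sum(c[j] * x[n - j] for j in range(1, n + 1))
        x[n] = dt ** q * rhs(*x[n - 1]) - memory
    return x

states = simulate_vcisq()
print("infected nodes at the end of the horizon:", states[-1, 2])
```

With q = 1 the coefficients reduce to c_1 = −1 and c_j = 0 for j ≥ 2, so the scheme collapses to the ordinary forward-Euler integration of an integer-order model; for q < 1 every past state contributes, which is what gives the fractional-order model its memory effect.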
3.2. Fractional Optimal Control Method
3.3. Algorithm Complexity Analysis
- Environment Interaction: each episode requires a cost proportional to the number of agents (a single agent in this work), the episode length T, and the per-step interaction time with the environment.
- Predictive Model Training: each episode involves computations proportional to the number of model-training iterations and the size of the real replay buffer used for training.
- Network Updates: each episode demands a cost proportional to the number of gradient steps, i.e., the episode length divided by the gradient step interval, times the per-update time of the policy and critic networks. The expression after this list combines the three terms.
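One plausible way to summarize the per-episode budget is the expression below; the symbols N, t_int, K, |D_r|, t_model, G, and t_up stand in for the number of agents, per-step interaction time, model-training iterations, real replay-buffer size, per-iteration model-training time, gradient-step interval, and per-update time named above, and are not the paper's original notation.

```latex
T_{\text{episode}} \;\approx\; \mathcal{O}\!\Big(
  \underbrace{N\,T\,t_{\text{int}}}_{\text{interaction}}
  \;+\;
  \underbrace{K\,|\mathcal{D}_r|\,t_{\text{model}}}_{\text{predictive-model training}}
  \;+\;
  \underbrace{(T/G)\,t_{\text{up}}}_{\text{network updates}}
\Big), \qquad N = 1 .
```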
4. Model-Based Soft Actor–Critic
4.1. Soft Actor–Critic
Algorithm 1 Model-Based Soft Actor–Critic
1: Initialize the parameters of the policy network, the predictive model p, the two critic networks, the two target critic networks, the entropy regularization coefficient, the real replay buffer, and the predictive replay buffer
2: for episode = 1 to E do
      Reset the environment and return the initial state
3:    for t = 1 to T do
4:       Select the action with the agent's policy network
5:       Put the action into the VCISQ model (1), and return the next state and the reward
6:       Store the real trajectory into the real replay buffer
7:       Update the current state with the returned next state
8:       if the predictive-model training condition is met then
9:          Return all trajectory data of the real replay buffer, and output the mean and variance vectors with the predictive model (28)
10:         Train the parameters of the predictive model p with these vectors (29)
11:         for 1 to the number of predictive rounds do
12:            Randomly sample a state from the real replay buffer to serve as the model state
13:            Output the model action with the agent's policy network
14:            Output the predictive vector from the model state and model action, and calculate the next model state and the model reward (31)
15:            Store the predictive trajectory into the predictive replay buffer
16:            Update the model state with the next model state
17:         end for
18:   end for
19:   for each gradient step do
20:      Randomly sample a batch from the real replay buffer and a batch from the predictive replay buffer
21:      Combine the two batches in the specified ratio to form the update batch
22:      Update all network parameters: the critic networks (32), the policy network (33), and the target critic networks (34); update the entropy regularization coefficient (35)
23:   end for
24: end for
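Steps 20–21 are what distinguish MBSAC from plain SAC: every gradient update draws from both the real and the predictive replay buffers in a fixed proportion. A minimal sketch of that sampling step, assuming plain Python lists as buffers; `real_ratio` is a placeholder for the real-data utilization rate listed in Table 3, and the batch size of 128 matches Table 3.

```python
import random

def sample_mixed_batch(real_buffer, model_buffer, batch_size=128, real_ratio=0.5):
    """Draw a training batch that mixes real transitions with model-generated ones.

    real_ratio is the fraction of the batch taken from the real replay buffer;
    the remainder is filled from the predictive replay buffer."""
    n_real = min(int(batch_size * real_ratio), len(real_buffer))
    n_model = min(batch_size - n_real, len(model_buffer))
    batch = random.sample(real_buffer, n_real) + random.sample(model_buffer, n_model)
    random.shuffle(batch)  # avoid any ordering bias between real and predicted data
    return batch

# Usage: transitions are (state, action, reward, next_state) tuples;
# the mixed batch then feeds the SAC critic/policy/entropy updates (Eqs. (32)-(35)).
# batch = sample_mixed_batch(real_replay, predictive_replay)
```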
4.2. Predictive Model
4.3. Network Optimization
5. Experiment Validation
5.1. Experimental Setup
- (1)
- DQN [27]: A general value-based reinforcement learning algorithm.
- (2)
- SAC [4]: A general maximum-entropy reinforcement learning algorithm that performs well on tasks with continuous action spaces.
- (3)
- PPO [5]: A general reinforcement learning algorithm based on importance sampling and trust regions.
- (4)
- MAPPO [5]: A multi-agent reinforcement learning algorithm based on PPO, in which multiple agents optimize collaboratively.
- (5)
- Fractional-order optimal control method: The control method derived in this paper for the VCISQ model, which outputs the theoretically optimal result and serves as the benchmark.
- (6)
- MBSAC: The sample-efficient MBRL algorithm proposed in this paper, with faster convergence and better learning performance.
5.1.1. Environment and Model Parameters (Table 2)
5.1.2. Algorithm Hyperparameters (Table 3)
5.2. Validation of Control Measure Validity
5.3. Impact on VCISQ Model with Different Fractional Orders q
5.4. Performance of Reinforcement Learning Under Different Hyperparameters
5.5. Investigation of Relative Optimality Under Different Fractional Orders q
- (1)
- : The control cost of MBSAC is 7266.29, compared with 7143.22 for the optimal control, an error of about 1.7%.
- (2)
- : The control cost of MBSAC is 7285.64, compared with 7143.22 for the optimal control, an error of about 2.0%.
- (3)
- : The control cost of MBSAC is 7377.86, an error of about 3.3% compared with the optimal control cost of 7143.22.
- (1)
- : The control cost of MBSAC is 7431.72, an error of about 2.3% compared with the optimal control cost of 7265.18.
- (2)
- : The control cost of MBSAC is 7333.14, an error of about 0.9% compared with the optimal control cost of 7265.18.
- (3)
- : The control cost of MBSAC is 7536.87, an error of about 3.7% compared with the optimal control cost of 7265.18.
- (1)
- : The control cost of MBSAC is 7623.66, an error of about 2.8% compared with the optimal control cost of 7415.57.
- (2)
- : The control cost of MBSAC is 7682.57, an error of about 3.6% compared with the optimal control cost of 7415.57.
- (3)
- : The control cost of MBSAC is 7538.14, an error of about 1.7% compared with the optimal control cost of 7415.57. (The relative errors above can be recomputed with the snippet after this list.)
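The relative errors quoted above follow directly from the reported cost values; a minimal check, using only the numbers given in this section:

```python
def relative_error(rl_cost, optimal_cost):
    """Relative gap between an RL control cost and the optimal-control benchmark."""
    return (rl_cost - optimal_cost) / optimal_cost

# the q-group whose optimal-control cost is 7143.22 (values reported above)
for cost in (7266.29, 7285.64, 7377.86):
    print(f"MBSAC cost {cost:.2f}: {100 * relative_error(cost, 7143.22):.2f}% above optimal")
```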
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wang, S.; Gong, Y.; Li, X.; Li, Q. Integrated Sensing, Communication, and Computation Over the Air: Beampattern Design for Wireless Sensor Networks. IEEE Internet Things J. 2024, 11, 9681–9692. [Google Scholar] [CrossRef]
- Zhang, G.; Yi, W.; Matthaiou, M.; Varshney, P.K. Direct Target Localization With Low-Bit Quantization in Wireless Sensor Networks. IEEE Trans. Signal Process. 2024, 72, 3059–3075. [Google Scholar] [CrossRef]
- Dou, Z.; Yao, Z.; Zhang, Z.; Lu, M. A Lidar-Assisted Self-Localization Technology for Indoor Wireless Sensor Networks. IEEE Internet Things J. 2023, 10, 17515–17529. [Google Scholar] [CrossRef]
- Liu, G.; Li, H.; Xiong, L.; Tan, Z.; Liang, Z.; Zhong, X. Fractional-Order Optimal Control and FIOV-MASAC Reinforcement Learning for Combating Malware Spread in Internet of Vehicles. IEEE Trans. Autom. Sci. Eng. 2025, 22, 10313–10332. [Google Scholar] [CrossRef]
- Liu, G.; Li, H.; Xiong, L.; Chen, Y.; Wang, A.; Shen, D. Reinforcement Learning for Mitigating Malware Propagation in Wireless Radar Sensor Networks with Channel Modeling. Mathematics 2025, 13, 1397. [Google Scholar] [CrossRef]
- Shen, Y.; Shepherd, C.; Ahmed, C.M.; Shen, S.; Yu, S. Integrating Deep Spiking Q-network into Hypergame-theoretic Deceptive Defense for Mitigating Malware Propagation in Edge Intelligence-enabled IoT Systems. IEEE Trans. Serv. Comput. 2025, 18, 1487–1499. [Google Scholar] [CrossRef]
- Shen, S.; Cai, C.; Shen, Y.; Wu, X.; Ke, W.; Yu, S. Joint Mean-Field Game and Multiagent Asynchronous Advantage Actor-Critic for Edge Intelligence-Based IoT Malware Propagation Defense. IEEE Trans. Dependable Secur. Comput. 2025, 22, 3824–3838. [Google Scholar] [CrossRef]
- Ahn, H.; Choi, J.; Kim, Y.H. A Mathematical Modeling of Stuxnet-Style Autonomous Vehicle Malware. IEEE Trans. Intell. Transp. Syst. 2023, 24, 673–683. [Google Scholar] [CrossRef]
- Essouifi, M.; Lachgar, A.; Vasudevan, M.; B’ayir, C.; Achahbar, A.; Elkhamkhami, J. Automated Hubs-Patching: Protection Against Malware Spread Through Reduced Scale-Free Networks and External Storage Devices. IEEE Trans. Netw. Sci. Eng. 2024, 11, 4758–4773. [Google Scholar] [CrossRef]
- Liu, G.; Zhang, J.; Zhong, X.; Hu, X.; Liang, Z. Hybrid Optimal Control for Malware Propagation in UAV-WSN System: A Stacking Ensemble Learning Control Algorithm. IEEE Internet Things J. 2024, 11, 36549–36568. [Google Scholar] [CrossRef]
- Peng, B.; Liu, J.; Zeng, J. Dynamic Analysis of Multiplex Networks With Hybrid Maintenance Strategies. IEEE Trans. Inf. Forensics Secur. 2024, 19, 555–570. [Google Scholar] [CrossRef]
- Chen, J.; Sun, S.; Xia, C.; Shi, D.; Chen, G. Modeling and Analyzing Malware Propagation Over Wireless Networks Based on Hypergraphs. IEEE Trans. Netw. Sci. Eng. 2023, 10, 3767–3778. [Google Scholar] [CrossRef]
- Li, H.; Liu, G.; Xiong, L.; Liang, Z.; Zhong, X. Meta-Reinforcement Learning for Controlling Malware Propagation in Internet of Underwater Things. IEEE Trans. Netw. Sci. Eng. 2025; early access. [Google Scholar] [CrossRef]
- Shen, S.; Xie, L.; Zhang, Y.; Wu, G.; Zhang, H.; Yu, S. Joint Differential Game and Double Deep Q-Networks for Suppressing Malware Spread in Industrial Internet of Things. IEEE Trans. Inf. Forensics Secur. 2023, 18, 5302–5315. [Google Scholar] [CrossRef]
- Zheng, Y.; Na, Z.; Ji, W.; Lu, Y. An Adaptive Fuzzy SIR Model for Real-Time Malware Spread Prediction in Industrial Internet of Things Networks. IEEE Internet Things J. 2025, 12, 22875–22888. [Google Scholar] [CrossRef]
- Jafar, M.T.; Yang, L.-X.; Li, G.; Zhu, Q.; Gan, C. Minimizing Malware Propagation in Internet of Things Networks: An Optimal Control Using Feedback Loop Approach. IEEE Trans. Inf. Forensics Secur. 2024, 19, 9682–9697. [Google Scholar] [CrossRef]
- Liu, G.; Tan, Z.; Liang, Z.; Chen, H.; Zhong, X. Fractional Optimal Control for Malware Propagation in Internet of Underwater Things. IEEE Internet Things J. 2024, 11, 11632–11651. [Google Scholar] [CrossRef]
- Heidari, A.; Jabraeil Jamali, M.A. Internet of Things intrusion detection systems: A comprehensive review and future directions. Clust. Comput. 2023, 26, 3753–3780. [Google Scholar] [CrossRef]
- Asadi, M.; Jabraeil Jamali, M.A.; Heidari, A.; Navimipour, N.J. Botnets Unveiled: A Comprehensive Survey on Evolving Threats and Defense Strategies. Trans. Emerg. Telecommun. Technol. 2024, 35, 1–39. [Google Scholar] [CrossRef]
- Ghimire, B.; Rawat, D.B. Recent Advances on Federated Learning for Cybersecurity and Cybersecurity for Federated Learning for Internet of Things. IEEE Internet Things J. 2022, 9, 8229–8249. [Google Scholar] [CrossRef]
- Hua, H.; Wang, Y.; Zhong, H.; Zhang, H.; Fang, Y. A Novel Guided Deep Reinforcement Learning Tracking Control Strategy for Multirotors. IEEE Trans. Autom. Sci. Eng. 2025, 22, 2062–2074. [Google Scholar] [CrossRef]
- Muduli, R.; Jena, D.; Moger, T. Application of Reinforcement Learning-Based Adaptive PID Controller for Automatic Generation Control of Multi-Area Power System. IEEE Trans. Autom. Sci. Eng. 2025, 22, 1057–1068. [Google Scholar] [CrossRef]
- Xu, T.; Pang, Y.; Zhu, Y.; Ji, W.; Jiang, R. Real-Time Driving Style Integration in Deep Reinforcement Learning for Traffic Signal Control. IEEE Trans. Intell. Transp. Syst. 2025, 26, 11879–11892. [Google Scholar] [CrossRef]
- Han, Z.; Chen, P.; Zhou, B.; Yu, G. Hybrid Path Tracking Control for Autonomous Trucks: Integrating Pure Pursuit and Deep Reinforcement Learning With Adaptive Look-Ahead Mechanism. IEEE Trans. Intell. Transp. Syst. 2025, 26, 7098–7112. [Google Scholar] [CrossRef]
- Zhan, D.; Liu, X.; Bai, W.; Li, W.; Guo, S.; Pan, Z. GAME-RL: Generating Adversarial Malware Examples against API Call Based Detection via Reinforcement Learning. IEEE Trans. Dependable Secur. Comput. 2025; early access. [Google Scholar] [CrossRef]
- Feng, C.; Celdrán, A.; Sánchez, P.; Kreischer, J.; Assen, J.; Bovet, G. CyberForce: A Federated Reinforcement Learning Framework for Malware Mitigation. IEEE Trans. Dependable Secur. Comput. 2025, 22, 4398–4411. [Google Scholar] [CrossRef]
- Shen, Y.; Shepherd, C.; Ahmed, C.M.; Yu, S.; Li, T. Comparative DQN-Improved Algorithms for Stochastic Games-Based Automated Edge Intelligence-Enabled IoT Malware Spread-Suppression Strategies. IEEE Internet Things J. 2024, 11, 22550–22561. [Google Scholar] [CrossRef]
- Tannirkulam Chandrasekaran, S.; Kuruvila, A.P.; Basu, K.; Sanyal, A. Real-Time Hardware-Based Malware and Micro-Architectural Attack Detection Utilizing CMOS Reservoir Computing. IEEE Trans. Circuits Syst. II Express Briefs 2022, 69, 349–353. [Google Scholar] [CrossRef]
- Saeednia, N.; Khayatian, A. Reset MPC-Based Control for Consensus of Multiagent Systems. IEEE Trans. Syst. Man Cybern. Syst. 2025, 55, 1611–1619. [Google Scholar] [CrossRef]
- Zuliani, R.; Balta, E.C.; Lygeros, J. BP-MPC: Optimizing the Closed-Loop Performance of MPC using BackPropagation. IEEE Trans. Autom. Control 2025, 70, 5690–5704. [Google Scholar] [CrossRef]
- Tang, H.; Chen, Y. Composite Observer based Resilient MPC for Heterogeneous UAV-UGV Systems Under Hybrid Cyber-Attacks. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 8277–8290. [Google Scholar] [CrossRef]
- Xu, J.-Z.; Liu, Z.-W.; Ge, M.-F.; Wang, Y.-W.; He, D.-X. Self-Triggered MPC for Teleoperation of Networked Mobile Robotic System via High-Order Estimation. IEEE Trans. Autom. Sci. Eng. 2025, 22, 6037–6049. [Google Scholar] [CrossRef]
- Wang, T.; Li, H.; Xia, C.; Zhang, H.; Zhang, P. From the Dialectical Perspective: Modeling and Exploiting of Hybrid Worm Propagation. IEEE Trans. Inf. Forensics Secur. 2023, 18, 1610–1624. [Google Scholar] [CrossRef]
- Abusnaina, A.; Abuhamad, M.; Alasmary, H.; Anwar, A.; Jang, R.; Salem, S. DL-FHMC: Deep Learning-Based Fine-Grained Hierarchical Learning Approach for Robust Malware Classification. IEEE Trans. Dependable Secur. Comput. 2022, 19, 3432–3447. [Google Scholar] [CrossRef]
- Mei, Y.; Han, W.; Li, S.; Lin, K.; Tian, Z.; Li, S. A Novel Network Forensic Framework for Advanced Persistent Threat Attack Attribution Through Deep Learning. IEEE Trans. Intell. Transp. Syst. 2024, 25, 12131–12140. [Google Scholar] [CrossRef]
- Ahmed, I.; Anisetti, M.; Ahmad, A.; Jeon, G. A Multilayer Deep Learning Approach for Malware Classification in 5G-Enabled IIoT. IEEE Trans. Ind. Inform. 2023, 19, 1495–1503. [Google Scholar] [CrossRef]
- Safari, A.; Hassanzadeh Yaghini, H.; Kharrati, H.; Rahimi, A.; Oshnoei, A. Voltage Controller Design for Offshore Wind Turbines: A Machine Learning-Based Fractional-Order Model Predictive Method. Fractal Fract 2024, 8, 463. [Google Scholar] [CrossRef]
- Tian, B.; Jiang, J.; He, Z.; Yuan, X.; Dong, L.; Sun, C. Functionality-Verification Attack Framework Based on Reinforcement Learning Against Static Malware Detectors. IEEE Trans. Inf. Forensics Secur. 2024, 19, 8500–8514. [Google Scholar] [CrossRef]
- Abazari, A.; Soleymani, M.M.; Ghafouri, M.; Jafarigiv, D.; Atallah, R.; Assi, C. Deep Learning Detection and Robust MPC Mitigation for EV-Based Load-Altering Attacks on Wind-Integrated Power Grids. IEEE Trans. Ind. Cyber-Phys. Syst. 2024, 2, 244–263. [Google Scholar] [CrossRef]
- Amare, N.D.; Yang, S.J.; Son, Y.I. An Optimized Position Control via Reinforcement-Learning-Based Hybrid Structure Strategy. Actuators 2025, 14, 199. [Google Scholar] [CrossRef]
- Wang, Y.; Wei, M.; Dai, F.; Zou, D.; Lu, C.; Han, X.; Chen, Y.; Ji, C. Physics-Informed Fractional-Order Recurrent Neural Network for Fast Battery Degradation with Vehicle Charging Snippets. Fractal Fract 2025, 9, 91. [Google Scholar] [CrossRef]
- Jafar, M.; Yang, L.; Li, G. An innovative practical roadmap for optimal control strategies in malware propagation through the integration of RL with MPC. Comput. Secur. 2025, 148, 104186. [Google Scholar] [CrossRef]
- Wu, L.; Braatz, R.D. A Direct Optimization Algorithm for Input-Constrained MPC. IEEE Trans. Autom. Control 2025, 70, 1366–1373. [Google Scholar] [CrossRef]
- Li, D.; Li, Q.; Ye, Y.; Xu, S. A Framework for Enhancing Deep Neural Networks Against Adversarial Malware. IEEE Trans. Netw. Sci. Eng. 2021, 8, 736–750. [Google Scholar] [CrossRef]
Study | Fractional-Order Modeling | Traditional Control Method or DL Control Method | RL Control Method | Utilizing MPC Thinking to Accelerate Policy Network Learning
---|---|---|---|---
Essouifi et al. [9] | × | √ | × | × |
Peng et al. [11] | × | √ | × | × |
Zheng et al. [15] | × | √ | × | × |
Liu et al. [10] | × | × | √ | × |
Abusnaina et al. [34] | × | √ | × | × |
Mei et al. [35] | × | √ | × | × |
Ahmed et al. [36] | × | √ | × | × |
Tian et al. [38] | × | × | √ | × |
Liu et al. [4] | √ | × | √ | × |
Liu et al. [5] | × | × | √ | × |
Shen et al. [14] | × | √ | × | × |
Jafar et al. [16] | × | √ | × | × |
Zhan et al. [25] | × | × | √ | × |
Feng et al. [26] | × | × | √ | × |
Lin et al. (ours) | √ | × | √ | √ |
Category | Parameter | Value
---|---|---
Model Parameters | Initial number of susceptible nodes | 270
 | Initial number of carrier nodes | 0
 | Initial number of infected nodes | 30
 | Initial number of secured nodes | 0
 | Initial number of quarantined nodes | 0
 | Cost weight of susceptible nodes | 300
 | Cost weight of infected nodes | 900
 | Cost weight of quarantined nodes | 300
 | Cost weight of treatment measure | 12
 | Cost weight of quarantine measure | 12
 | Attack severity of the malware | 0.85
 | Malware installation delay | 1
 | Patch installation delay | 0.5
 | Patch failure delay | 2
 | Quarantine failure delay h | 1
 | Failure rate of treatment measures | 0.05
 | The rate of quarantine k | 0.5
 | Successful installation rate of malware | 0.8
 | The communication difference coefficient between ideal and reality | 
 | Transmit power coefficient | 
 | Side length of the target square area a | 500 m
 | The exponent of path loss | 4
 | Intensity of noise | −60 dBm
 | Signal-to-noise threshold | 3 dB
 | The density of total nodes | 0.6
 | Fractional order q | 0.9
Category | Parameter | Description | Value
---|---|---|---
RL Algorithms | | The target entropy | 
 | E | The total number of running rounds | 100
 | T | The total interaction time | 30
 | | The learning rate of the policy network | 
 | | The learning rate of the critic and the target critic network | 
 | | The learning rate of the predictive model | 
 | | The learning rate of entropy | 
 | | The discount factor | 
 | | The value of soft update | 
 | | The loss weight of the predictive model | 
 | | The proportion of the validation set for predictive model training | 
 | | The number of predictive rounds after each truncation | 5
 | | The maximum of logarithmic variance | 
 | | The minimum of logarithmic variance | 
 | | The size of the real replay buffer | 8000
 | | The size of the predictive replay buffer | 2000
 | | The number of samples taken by the agent during each training of the network | 128
 | | The maximum number of iterations for training predictive model | 15
 | | Real data utilization rate of network updates | 
Scenario | Parameter | Optimal Control | DQN | SAC | PPO | MAPPO | MBSAC
---|---|---|---|---|---|---|---
Validation of control effectiveness | Only treatment measure | 8634.91 | – | – | – | – | –
 | Only quarantine measure | 10,347.14 | – | – | – | – | –
 | Hybrid control measure | 7143.22 | – | – | – | – | –
The influence of different fractional orders q on the VCISQ model | | 7143.22 | – | – | – | – | –
 | | 7265.18 | – | – | – | – | –
 | | 7415.57 | – | – | – | – | –
Performance of RL under different hyperparameters | | 7143.22 | 8845.16 | 8547.18 | 7965.44 | 7763.58 | 7352.41
 | | 7143.22 | 8753.24 | 8125.71 | 7847.51 | 7632.24 | 7266.29
 | | 7143.22 | 8949.33 | 8219.52 | 8088.93 | 7607.16 | 7448.52
 | | 7143.22 | 8964.21 | 8487.31 | 8004.55 | 7721.63 | 7511.36
 | | 7143.22 | 8753.24 | 8125.71 | 7847.51 | 7574.18 | 7266.29
 | | 7143.22 | 8865.63 | 8192.64 | 7935.82 | 7513.55 | 7373.21
 | Batch size = 64 | 7143.22 | 8931.36 | 8419.69 | 7948.06 | 7832.29 | 7428.87
 | Batch size = 128 | 7143.22 | 8753.24 | 8125.71 | 7847.51 | 7683.77 | 7266.29
 | Batch size = 256 | 7143.22 | 8799.22 | 8349.90 | 8056.99 | 7686.73 | 7298.74
 | | 7143.22 | 8931.37 | 8257.25 | 8189.18 | 7568.24 | 7374.52
 | | 7143.22 | 8753.24 | 8125.71 | 7847.51 | 7469.56 | 7266.29
 | | 7143.22 | 9023.86 | 8564.18 | 8031.89 | 7711.23 | 7435.96
Investigation of relative optimality under different fractional orders q | | 7143.22 | – | – | – | – | 7266.29
 | | 7143.22 | – | – | – | – | 7285.64
 | | 7143.22 | – | – | – | – | 7377.86
 | | 7265.18 | – | – | – | – | 7431.72
 | | 7265.18 | – | – | – | – | 7338.14
 | | 7265.18 | – | – | – | – | 7536.87
 | | 7415.57 | – | – | – | – | 7623.66
 | | 7415.57 | – | – | – | – | 7682.57
 | | 7415.57 | – | – | – | – | 7538.14
Share and Cite
Lin, H.; Tian, C.; Chen, L.; Liao, D.; Wang, Y.; Hua, Y. Model-Based Reinforcement Learning for Containing Malware Propagation in Wireless Radar Sensor Networks. Actuators 2025, 14, 434. https://doi.org/10.3390/act14090434