Trust Region Policy Learning for Adaptive Drug Infusion with Communication Networks in Hypertensive Patients
Abstract
1. Introduction
- (1) A data-driven nonlinear backstepping controller (NBC) is formulated to regulate the average arterial blood pressure (AABP) through vasoactive drug infusion. It aims at more precise and efficient regulation than traditional controllers that rely on static, predetermined settings.
- (2) A model-free deep reinforcement learning (MF-DRL) method is used to tune the NBC parameters. This improves the controller’s performance and permits timely adjustments, so the system can respond adaptively to changing conditions.
- (3) A communication time delay (CTD) is introduced to analyze how delayed signal transfer affects the system’s dynamics. This allows the controller’s ability to handle such delays to be evaluated, which is crucial for real-world applications, and gives insight into the system’s robustness across different latency scenarios.
- (4) The performance of the resulting system is analyzed for a variety of patients, together with its robustness against variations in critical parameters and against hybrid noise disturbances.
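The fixed and random CTD cases named in contribution (3) can be illustrated with a small delay model. The sketch below is only an assumption about how such a delay might be emulated in a discrete-time simulation; the function names, the step-based delay unit, and the sample values are hypothetical and are not taken from the paper.

```python
import random


def fixed_delays(n, steps):
    """Hypothetical fixed CTD: every sample arrives `steps` steps late."""
    return [steps] * n


def random_delays(n, lo, hi, seed=0):
    """Hypothetical random CTD: each sample is delayed by lo..hi steps."""
    rng = random.Random(seed)
    return [rng.randint(lo, hi) for _ in range(n)]


def delayed_signal(signal, delays, initial=0.0):
    """Return y[k] = signal[k - delays[k]], or `initial` if that index is negative."""
    out = []
    for k, d in enumerate(delays):
        idx = k - d
        out.append(signal[idx] if idx >= 0 else initial)
    return out


if __name__ == "__main__":
    measurements = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative values only
    print(delayed_signal(measurements, fixed_delays(5, 2)))
    print(delayed_signal(measurements, random_delays(5, 1, 3)))
```

Passing the controller input through `delayed_signal` before it reaches the plant model is one simple way to compare a delay-free run with fixed-delay and random-delay runs.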
2. Problem Formulation and Methodology
2.1. AABP Dynamic Model
2.2. Mechanism of Nonlinear Backstepping Controller (NBC)
3. The Design of the NBC Based on Trust Region Policy Optimization DRL
3.1. The Fundamental of TRPO Algorithm
3.2. The Functionality of the TRPO Algorithm in Tuning NBC’s Coefficients
4. Simulation Verifications and Statistical Analysis
4.1. Scenario 1 (Performance of the TRPO-Based NBC Under Normal Configuration and with CTD)
4.2. Scenario 2 (Performance of the TRPO-Based NBC When Facing Parameters Changing (Different Patients) and Hybrid Noise)
4.3. Scenario 3 (Quantitative Analysis and Comparative Assessment)
5. Conclusions and Forward-Looking Perspectives
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Varon, J.; Marik, P.E. Perioperative hypertension management. Vasc. Health Risk Manag. 2008, 4, 615–627.
- Haas, C.E.; LeBlanc, J.M. Acute postoperative hypertension: A review of therapeutic options. Am. J. Health-Syst. Pharm. 2004, 61, 1661–1673.
- Aronow, W.S. Management of hypertension in patients undergoing surgery. Ann. Transl. Med. 2017, 5, 227.
- Li, X.; Liang, J.; Hu, J.; Ma, L.; Yang, J.; Zhang, A.; On, B.O.T.C. Screening for primary aldosteronism on and off interfering medications. Endocrine 2024, 83, 178–187.
- Bailey, J.M.; Haddad, W.M. Drug dosing control in clinical pharmacology. IEEE Control Syst. Mag. 2005, 25, 35–51.
- Wang, X.; Chen, X.; Tang, Y.; Wu, J.; Qin, D.; Yu, L.; Wu, A. The Therapeutic Potential of Plant Polysaccharides in Metabolic Diseases. Pharmaceuticals 2022, 15, 1329.
- Miller, C.P.; Cook, A.M.; Pharm, D.; Case, C.D.; Bernard, A.C. As-needed antihypertensive therapy in surgical patients, why and how: Challenging a paradigm. Am. Surg. 2012, 78, 250–253.
- Dodson, G.M.; Bentley, W.E., IV; Awad, A.; Muntazar, M.; Goldberg, M.E. Isolated perioperative hypertension: Clinical implications & contemporary treatment strategies. Curr. Hypertens. Rev. 2014, 10, 31–36.
- Li, H.; Wang, Y.; Fan, R.; Lv, H.; Sun, H.; Xie, H.; Xia, Z. The effects of ferulic acid on the pharmacokinetics of warfarin in rats after biliary drainage. Drug Des. Dev. Ther. 2016, 10, 2173–2180.
- Slate, J.; Sheppard, L.; Rideout, V.; Blackstone, E. A model for design of a blood pressure controller for hypertensive patients. IFAC Proc. Vol. 1979, 12, 867–874.
- Sharma, R.; Deepak, K.; Gaur, P.; Joshi, D. An optimal interval type-2 fuzzy logic control based closed-loop drug administration to regulate the mean arterial blood pressure. Comput. Methods Programs Biomed. 2020, 185, 105167.
- Sharma, R.; Kumar, A. Optimal Interval type-2 fuzzy logic control based closed-loop regulation of mean arterial blood pressure using the controlled drug administration. IEEE Sens. J. 2022, 22, 7195–7207.
- Kumar, A.; Raj, R. Design of a fractional order two layer fuzzy logic controller for drug delivery to regulate blood pressure. Biomed. Signal Process. Control 2022, 78, 104024.
- Ahmadpour, M.R.; Ghadiri, H.; Hajian, S.R. Model predictive control optimisation using the metaheuristic optimisation for blood pressure control. IET Syst. Biol. 2021, 15, 41–52.
- Padmanabhan, R.; Meskin, N.; Haddad, W.M. Closed-loop control of anesthesia and mean arterial pressure using reinforcement learning. Biomed. Signal Process. Control 2015, 22, 54–64.
- Mai, V.; Alattas, K.A.; Bouteraa, Y.; Ghaderpour, E.; Mohammadzadeh, A. Personalized Blood Pressure Control by Machine Learning for Remote Patient Monitoring. IEEE Access 2024, 12, 83994–84004.
- da Silva, S.J.; Scardovelli, T.A.; da Silva Boschi, S.R.M.; Rodrigues, S.C.M.; da Silva, A.P. Simple adaptive PI controller development and evaluation for mean arterial pressure regulation. Res. Biomed. Eng. 2019, 35, 157–165.
- Malagutti, N.; Dehghani, A.; Kennedy, R.A. Robust control design for automatic regulation of blood pressure. IET Control Theory Appl. 2013, 7, 387–396.
- Faraji, B.; Gheisarnejad, M.; Esfahani, Z.; Khooban, M.-H. Smart sensor control for rehabilitation in Parkinson’s patients. IEEE Trans. Emerg. Top. Comput. Intell. 2021, 6, 267–275.
- Berardehi, Z.R.; Zhang, C.; Taheri, M.; Roohi, M.; Khooban, M.H. A Fuzzy Control Strategy to Synchronize Fractional-Order Nonlinear Systems Including Input Saturation. Int. J. Intell. Syst. 2023, 2023, 1550256.
- Faraji, B.; Gheisarnejad, M.; Rouhollahi, K.; Esfahani, Z.; Khooban, M.H. Machine learning approach based on ultra-local model control for treating cancer pain. IEEE Sens. J. 2020, 21, 8245–8252.
- Malik, K.; Tayal, A. Comparison of nature inspired metaheuristic algorithms. Int. J. Electron. Electr. Eng. 2014, 7, 799–802.
- Vaidyanathan, S.; Azar, A.T. An introduction to backstepping control. In Backstepping Control of Nonlinear Dynamical Systems; Elsevier: Amsterdam, The Netherlands, 2021; pp. 1–32.
- Bing, P.; Liu, W.; Zhai, Z.; Li, J.; Guo, Z.; Xiang, Y.; Zhu, L. A novel approach for denoising electrocardiogram signals to detect cardiovascular diseases using an efficient hybrid scheme. Front. Cardiovasc. Med. 2024, 11, 1277123.
- Peng, C.C.; Li, Y.; Chen, C.L. A robust integral type backstepping controller design for control of uncertain nonlinear systems subject to disturbance. Int. J. Innov. Comput. Inf. Control 2011, 7, 2543–2560.
- Bu, X.; Wang, Q.; Hou, Z.; Qian, W. Data driven control for a class of nonlinear systems with output saturation. ISA Trans. 2018, 81, 1–7.
- Shani, L.; Efroni, Y.; Mannor, S. Adaptive trust region policy optimization: Global convergence and faster rates for regularized mdps. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 5668–5675.
- Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 2017, 34, 26–38.
- Touati, A.; Zhang, A.; Pineau, J.; Vincent, P. Stable policy optimization via off-policy divergence regularization. In Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), Virtual, 3–6 August 2020; pp. 1328–1337.
- Faraji, B.; Rouhollahi, K.; Paghaleh, S.M.; Gheisarnejad, M.; Khooban, M.-H. Adaptive multi symptoms control of Parkinson’s disease by deep reinforcement learning. Biomed. Signal Process. Control 2023, 80, 104410.
- Su, Y.; Tian, X.; Gao, R.; Guo, W.; Chen, C.; Chen, C.; Lv, X. Colon cancer diagnosis and staging classification based on machine learning and bioinformatics analysis. Comput. Biol. Med. 2022, 145, 105409.
- Zhang, R.; Li, Z.; Pan, X.; Ma, Z.; Dai, Y.; Mohammadzadeh, A.; Zhang, C. Adaptive Drug Delivery to Control Mean Arterial Blood Pressure by Reinforcement Fuzzy Q-learning. IEEE Sens. J. 2024, 24, 30968–30977.
- Ahmed, S.; Özbay, H. Design of a switched robust control scheme for drug delivery in blood pressure regulation. IFAC-Pap. 2016, 49, 252–257.
- Parvaresh, A.; Abrazeh, S.; Mohseni, S.-R.; Zeitouni, M.J.; Gheisarnejad, M.; Khooban, M.-H. A novel deep learning backstepping controller-based digital twins technology for pitch angle control of variable speed wind turbine. Designs 2020, 4, 15.
- Berardehi, Z.R.; Zhang, C.; Taheri, M.; Roohi, M.; Khooban, M.H. Implementation of TS fuzzy approach for the synchronization and stabilization of non-integer-order complex systems with input saturation at a guaranteed cost. Trans. Inst. Meas. Control 2023, 45, 2536–2553.
- Al Younes, Y.; Drak, A.; Noura, H.; Rabhi, A.; El Hajjaji, A. Nonlinear integral backstepping—Model-free control applied to a quadrotor system. Proc. Int. Conf. Intell. Unmanned Syst. 2014, 10, 1–6.
- Lele, S.; Gangar, K.; Daftary, H.; Dharkar, D. Stock Market Trading Agent Using On-Policy Reinforcement Learning Algorithms. 2020. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3582014 (accessed on 16 December 2024).
- Zhang, Y.; Ross, K.W. On-policy deep reinforcement learning for the average-reward criterion. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 12535–12545.
- Li, S.E. Deep reinforcement learning. In Reinforcement Learning for Sequential Decision and Optimal Control; Springer: Berlin/Heidelberg, Germany, 2023; pp. 365–402.
- Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; Moritz, P. Trust region policy optimization. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1889–1897.
- Roijers, D.M.; Steckelmacher, D.; Nowé, A. Multi-objective reinforcement learning for the expected utility of the return. In Proceedings of the Adaptive and Learning Agents workshop at FAIM, Stockholm, Sweden, 14–15 July 2018; Volume 2018.
- Sun, M.; Ellis, B.; Mahajan, A.; Devlin, S.; Hofmann, K.; Whiteson, S. Trust-region-free policy optimization for stochastic policies. arXiv 2023, arXiv:2302.07985.
- Meng, W.; Zheng, Q.; Shi, Y.; Pan, G. An off-policy trust region policy optimization method with monotonic improvement guarantee for deep reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 2223–2235.
- Frei, C.W.; Derighetti, M.; Morari, M.; Glattfelder, A.H.; Zbinden, A.M. Improving regulation of mean arterial blood pressure during anesthesia through estimates of surgery effects. IEEE Trans. Biomed. Eng. 2000, 47, 1456–1464.
- Pachauri, N.; Ghousiya Begum, K. Automatic drug infusion control based on metaheuristic H2 optimal theory for regulating the mean arterial blood pressure. Asia-Pac. J. Chem. Eng. 2021, 16, e2654.
- Karar, M.E.; Mahmoud, A.S.A. Intelligent Networked Control of Vasoactive Drug Infusion for Patients with Uncertain Sensitivity. Comput. Syst. Sci. Eng. 2023, 47, 721–739.
Variables | Quantities | Variables | Quantities |
---|---|---|---|
Systemic circulation | | Drug fraction recirculated | |
Drug action | | Delay | |
Pulmonary circulation | | Sensitivity | |
TRPO Hyperparameters | | | |
Number of hidden layers | 2 | Activation function | ReLU |
Learning rate | | Discount factor | |
KL divergence constraint | | | |
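For orientation, the discount factor and the KL divergence constraint listed among the TRPO hyperparameters play the roles of $\gamma$ and $\delta$ in the generic TRPO update of Schulman et al. (2015), cited in the References. The block below restates that standard formulation only; the symbols are the conventional ones and are assumed here, and may differ from the notation used in the paper's own Section 3.

```latex
\begin{aligned}
\max_{\theta}\quad
  & \mathbb{E}_{s,a \sim \pi_{\theta_{\mathrm{old}}}}
    \left[
      \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\mathrm{old}}}(a \mid s)}\,
      A^{\pi_{\theta_{\mathrm{old}}}}(s,a)
    \right] \\
\text{s.t.}\quad
  & \mathbb{E}_{s \sim \pi_{\theta_{\mathrm{old}}}}
    \left[
      D_{\mathrm{KL}}\!\left(
        \pi_{\theta_{\mathrm{old}}}(\cdot \mid s)\,\big\|\,\pi_{\theta}(\cdot \mid s)
      \right)
    \right] \le \delta, \\
\text{where}\quad
  & A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s),
  \qquad
  Q^{\pi}(s_t,a_t) = \mathbb{E}_{\pi}\!\left[\sum_{l=0}^{\infty} \gamma^{l}\, r_{t+l}\right].
\end{aligned}
```

A smaller $\delta$ keeps each policy update closer to the previous policy, which is the property that makes a trust-region method attractive when the policy tunes a controller acting on a patient model.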
Methods | Agents | Normal (130) | Normal (150) | Normal (170) | Fixed CTD (130) | Fixed CTD (150) | Fixed CTD (170) | Random CTD (130) | Random CTD (150) | Random CTD (170) |
---|---|---|---|---|---|---|---|---|---|---|
Controller 1 | AABP | 6.16 | 7.41 | 8.01 | 6.34 | 7.58 | 8.31 | 6.78 | 7.78 | 8.2 |
Controller 1 | Drug | 2.02 | 2.16 | 2.18 | 2.05 | 2.12 | 2.2 | 2.09 | 2.17 | 2.22 |
Controller 1 | Error | 2.75 | 3.23 | 4.15 | 3.06 | 3.72 | 4.68 | 3.54 | 3.85 | 4.98 |
Controller 2 | AABP | 6.48 | 7.56 | 8.38 | 6.75 | 7.76 | 8.72 | 6.97 | 7.93 | 8.59 |
Controller 2 | Drug | 2.12 | 2.53 | 2.85 | 2.74 | 2.96 | 3.15 | 3.08 | 3.15 | 3.61 |
Controller 2 | Error | 3.21 | 3.95 | 4.76 | 3.74 | 4.18 | 5.31 | 4.01 | 4.53 | 5.1 |
Controller 3 | AABP | 6.89 | 7.83 | 8.98 | 7.12 | 8.36 | 9.24 | 7.58 | 8.65 | 8.93 |
Controller 3 | Drug | 2.28 | 2.56 | 3.02 | 2.96 | 3.52 | 3.95 | 3.54 | 4.05 | 4.58 |
Controller 3 | Error | 3.72 | 4.41 | 5.19 | 4.23 | 4.73 | 5.85 | 4.65 | 5.27 | 5.81 |
Control Strategy | Settling Time | Overshoot | IAE |
---|---|---|---|
IT2-FLC-PID-based Cuckoo Search algorithm [8] | 350 | 0 | 10,450 |
Interval Type-2 Fuzzy Logic-based Grey Wolf Optimization (GWO) [9] | 277 | 0 | 8327 |
Fractional Order two-layer Fuzzy Logic-based GWO [10] | 274 | 0 | 8331 |
ADRC-based CPAG [13] | 268 | 0 | 8307 |
2 degrees of freedom (DOF)-PI-based dragonfly optimization (DA) [45] | 495 | 0.0951 | 13,124 |
Type-1 Fuzzy (IT2F) Logic and teaching–learning-based optimization (TLBO) algorithm [46] | 319 | 0.63 | 9560 |
Current strategy (Normal) | 262 | 0 | 8135 |
Current strategy (Fixed CTD) | 271 | 0 | 8319 |
Current strategy (Random CTD) | 282 | 0 | 8346 |
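The three indices compared in the table above (settling time, overshoot, and integral of absolute error, IAE) can be computed from a sampled response as sketched below. This is a minimal illustration under stated assumptions: the 2% settling band, the normalization of overshoot by the commanded change, the trapezoidal integration, and the sample trace are all hypothetical choices, not the definitions or data used in the paper.

```python
def iae(times, errors):
    """Integral of |error| over time, approximated with the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        total += 0.5 * (abs(errors[i]) + abs(errors[i - 1])) * dt
    return total


def settling_time(times, response, target, band=0.02):
    """First time after which the response stays within band * |commanded change|.

    Returns None if the response never settles within the recorded window.
    """
    tol = band * abs(target - response[0])
    settled_at = None
    for t, y in zip(times, response):
        if abs(y - target) <= tol:
            if settled_at is None:
                settled_at = t
        else:
            settled_at = None  # left the band, so restart the search
    return settled_at


def overshoot(response, target):
    """Percent excursion past the target, relative to the commanded change."""
    change = target - response[0]
    if change == 0:
        return 0.0
    # For a pressure reduction (change < 0) overshoot means dropping below target.
    excess = (target - min(response)) if change < 0 else (max(response) - target)
    return max(0.0, excess) / abs(change) * 100.0


if __name__ == "__main__":
    t = [0.0, 1.0, 2.0, 3.0, 4.0]            # illustrative time stamps
    y = [150.0, 120.0, 98.0, 100.0, 100.0]   # illustrative pressure trace
    target = 100.0
    e = [yi - target for yi in y]
    print(settling_time(t, y, target), overshoot(y, target), iae(t, e))
```

Because a different band or normalization changes the reported numbers, any comparison across controllers should apply the same definitions to every response.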
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Vu, M.T.; Kim, S.H.; Thanh, H.L.N.N.; Roohi, M.; Nguyen, T.H. Trust Region Policy Learning for Adaptive Drug Infusion with Communication Networks in Hypertensive Patients. Mathematics 2025, 13, 136. https://doi.org/10.3390/math13010136