Hierarchical Reinforcement Learning–Based Optimal Control for Model-Free Linear Systems
Abstract
1. Introduction
1. By integrating reinforcement learning with the LQR control algorithm, the proposed framework addresses the challenge of controlling systems with unknown parameters, eliminating the need for precise knowledge of the system model (the standard LQR formulation this builds on is recalled in the sketch after this list).
2. Trajectory entropy is used to optimize the controller's performance parameters, reducing reliance on manual tuning and broadening the applicability of the control system.
3. Through a hierarchical reinforcement learning architecture, the control performance weight matrices are adaptively optimized during the control process, enabling the system to adapt to dynamic changes and select improved control trajectories.
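For context, the standard discrete-time LQR formulation that Section 2.1 reviews and that the first contribution builds on can be written as follows. The notation ($A$, $B$, $Q$, $R$, $P$, $K$) is the conventional one and is assumed rather than copied from the paper's own equations, which are not reproduced in this outline.

```latex
% Standard discrete-time LQR (conventional notation, assumed):
% linear dynamics, infinite-horizon quadratic cost, Riccati equation, optimal gain.
\begin{aligned}
  x_{k+1} &= A x_k + B u_k, \\
  J &= \sum_{k=0}^{\infty} \bigl( x_k^{\top} Q x_k + u_k^{\top} R u_k \bigr),
       \qquad Q \succeq 0,\; R \succ 0, \\
  P &= A^{\top} P A - A^{\top} P B \, (R + B^{\top} P B)^{-1} B^{\top} P A + Q, \\
  u_k &= -K x_k, \qquad K = (R + B^{\top} P B)^{-1} B^{\top} P A .
\end{aligned}
```

When $A$ and $B$ are unknown, the Riccati equation above cannot be solved directly; that gap is what the data-driven policy iteration of Section 2.2 addresses, and what the hierarchical layer of Section 3 extends by also adapting the performance weight matrices (interpreted here as $Q$ and $R$) online.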
2. Problem Description
2.1. Traditional LQR Control Algorithm
2.2. Data-Driven Reinforcement Learning for LQR Problems
2.2.1. Policy Evaluation
2.2.2. Policy Update
Algorithm 1: Model-free tracking control
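Algorithm 1 itself is not reproduced in this outline, so the following is only a minimal sketch of the kind of data-driven policy iteration (Q-learning for LQR) suggested by the policy evaluation and policy update steps of Sections 2.2.1 and 2.2.2. It assumes the plant is available only as a black-box `sample_step(x, u)` function and that an initial stabilizing gain `K0` is known; all function and parameter names are illustrative, not taken from the paper.

```python
import numpy as np

def quad_basis(z):
    """Quadratic features such that quad_basis(z) @ theta == z @ H @ z for symmetric H."""
    n = len(z)
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i, n)])

def unvec_symmetric(theta, n):
    """Rebuild the symmetric matrix H from its upper-triangular parameter vector."""
    H = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = theta[k]
            k += 1
    return H

def model_free_lqr(sample_step, n, m, Q, R, K0, iters=15, samples=500, noise=0.1):
    """Policy-iteration Q-learning for LQR driven purely by (x, u, x_next) data."""
    K = K0.copy()
    for _ in range(iters):
        # Policy evaluation: fit Q_K([x; u]) = [x; u]^T H [x; u] by least squares
        # on the Bellman identity Q_K(z_k) - Q_K(z_{k+1}) = stage cost.
        Phi, targets = [], []
        x = np.random.randn(n)
        for t in range(samples):
            if t % 50 == 0 or np.linalg.norm(x) > 1e3:
                x = np.random.randn(n)                  # restart to keep data bounded
            u = -K @ x + noise * np.random.randn(m)     # exploratory control input
            x_next = sample_step(x, u)                  # unknown plant, treated as black box
            cost = x @ Q @ x + u @ R @ u
            z = np.concatenate([x, u])
            z_next = np.concatenate([x_next, -K @ x_next])
            Phi.append(quad_basis(z) - quad_basis(z_next))
            targets.append(cost)
            x = x_next
        theta, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(targets), rcond=None)
        H = unvec_symmetric(theta, n + m)
        # Policy update: argmin_u Q_K(x, u) gives u = -H_uu^{-1} H_ux x.
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K

# Hypothetical usage: a hidden second-order plant the learner never sees directly.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = model_free_lqr(lambda x, u: A @ x + B @ u, n=2, m=1,
                   Q=np.eye(2), R=np.eye(1), K0=np.array([[1.0, 1.0]]))
```

With a known model, the same gain would follow from solving the discrete algebraic Riccati equation, which gives a convenient sanity check for the learned `K`.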
3. Architecture Design of Hierarchical Reinforcement Learning
3.1. Weight Matrix Adaptive Update Strategy
3.2. Hierarchical Reinforcement Learning–Based Controller Framework
Algorithm 2: Hierarchical reinforcement learning-based model-free linear system optimal control (HRL-LQR)
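The HRL-LQR procedure is only named in this outline, so the sketch below shows just the shape of a two-level loop consistent with Section 3: an upper level that adapts the diagonal performance weight matrices Q and R, and a lower level that re-solves the model-free LQR problem for the current weights. The greedy hill-climbing acceptance rule, the `lower_level_lqr` and `evaluate_policy` callables, and all parameter names are illustrative placeholders, not the paper's actual upper-level update.

```python
import numpy as np

def hrl_lqr_sketch(lower_level_lqr, evaluate_policy, n, m, K0,
                   outer_iters=30, sigma=0.3, seed=0):
    """Two-level loop: the upper level perturbs log-diagonal weights of Q and R,
    the lower level returns a feedback gain for those weights, and a candidate
    weighting is kept only if the evaluated closed-loop score improves."""
    rng = np.random.default_rng(seed)
    log_w = np.zeros(n + m)                    # log-diagonals of [Q; R]; start at identity
    best_K, best_score = K0.copy(), -np.inf
    for _ in range(outer_iters):
        trial = log_w + sigma * rng.standard_normal(n + m)   # candidate weights
        Q = np.diag(np.exp(trial[:n]))
        R = np.diag(np.exp(trial[n:]))
        K = lower_level_lqr(Q, R, best_K)      # model-free inner loop for these weights
        score = evaluate_policy(K)             # e.g., negative tracking cost on a task
        if score > best_score:                 # greedy upper-level acceptance
            best_score, best_K, log_w = score, K, trial
    return best_K, np.diag(np.exp(log_w[:n])), np.diag(np.exp(log_w[n:]))
```

A typical invocation would wrap the earlier routine, e.g. `lower_level_lqr = lambda Q, R, K: model_free_lqr(step, n, m, Q, R, K)`. In the paper's framework the upper level is itself a reinforcement learning agent whose objective involves trajectory entropy; the greedy acceptance above is only the simplest stand-in for that outer optimization.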
4. Simulation Results
4.1. Numerical Simulations of Second- and Third-Order Systems
4.2. Comparison with Traditional Intelligent Algorithms
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References












