Reinforcement Learning-Based Tracking Control under Stochastic Noise and Unmeasurable State for Tip–Tilt Mirror Systems
Abstract
1. Introduction
- An RL-based noisy-OPFB Q-learning scheme is proposed that rapidly and effectively solves for a near-optimal tracking controller by combining state estimation theory, transversal prediction, and experience replay techniques (an illustrative sketch of the model-free idea follows this list).
- In contrast to the works in [25,27,28], additive noise that is independent of the state and inputs is taken into account, and the assumption that the full state or the noise information is measurable is removed; only measured input–output data are used throughout the learning process. Moreover, no additional state-observation step is required, making the scheme more convenient to implement than the method in [29].
- The effectiveness of the algorithm is demonstrated through an application example on a TTM tracking control system. In addition, a comparison with a traditional integral controller illustrates the method's application prospects for the control of intelligent optical systems.
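To make the model-free idea in these contributions concrete, the sketch below shows one least-squares Q-learning step driven purely by measured data, which is the general pattern that output-feedback Q-learning schemes of this kind follow. It is an illustrative sketch only, not a reproduction of the paper's algorithms: the quadratic-basis parameterization, the assumption that the information vector `z` stacks recent inputs, outputs, and the reference, and all function names are choices made for this example.

```python
import numpy as np

def quad_basis(v):
    """Quadratic basis so that v.T @ H @ v = quad_basis(v) @ vech(H) for symmetric H."""
    n = len(v)
    feats = []
    for i in range(n):
        for j in range(i, n):
            feats.append((1.0 if i == j else 2.0) * v[i] * v[j])
    return np.array(feats)

def vech_to_mat(theta, n):
    """Unpack the upper-triangular parameter vector back into a symmetric matrix."""
    H = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = theta[k]
            k += 1
    return H

def q_learning_step(data, n_z, n_u, gamma=1.0):
    """One least-squares Q-learning update.
    data: tuples (z, u, cost, z_next, u_next) built only from measured
    inputs/outputs (z is the information vector; no state measurement needed)."""
    rows, targets = [], []
    for z, u, c, z1, u1 in data:
        phi  = quad_basis(np.concatenate([z,  u ]))
        phi1 = quad_basis(np.concatenate([z1, u1]))
        rows.append(phi - gamma * phi1)      # temporal-difference regressor of the Bellman equation
        targets.append(c)
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    H = vech_to_mat(theta, n_z + n_u)
    Huu, Huz = H[n_z:, n_z:], H[n_z:, :n_z]
    K = -np.linalg.solve(Huu, Huz)           # greedy policy improvement: u = K @ z
    return H, K
```

The kernel matrix `H` plays the role of the Q-function; repeating this step with fresh (or replayed) data and the improved policy yields successively better feedback gains.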
2. Problem Formulation
Algorithm 1: VI algorithm
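Algorithm 1 is a value-iteration (VI) scheme. As a point of reference, the sketch below shows what VI computes in the discrete-time linear-quadratic case when the model is known: the Riccati recursion whose fixed point the subsequent model-free Q-learning scheme approximates from data alone. The matrices `A, B, Q, R` are generic placeholders here, not the paper's TTM model.

```python
import numpy as np

def lqr_value_iteration(A, B, Q, R, n_iter=1000, tol=1e-10):
    """Model-based value iteration for the discrete-time LQR problem:
    P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA, starting from P = 0."""
    P = np.zeros_like(Q, dtype=float)
    for _ in range(n_iter):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # current feedback gain
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ G
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)        # optimal gain, u = -K x
    return P, K
```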
3. Principle of Noisy-OPFB Q-Learning Scheme
3.1. State Reconstruction Using Data History
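No derivation is reproduced here, but the reconstruction this subsection relies on is a standard result in output-feedback ADP/Q-learning: for an observable linear system, the deterministic part of the state is a fixed linear function of the last N measured inputs and outputs. A hedged statement of this relation (the paper's exact notation and its handling of the additive noise term may differ) is:

```latex
% N is at least the observability index; M_u and M_y are constant matrices
% determined by (A, B, C). In the stochastic setting an additional
% noise-driven term appears on the right-hand side.
x_k = M_u\,\bar{u}_{k-1,k-N} + M_y\,\bar{y}_{k-1,k-N},
\qquad
\bar{u}_{k-1,k-N} = \begin{bmatrix} u_{k-1}^{\top} & \cdots & u_{k-N}^{\top} \end{bmatrix}^{\top},
\quad
\bar{y}_{k-1,k-N} = \begin{bmatrix} y_{k-1}^{\top} & \cdots & y_{k-N}^{\top} \end{bmatrix}^{\top}.
```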
3.2. Online Output Feedback Q-Learning Scheme
Algorithm 2: Improved noisy-OPFB Q-learning algorithm
3.3. Algorithm Implementation
Algorithm 3: Implementation of the noisy-OPFB Q-learning algorithm
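As a companion to the least-squares step sketched after the Introduction contributions, an online implementation loop with exploration noise and an experience-replay buffer might be organized as follows. The callbacks `plant_step` and `q_update`, the placeholder quadratic stage cost, and every tuning value are assumptions for illustration, not the paper's Algorithm 3.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(1)

def run_online_learning(plant_step, q_update, z0, n_u,
                        n_steps=2000, buffer_size=500, update_every=100, explore_std=0.05):
    """Illustrative online loop: act with the current policy plus exploration noise,
    store measured transitions in a replay buffer, and periodically refresh the
    Q-function/policy from the buffered data (experience replay).
    plant_step(u) -> next information vector z (built from measured I/O and the reference).
    q_update(data, n_z, n_u) -> (H, K), e.g., the q_learning_step sketched earlier."""
    z = np.asarray(z0, dtype=float)
    n_z = len(z)
    K = np.zeros((n_u, n_z))                       # initial (zero) policy
    buffer = deque(maxlen=buffer_size)
    for k in range(n_steps):
        u = K @ z + explore_std * rng.standard_normal(n_u)   # persistent excitation
        z_next = np.asarray(plant_step(u), dtype=float)
        cost = float(z @ z + u @ u)                # placeholder stage cost (weights omitted)
        buffer.append((z, u, cost, z_next, K @ z_next))
        if (k + 1) % update_every == 0 and len(buffer) >= buffer_size // 2:
            _, K = q_update(list(buffer), n_z, n_u)
        z = z_next
    return K
```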
4. Results and Discussion
4.1. Application Example of Tracking Control for a Tip–Tilt Mirror System
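For readers who want to set up a comparable simulation, tip–tilt and fast-steering mirrors are commonly modeled as lightly damped second-order plants (see, e.g., [39]). The sketch below builds such an illustrative discrete-time plant with additive process noise; the gain, damping ratio, natural frequency, and sampling rate are placeholders, not the identified parameters of the mirror used in this work.

```python
import numpy as np
from scipy.signal import cont2discrete

# Illustrative single-axis TTM model: G(s) = K * wn^2 / (s^2 + 2*zeta*wn*s + wn^2).
# All parameter values are placeholders chosen for demonstration only.
K_gain, zeta, fn, fs = 1.0, 0.6, 300.0, 5000.0   # gain, damping, natural freq [Hz], sample rate [Hz]
wn = 2.0 * np.pi * fn

A_c = np.array([[0.0, 1.0], [-wn**2, -2.0 * zeta * wn]])
B_c = np.array([[0.0], [K_gain * wn**2]])
C_c = np.array([[1.0, 0.0]])
D_c = np.array([[0.0]])

# Zero-order-hold discretization at the controller sample rate
A, B, C, D, _ = cont2discrete((A_c, B_c, C_c, D_c), dt=1.0 / fs, method='zoh')

rng = np.random.default_rng(0)

def ttm_step(x, u, noise_std=0.0):
    """One simulation step with optional additive process noise, mimicking the
    stochastic-disturbance setting considered in the paper."""
    w = noise_std * rng.standard_normal((2, 1))
    x_next = A @ x + B * float(u) + w
    y = (C @ x).item()                 # measured mirror deflection (output feedback only)
    return x_next, y
```

A reference jitter signal (for example, a sinusoid at one of the frequencies listed in the table of Section 4.3) can then be tracked by both an integral controller and the learned controller for comparison.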
4.2. Convergence Analysis
4.3. Tracking Control Performance Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Arikawa, M.; Ito, T. Performance of Mode Diversity Reception of a Polarization-Division-Multiplexed Signal for Free-Space Optical Communication under Atmospheric Turbulence. Opt. Express 2018, 26, 28263.
2. Clénet, Y.; Kasper, M.; Ageorges, N.; Lidman, C.; Fusco, T.; Marco, O.P.; Hartung, M.; Mouillet, D.; Koehler, B.; Rousset, G. NAOS Performances: Impact of the Telescope Vibrations and Possible Origins. In Proceedings of the SF2A-2004: Semaine de l’Astrophysique Francaise, Paris, France, 14–18 June 2004; p. 179.
3. Maly, J.R.; Erickson, D.; Pargett, T.J. Vibration Suppression for the Gemini Planet Imager. In Proceedings of the Ground-Based and Airborne Telescopes III, San Diego, CA, USA, 27 June–2 July 2010; SPIE: Bellingham, WA, USA, 2010; Volume 7733, pp. 506–514.
4. Nousiainen, J.; Rajani, C.; Kasper, M.; Helin, T.; Haffert, S.Y.; Vérinaud, C.; Males, J.R.; Van Gorkom, K.; Close, L.M.; Long, J.D.; et al. Towards On-Sky Adaptive Optics Control Using Reinforcement Learning. A&A 2022, 664, A71.
5. Pou, B.; Ferreira, F.; Quinones, E.; Gratadour, D.; Martin, M. Adaptive Optics Control with Multi-Agent Model-Free Reinforcement Learning. Opt. Express 2022, 30, 2991.
6. Ke, H.; Xu, B.; Xu, Z.; Wen, L.; Yang, P.; Wang, S.; Dong, L. Self-Learning Control for Wavefront Sensorless Adaptive Optics System through Deep Reinforcement Learning. Optik 2019, 178, 785–793.
7. Landman, R.; Haffert, S.Y.; Radhakrishnan, V.M.; Keller, C.U. Self-Optimizing Adaptive Optics Control with Reinforcement Learning for High-Contrast Imaging. J. Astron. Telesc. Instrum. Syst. 2021, 7, 039002.
8. Werbos, P. Approximate Dynamic Programming for Real-Time Control and Neural Modeling. In Handbook of Intelligent Control; Van Nostrand Reinhold: New York, NY, USA, 1992.
9. Qasem, O.; Gao, W. Robust Policy Iteration of Uncertain Interconnected Systems with Imperfect Data. IEEE Trans. Autom. Sci. Eng. 2023, 21, 1214–1222.
10. Song, R.; Lewis, F.L. Robust Optimal Control for a Class of Nonlinear Systems with Unknown Disturbances Based on Disturbance Observer and Policy Iteration. Neurocomputing 2020, 390, 185–195.
11. Al-Tamimi, A.; Lewis, F.L.; Abu-Khalaf, M. Model-Free Q-Learning Designs for Linear Discrete-Time Zero-Sum Games with Application to H-Infinity Control. Automatica 2007, 43, 473–481.
12. Li, Z.; Wang, M.; Ma, G. Adaptive Optimal Trajectory Tracking Control of AUVs Based on Reinforcement Learning. ISA Trans. 2023, 137, 122–132.
13. Wang, N.; Gao, Y.; Zhang, X. Data-Driven Performance-Prescribed Reinforcement Learning Control of an Unmanned Surface Vehicle. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 5456–5467.
14. Rizvi, S.A.A.; Pertzborn, A.J.; Lin, Z. Reinforcement Learning Based Optimal Tracking Control Under Unmeasurable Disturbances With Application to HVAC Systems. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 7523–7533.
15. Wang, R.; Ma, D.; Li, M.-J.; Sun, Q.; Zhang, H.; Wang, P. Accurate Current Sharing and Voltage Regulation in Hybrid Wind/Solar Systems: An Adaptive Dynamic Programming Approach. IEEE Trans. Consum. Electron. 2022, 68, 261–272.
16. Yang, J.; Wang, Y.; Wang, T.; Yu, X. Optimal Tracking Control for a Two-Link Robotic Manipulator via Adaptive Dynamic Programming. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6 November 2020; pp. 2687–2692.
17. Fang, H.; Zhu, Y.; Dian, S.; Xiang, G.; Guo, R.; Li, S. Robust Tracking Control for Magnetic Wheeled Mobile Robots Using Adaptive Dynamic Programming. ISA Trans. 2022, 128, 123–132.
18. Littman, M.L. A Tutorial on Partially Observable Markov Decision Processes. J. Math. Psychol. 2009, 53, 119–125.
19. Liu, D.; Huang, Y.; Wang, D.; Wei, Q. Neural-Network-Observer-Based Optimal Control for Unknown Nonlinear Systems Using Adaptive Dynamic Programming. Int. J. Control 2013, 86, 1554–1566.
20. Mu, C.; Wang, D.; He, H. Novel Iterative Neural Dynamic Programming for Data-Based Approximate Optimal Control Design. Automatica 2017, 81, 240–252.
21. Zhang, H.; Cui, L.; Zhang, X.; Luo, Y. Data-Driven Robust Approximate Optimal Tracking Control for Unknown General Nonlinear Systems Using Adaptive Dynamic Programming Method. IEEE Trans. Neural Netw. 2011, 22, 2226–2236.
22. Rizvi, S.A.A.; Lin, Z. Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 1523–1536.
23. Liu, Y.; Zhang, H.; Yu, R.; Qu, Q. Data-Driven Optimal Tracking Control for Discrete-Time Systems with Delays Using Adaptive Dynamic Programming. J. Frankl. Inst. 2018, 355, 5649–5666.
24. Kiumarsi, B.; Lewis, F.L.; Naghibi-Sistani, M.-B.; Karimpour, A. Optimal Tracking Control of Unknown Discrete-Time Linear Systems Using Input–Output Measured Data. IEEE Trans. Cybern. 2015, 45, 2770–2779.
25. Wang, T.; Zhang, H.; Luo, Y. Stochastic Linear Quadratic Optimal Control for Model-Free Discrete-Time Systems Based on Q-Learning Algorithm. Neurocomputing 2018, 312, 1–8.
26. Pang, B.; Jiang, Z.-P. Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation. AAAI 2021, 35, 9303–9311.
27. Bian, T.; Jiang, Z.-P. Adaptive Optimal Control for Linear Stochastic Systems with Additive Noise. In Proceedings of the 2015 34th Chinese Control Conference (CCC), Hangzhou, China, 28–30 July 2015; pp. 3011–3016.
28. Lai, J.; Xiong, J.; Shu, Z. Model-Free Optimal Control of Discrete-Time Systems with Additive and Multiplicative Noises. Automatica 2023, 147, 110685.
29. Zhang, M.; Gan, M.-G.; Chen, J. Data-Driven Adaptive Optimal Control for Stochastic Systems with Unmeasurable State. Neurocomputing 2020, 397, 1–10.
30. Kiumarsi, B.; Lewis, F.L.; Modares, H.; Karimpour, A.; Naghibi-Sistani, M.-B. Reinforcement Q-Learning for Optimal Tracking Control of Linear Discrete-Time Systems with Unknown Dynamics. Automatica 2014, 50, 1167–1175.
31. Lancaster, P.; Rodman, L. Algebraic Riccati Equations; Clarendon Press: Oxford, UK, 1995; ISBN 0-19-159125-4.
32. Speyer, J.L.; Chung, W.H. Stochastic Processes, Estimation, and Control, 1st ed.; Advances in Design and Control; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2008; ISBN 978-0-89871-655-9.
33. Chen, C.-W.; Huang, J.-K. Adaptive State Estimation for Control of Flexible Structures. In Proceedings of the Advances in Optical Structure Systems, Orlando, FL, USA, 16–20 April 1990; Breakwell, J.A., Genberg, V.L., Krumweide, G.C., Eds.; SPIE: Bellingham, WA, USA, 1990; p. 252.
34. Lewis, F.L.; Vrabie, D.; Vamvoudakis, K.G. Reinforcement Learning and Feedback Control: Using Natural Decision Methods to Design Optimal Adaptive Controllers. IEEE Control Syst. 2012, 32, 76–105.
35. Malla, N.; Ni, Z. A New History Experience Replay Design for Model-Free Adaptive Dynamic Programming. Neurocomputing 2017, 266, 141–149.
36. Song, R.; Lewis, F.L.; Wei, Q.; Zhang, H. Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems with Disturbances. IEEE Trans. Cybern. 2016, 46, 1041–1050.
37. Bian, T.; Jiang, Y.; Jiang, Z.-P. Adaptive Dynamic Programming for Stochastic Systems with State and Control Dependent Noise. IEEE Trans. Autom. Control 2016, 61, 4170–4175.
38. Li, M.; Qin, J.; Zheng, W.X.; Wang, Y.; Kang, Y. Model-Free Design of Stochastic LQR Controller from Reinforcement Learning and Primal-Dual Optimization Perspective. arXiv 2021, arXiv:2103.09407.
39. Lu, Y.; Fan, D.; Zhang, Z. Theoretical and Experimental Determination of Bandwidth for a Two-Axis Fast Steering Mirror. Optik 2013, 124, 2443–2449.
| Controller | 5 Hz | 15 Hz | 30 Hz | 100 Hz | 150 Hz |
|---|---|---|---|---|---|
| Reference jitter | 3.537 | 3.537 | 3.537 | 3.537 | 3.537 |
| Integrator | 0.747 | 2.344 | 5.567 | 2.429 | 3.798 |
| Noisy-OPFB Q-learning | 0.058 | 0.069 | 0.067 | 0.072 | 0.069 |