The authors would like to note that their paper is an extended version of the IEEE conference paper [1] by the same authors. It also builds on the output reference model tracking problem for unknown systems, which was previously addressed in [2], but within a different, observability-based framework.
The abstract has been rewritten as follows:
Abstract: This work proposes a solution to the output reference model (ORM) tracking control problem based on approximate dynamic programming. General nonlinear systems are included in a control system (CS) and subjected to state feedback. By selecting a linear ORM, indirect feedback linearization of the CS is obtained, which leads to favorable linear behavior of the CS. The Value Iteration (VI) algorithm enables model-free learning of a nonlinear state feedback controller, without relying on the process dynamics. For both linear and nonlinear parameterizations, a reliable approximate VI implementation in continuous state-action spaces depends on several key factors, such as the problem dimension, the exploration of the state-action space, the size of the state-transition dataset, and a suitable selection of the function approximators. Herein, we find that, given a dataset of transition samples and a general linear parameterization of the Q-function, the ORM tracking performance obtained with an approximate VI scheme can reach the performance level of a more general implementation using neural networks (NNs). Although the NN-based implementation takes more time to learn due to its higher complexity (more parameters), it is less sensitive to the exploration settings, the number of transition samples, and the selected hyper-parameters; hence, it is recommended as the de facto practical implementation. The contributions of this work include the following: VI convergence is guaranteed under general function approximators; a case study on a low-order linear system is used to generalize the more complex ORM tracking validation on a real-world nonlinear multivariable aerodynamic process; comparisons are made with an offline deep deterministic policy gradient solution; and implementation details and further discussions of the obtained results are provided.
The extensions in this article with respect to the results in [1] are now detailed at the end of the seventh paragraph of the Introduction section, as follows:
The main updates with respect to our paper [12] include the following: detailed IMF-AVI convergence proofs under general function approximators; a case study on a low-order linear system, used to generalize the more complex ORM tracking validation on the TITOAS process; comparisons with an offline deep deterministic policy gradient solution; and more implementation details and insightful discussions of the obtained results.
Additionally, references [1] and [2] (listed below) are now better acknowledged throughout the revised manuscript, where they appear as references [12] and [46], respectively.
The manuscript will be updated, and the original version will remain available on the article webpage with a reference to this Addendum. The authors apologize for any inconvenience this change may cause. The changes do not affect the scientific results.
References
- Lala, T.; Radac, M.B. Parameterized value iteration for output reference model tracking of a high order nonlinear aerodynamic system. In Proceedings of the 2019 27th Mediterranean Conference on Control and Automation (MED), Akko, Israel, 1–4 July 2019; pp. 43–49.
- Radac, M.-B.; Precup, R.-E.; Hedrea, E.-L.; Mituletu, I.-C. Data-driven model-free model-reference nonlinear virtual state feedback control from input-output data. In Proceedings of the 2018 26th Mediterranean Conference on Control and Automation (MED), Zadar, Croatia, 19–22 June 2018; pp. 332–338.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).