In this paper, the Q-learning method for the quadratic optimal control problem of discrete-time linear systems is reconsidered. The theoretical results prove that the quadratic optimal controller cannot be solved for directly because the data sets are linearly correlated. The following conclusions are drawn: (1) the linear independence of the data is the key factor in successfully computing quadratic optimal control laws by the Q-learning method; (2) the control laws for linear systems cannot be derived directly by the existing Q-learning method; (3) for nonlinear systems, the data independence assumed by the current method is also open to doubt. It is therefore necessary to examine the validity of the controllers established by the existing Q-learning method. To solve this problem, an improved model-free Q-learning quadratic optimal control method for discrete-time linear systems, based on ridge regression, is proposed in this paper. With this method, the computation can be carried out correctly and an effective controller can be obtained. The simulation results show that the proposed method not only overcomes the problem caused by data correlation but also derives proper control laws for discrete-time linear systems.
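The core difficulty described above, and the ridge-regression remedy, can be illustrated with a minimal numerical sketch. This is a hypothetical example, not the paper's exact algorithm: when the columns of the regression matrix are linearly correlated (as the data sets in the Q-learning recursion can be), the normal-equation matrix is singular and ordinary least squares fails, while adding a ridge term λI makes the system solvable.

```python
import numpy as np

# Hypothetical regression problem with linearly dependent data columns,
# mimicking the correlated data sets that break the least-squares step
# of the existing Q-learning method.
rng = np.random.default_rng(0)
x = rng.standard_normal(50)
Phi = np.column_stack([x, 2.0 * x, np.ones(50)])  # 2nd column = 2 * 1st
y = 3.0 * x + 1.0                                 # y lies in span(Phi)

# Ordinary least squares via the normal equations fails: Phi^T Phi is singular.
G = Phi.T @ Phi
print(np.linalg.matrix_rank(G))  # → 2 (less than 3, so G is not invertible)

# Ridge regression regularizes the normal equations with lam * I,
# so a unique parameter vector can always be computed.
lam = 1e-3
theta = np.linalg.solve(G + lam * np.eye(3), Phi.T @ y)
print(theta)
```

The regularized solution still fits the data (the residual of `Phi @ theta` against `y` is negligible for small `lam`), which is the sense in which ridge regression lets the computation proceed where the unregularized method breaks down.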
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.