Open Access Article

Supervised Reinforcement Learning via Value Function

Pan, Y.; Zhang, J.; Yuan, C.; Yang, H.
Space Engineering University, 81 Road, Huairou District, Beijing 101400, China
Author to whom correspondence should be addressed.
Symmetry 2019, 11(4), 590;
Received: 21 March 2019 / Revised: 15 April 2019 / Accepted: 22 April 2019 / Published: 24 April 2019


Using expert samples to improve the performance of reinforcement learning (RL) algorithms has become a major focus of current research. In many application scenarios, however, it is difficult to guarantee both the quantity and the quality of expert samples, which limits the practical applicability and performance of such algorithms. In this paper, a novel RL decision optimization method is proposed. The method reduces the dependence on expert samples by incorporating a decision-making evaluation mechanism. By introducing supervised learning (SL), our method optimizes the decision making of the RL algorithm using demonstrations or expert samples. Experiments are conducted in the Pendulum and Puckworld scenarios, with representative algorithms such as the deep Q-network (DQN) and Double DQN (DDQN) as benchmarks. The results demonstrate that the proposed method can effectively improve the decision-making performance of agents even when expert samples are unavailable.
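One common way to combine a supervised signal from expert demonstrations with value-based RL, in the spirit of the approach the abstract describes, is to add a large-margin supervised loss on expert actions to the standard temporal-difference loss, applying the supervised term only when a demonstration is available. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's exact formulation; the tabular Q representation and the `margin` and `lam` parameters are assumptions made for the example.

```python
def td_loss(q, q_target, s, a, r, s_next, gamma=0.99):
    # Standard DQN temporal-difference error for one transition;
    # q and q_target map a state index to a list of action values.
    target = r + gamma * max(q_target[s_next])
    return (q[s][a] - target) ** 2

def supervised_margin_loss(q, s, expert_a, margin=0.8):
    # Large-margin loss: the expert action's Q-value should exceed
    # every other action's Q-value by at least `margin`. Zero when
    # the expert action already dominates by that margin.
    best = max(qv + (0.0 if a == expert_a else margin)
               for a, qv in enumerate(q[s]))
    return best - q[s][expert_a]

def combined_loss(q, q_target, transition, expert_a=None, lam=1.0):
    # Total loss = TD loss, plus a weighted supervised term when an
    # expert action is known for this state (it may be unavailable).
    s, a, r, s_next = transition
    loss = td_loss(q, q_target, s, a, r, s_next)
    if expert_a is not None:
        loss += lam * supervised_margin_loss(q, s, expert_a)
    return loss
```

Because the supervised term is simply skipped when `expert_a` is `None`, the agent falls back to plain TD learning when no demonstrations exist, which matches the abstract's claim that performance improves even without expert samples.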
Keywords: artificial intelligence; reinforcement learning; supervised learning; DQN; DDQN; expert samples; demonstration


This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


MDPI and ACS Style

Pan, Y.; Zhang, J.; Yuan, C.; Yang, H. Supervised Reinforcement Learning via Value Function. Symmetry 2019, 11, 590.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.




Symmetry EISSN 2073-8994, published by MDPI AG, Basel, Switzerland.