Supervised Reinforcement Learning via Value Function
Abstract
Using expert samples to improve the performance of reinforcement learning (RL) algorithms has become a major research focus. In practical application scenarios, however, it is hard to guarantee both the quantity and quality of expert samples, which limits the applicability and performance of such algorithms. In this paper, we propose a novel RL decision-optimization method that reduces the dependence on expert samples by incorporating a decision-making evaluation mechanism. By introducing supervised learning (SL), our method optimizes the decision making of the RL algorithm using demonstrations or expert samples. Experiments in the Pendulum and PuckWorld scenarios, with deep Q-network (DQN) and double DQN (DDQN) as benchmarks, demonstrate that the proposed method effectively improves the decision-making performance of agents even when expert samples are unavailable.
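The general idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact algorithm (the abstract does not specify it); it is an illustrative tabular Q-learning update in which a supervised term pulls the value of an expert-suggested action upward, gated by the agent's own value estimate as a stand-in for the "decision-making evaluation mechanism". The function name, margin, and weighting are assumptions for illustration only.

```python
import numpy as np

def supervised_q_update(Q, s, a, r, s2, expert_a,
                        alpha=0.1, gamma=0.9, sl_weight=0.5, margin=0.1):
    """One combined update: standard Q-learning plus a gated supervised term.

    Hypothetical sketch, not the authors' exact method: the expert action
    is only imitated when the agent's current value estimate does not rate
    it clearly worse than the greedy action (the evaluation mechanism).
    """
    # Standard Q-learning TD update toward r + gamma * max_a' Q(s', a').
    td_target = r + gamma * Q[s2].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

    # Supervised term: nudge the expert action's value toward the top,
    # but only if the value function deems it competitive.
    if Q[s, expert_a] >= Q[s].max() - margin:
        Q[s, expert_a] += alpha * sl_weight * (Q[s].max() + margin - Q[s, expert_a])
    return Q

# Toy 2-state, 2-action example: taking action 1 in state 0 yields reward 1,
# and the expert also recommends action 1 in state 0.
Q = np.zeros((2, 2))
for _ in range(50):
    Q = supervised_q_update(Q, s=0, a=1, r=1.0, s2=1, expert_a=1)
print(int(Q[0].argmax()))  # greedy action in state 0
```

In a deep-RL setting such as the paper's DQN/DDQN benchmarks, the same idea would appear as an extra supervised loss term added to the TD loss, applied only to transitions where the evaluation mechanism accepts the expert's decision.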
Share & Cite This Article
MDPI and ACS Style: Pan, Y.; Zhang, J.; Yuan, C.; Yang, H. Supervised Reinforcement Learning via Value Function. Symmetry 2019, 11, 590.
AMA Style: Pan Y, Zhang J, Yuan C, Yang H. Supervised Reinforcement Learning via Value Function. Symmetry. 2019; 11(4):590.
Chicago/Turabian Style: Pan, Yaozong; Zhang, Jian; Yuan, Chunhui; Yang, Haitao. 2019. "Supervised Reinforcement Learning via Value Function." Symmetry 11, no. 4: 590.