Exploration with Multiple Random ε-Buffers in Off-Policy Deep Reinforcement Learning

by Chayoung Kim 1 and JiSu Park 2,*
1 Division of General Studies, Kyonggi University, Suwon, Gyeonggi-do 16227, Korea
2 Convergence Institute, Dongguk University, Seoul 04620, Korea
* Author to whom correspondence should be addressed.
Symmetry 2019, 11(11), 1352; https://doi.org/10.3390/sym11111352
Received: 26 September 2019 / Revised: 20 October 2019 / Accepted: 30 October 2019 / Published: 1 November 2019
(This article belongs to the Special Issue Symmetry-Adapted Machine Learning for Information Security)
In deep reinforcement learning (RL), exploration is highly significant for achieving better generalization. In benchmark studies, ε-greedy random actions have been used to encourage exploration and prevent over-fitting, thereby improving generalization. Deep RL with random ε-greedy policies, such as deep Q-networks (DQNs), can demonstrate efficient exploration behavior. A random ε-greedy policy exploits additional replay buffers in environments with sparse and binary rewards, such as the real-time online detection of network security, where the task is to verify whether the network is “normal or anomalous.” Prior studies have shown that a prioritized replay memory, driven by a complex temporal-difference error, yields superior theoretical results. However, other implementations have shown that, in certain environments, prioritized replay memory is not superior to the randomly selected buffers of a random ε-greedy policy. Moreover, a key challenge of hindsight experience replay, which uses additional buffers corresponding to each different goal, inspires our objective. Therefore, we exploit multiple random ε-greedy buffers to enhance exploration toward near-perfect generalization with one original goal in off-policy RL. We demonstrate the benefit of off-policy learning with our method through an experimental comparison of a DQN and a deep deterministic policy gradient for discrete actions and continuous control, respectively, in completely symmetric environments.
Keywords: deep Q-network (DQN); reinforcement learning (RL); explorations; deep deterministic policy gradient (DDPG); random ε-greedy buffers
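A minimal sketch of the idea summarized in the abstract, assuming a simple DQN-style setup: ε-greedy actions generate transitions that are routed into several independent, uniformly sampled replay buffers rather than one prioritized memory. The buffer count, class and function names, and hyperparameters below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): epsilon-greedy exploration with
# multiple random replay buffers; sampling is uniform, with no TD-error priority.
import random
from collections import deque

NUM_BUFFERS = 4       # assumed number of random epsilon-buffers
BUFFER_SIZE = 10_000  # assumed per-buffer capacity
BATCH_SIZE = 32

class MultiRandomBuffers:
    """Maintain several independent FIFO replay buffers."""
    def __init__(self, num_buffers=NUM_BUFFERS, capacity=BUFFER_SIZE):
        self.buffers = [deque(maxlen=capacity) for _ in range(num_buffers)]

    def store(self, transition):
        # Route each transition to a randomly chosen buffer so the buffers
        # hold decorrelated subsets of experience.
        random.choice(self.buffers).append(transition)

    def sample(self, batch_size=BATCH_SIZE):
        # At update time, pick one sufficiently filled buffer at random and
        # draw a uniform minibatch from it.
        candidates = [b for b in self.buffers if len(b) >= batch_size]
        if not candidates:
            return None
        return random.sample(list(random.choice(candidates)), batch_size)

def epsilon_greedy(q_values, epsilon):
    """Take a random action with probability epsilon, else the greedy action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In an off-policy training loop (DQN or DDPG), `store` would be called after every environment step and `sample` before each gradient update; the update rule itself is left unchanged.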
MDPI and ACS Style

Kim, C.; Park, J. Exploration with Multiple Random ε-Buffers in Off-Policy Deep Reinforcement Learning. Symmetry 2019, 11, 1352.
