Open Access Article

Exploration Using Without-Replacement Sampling of Actions Is Sometimes Inferior

¹ Department of Mathematical Sciences, Georgia Southern University, Statesboro, GA 30460, USA
² Air Force Material Command, Robins Air Force Base, GA 31098, USA
* Author to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2019, 1(2), 698-714; https://doi.org/10.3390/make1020041
Received: 4 April 2019 / Revised: 13 May 2019 / Accepted: 13 May 2019 / Published: 24 May 2019
(This article belongs to the Section Learning)
In many statistical and machine learning applications, without-replacement sampling is considered superior to with-replacement sampling. In some cases this has been proven; in others the heuristic is so intuitively attractive that it is taken for granted. In reinforcement learning, many count-based exploration strategies are justified by appeal to this heuristic. This paper details the counterintuitive finding that, when the quality of an exploration strategy is measured by the stochastic shortest path to a goal state, there is a class of processes for which an action selection strategy based on without-replacement sampling of actions can be worse than one based on with-replacement sampling. Specifically, the expected time until a specified goal state is first reached can be provably larger under without-replacement sampling. Numerical experiments describe the frequency and severity of this inferiority.
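The comparison in the abstract can be made concrete with a Monte Carlo estimate of the expected first-passage time to a goal state under each action selection rule. The sketch below is not the authors' code or their class of processes. It assumes one natural reading of without-replacement sampling: each state keeps a pool of actions not yet tried in the current episode, draws uniformly from that pool, and refills the pool once it is empty. The transition encoding, the helper names (`sample_next`, `with_replacement`, `without_replacement`, `hitting_time`, `mean_hitting_time`) and the test process `detour_chain` are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical encoding: state -> action name -> list of (probability, next_state).
MDP = Dict[int, Dict[str, List[Tuple[float, int]]]]


def sample_next(mdp: MDP, state: int, action: str, rng: random.Random) -> int:
    """Draw the next state from the transition distribution of (state, action)."""
    outcomes = mdp[state][action]
    r, cumulative = rng.random(), 0.0
    for prob, nxt in outcomes:
        cumulative += prob
        if r < cumulative:
            return nxt
    return outcomes[-1][1]  # guard against floating-point round-off


def with_replacement(actions, state, pools, rng):
    """Uniform choice at every visit; past choices are ignored."""
    return rng.choice(actions)


def without_replacement(actions, state, pools, rng):
    """Assumed rule: uniform choice among actions not yet tried at this state,
    refilling the state's pool once every action has been tried."""
    pool = pools.get(state)
    if not pool:
        pool = list(actions)
        pools[state] = pool
    action = rng.choice(pool)
    pool.remove(action)
    return action


def hitting_time(mdp, start, goal, policy, rng, max_steps=1_000_000):
    """Number of steps until `goal` is first reached from `start` (one episode)."""
    state, steps, pools = start, 0, {}  # pools are reset for every episode
    while state != goal:
        action = policy(list(mdp[state]), state, pools, rng)
        state = sample_next(mdp, state, action, rng)
        steps += 1
        if steps >= max_steps:
            raise RuntimeError("goal not reached within max_steps")
    return steps


def mean_hitting_time(mdp, start, goal, policy, n_runs=20_000, seed=0):
    """Monte Carlo estimate of the expected first-passage time to `goal`."""
    rng = random.Random(seed)
    return sum(hitting_time(mdp, start, goal, policy, rng) for _ in range(n_runs)) / n_runs


def detour_chain(p: float, detour_len: int) -> MDP:
    """Hypothetical test process: at state 0, action 'A' reaches the goal (-1) with
    probability p and otherwise stays at 0; action 'B' enters a deterministic loop
    through states 1..detour_len-1 that returns to 0 after detour_len steps."""
    mdp: MDP = {0: {"A": [(p, -1), (1.0 - p, 0)],
                    "B": [(1.0, 1 if detour_len > 1 else 0)]}}
    for s in range(1, detour_len):
        mdp[s] = {"go": [(1.0, s + 1 if s + 1 < detour_len else 0)]}
    return mdp


if __name__ == "__main__":
    chain = detour_chain(p=0.5, detour_len=10)
    for name, policy in [("with replacement", with_replacement),
                         ("without replacement", without_replacement)]:
        print(f"{name}: {mean_hitting_time(chain, 0, -1, policy):.2f}")
```

For this toy chain with detour length L, the exact expected hitting times are (1 + L)/p under with-replacement sampling and (2 + L(2 − p))/(2p) under the assumed without-replacement rule, that is, 22 and 17 for p = 0.5 and L = 10, so the estimates should land near those values. Here without-replacement sampling wins, which is the case the usual heuristic expects. The paper's claim is that this ordering is not universal, and the same harness can be pointed at other transition structures to look for reversals.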
Keywords: count-based exploration; without-replacement sampling; stochastic shortest path; reinforcement learning; Markov decision processes

MDPI and ACS Style

Carden, S.W.; Walker, S.D. Exploration Using Without-Replacement Sampling of Actions Is Sometimes Inferior. Mach. Learn. Knowl. Extr. 2019, 1, 698-714.
