Article

An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits

by Isaac J. Sledge 1,2,* and José C. Príncipe 1,2,3,*
1 Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA
2 Computational NeuroEngineering Laboratory (CNEL), University of Florida, Gainesville, FL 32611, USA
3 Department of Biomedical Engineering, University of Florida, Gainesville, FL 32611, USA
* Authors to whom correspondence should be addressed.
Entropy 2018, 20(3), 155; https://doi.org/10.3390/e20030155
Received: 12 October 2017 / Revised: 16 February 2018 / Accepted: 26 February 2018 / Published: 28 February 2018
(This article belongs to the Special Issue Entropy in Signal Analysis)
Abstract: In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value-of-information criterion, which measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards; low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during the search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to regret that is logarithmic in the number of arm pulls.
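
For intuition, here is a minimal sketch of the annealed-exploration idea, not the authors' implementation: the information parameter is cast as a soft-max temperature over empirical mean rewards that is cooled as pulls accumulate. The function name softmax_bandit, the tau0/log(t+1) cooling schedule, and the Gaussian reward model are all illustrative assumptions, not details taken from the paper.

import numpy as np

def softmax_bandit(arm_means, num_pulls, tau0=1.0, seed=0):
    # Sketch: soft-max (Boltzmann) exploration with an annealed
    # temperature tau, standing in for the paper's information parameter.
    rng = np.random.default_rng(seed)
    k = len(arm_means)
    counts = np.zeros(k)        # pulls per arm
    estimates = np.zeros(k)     # empirical mean reward per arm
    regret = 0.0
    best = max(arm_means)

    for t in range(1, num_pulls + 1):
        tau = tau0 / np.log(t + 1)  # assumed cooling schedule
        # Soft-max over estimates; subtracting the max avoids overflow.
        probs = np.exp((estimates - estimates.max()) / tau)
        probs /= probs.sum()
        arm = rng.choice(k, p=probs)
        reward = rng.normal(arm_means[arm], 1.0)  # assumed Gaussian rewards
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        regret += best - arm_means[arm]
    return regret

print(softmax_bandit([0.2, 0.5, 0.8], num_pulls=5000))

High tau spreads probability across arms (exploration); as tau cools, the policy concentrates on the empirically best arm (exploitation). Per the paper's analysis, a sufficiently fast schedule yields regret that grows roughly logarithmically in the number of pulls, which one can observe in this sketch by printing regret for increasing num_pulls.
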
Keywords: multi-armed bandits; exploration; exploitation; exploration-exploitation dilemma; reinforcement learning; information theory
MDPI and ACS Style

Sledge, I.J.; Príncipe, J.C. An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits. Entropy 2018, 20, 155. https://doi.org/10.3390/e20030155

AMA Style

Sledge IJ, Príncipe JC. An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits. Entropy. 2018; 20(3):155. https://doi.org/10.3390/e20030155

Chicago/Turabian Style

Sledge, Isaac J., and José C. Príncipe. 2018. "An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits" Entropy 20, no. 3: 155. https://doi.org/10.3390/e20030155

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
