
Open Access Article
Entropy 2018, 20(3), 155; https://doi.org/10.3390/e20030155

An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits

1 Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA
2 Computational NeuroEngineering Laboratory (CNEL), University of Florida, Gainesville, FL 32611, USA
3 Department of Biomedical Engineering, University of Florida, Gainesville, FL 32611, USA
* Authors to whom correspondence should be addressed.
Received: 12 October 2017 / Revised: 16 February 2018 / Accepted: 26 February 2018 / Published: 28 February 2018
(This article belongs to the Special Issue Entropy in Signal Analysis)

Abstract

In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to a regret that is logarithmic with respect to the number of arm pulls. View Full-Text
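The annealed trade-off the abstract describes can be illustrated with a minimal sketch. The code below is not the authors' method: it uses plain Boltzmann (softmax) exploration over empirical arm means with a temperature that cools as pulls accumulate, which is a simplified stand-in for the paper's value-of-information parameter schedule. All names (`softmax_bandit`, the `1/log(t)` cooling rule) are illustrative assumptions.

```python
import math
import random

def softmax_bandit(means, pulls=5000, seed=0):
    """Boltzmann exploration of a Bernoulli bandit with an annealed temperature.

    `means` are the true Bernoulli success probabilities of the arms.
    The temperature tau_t shrinks as pulls accumulate, shifting the policy
    from exploration-dominant (near-uniform) toward exploitation of the
    empirically best arm. Returns per-arm pull counts and cumulative regret.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k          # pulls per arm
    estimates = [0.0] * k     # running empirical means
    best = max(means)
    regret = 0.0
    for t in range(1, pulls + 1):
        # Illustrative cooling schedule: tau decays like 1 / log(t).
        tau = max(0.01, 1.0 / math.log(t + 2))
        # Boltzmann distribution over current empirical means.
        prefs = [math.exp(e / tau) for e in estimates]
        r = rng.random() * sum(prefs)
        arm, acc = 0, prefs[0]
        while acc < r:
            arm += 1
            acc += prefs[arm]
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        regret += best - means[arm]
    return counts, regret
```

Running this on a three-armed bandit such as `softmax_bandit([0.2, 0.5, 0.8])` concentrates pulls on the best arm as the temperature drops, while early high-temperature rounds keep the policy close to uniform, mirroring the exploration-to-exploitation transition the abstract attributes to the value-of-information parameter.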
Keywords: multi-armed bandits; exploration; exploitation; exploration-exploitation dilemma; reinforcement learning; information theory
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

MDPI and ACS Style

Sledge, I.J.; Príncipe, J.C. An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits. Entropy 2018, 20, 155.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.


Entropy EISSN 1099-4300, published by MDPI AG, Basel, Switzerland.