# On Gap-Based Lower Bounding Techniques for Best-Arm Identification

## Abstract


## 1. Introduction

## 2. Overview of Results

#### 2.1. Problem Setup

- There are $M$ arms with Bernoulli rewards; the means are $\mathbf{p}=(p_1,p_2,\cdots,p_M)$, and this vector of means is said to define the bandit instance. Our analysis considers instances with the arms sorted such that $p_1\ge p_2\ge\cdots\ge p_M$, without loss of generality.
- The agent would like to find an arm whose mean is within $\epsilon$ of the highest mean for some $0<\epsilon<1$, i.e., an arm $l$ with $p_l>p_1-\epsilon$. Even if there are multiple such arms, identifying any one of them suffices.
- In each round, the agent can pull any arm $l\in[M]$ and observe a reward $X_l^{(s)}\sim\mathrm{Bernoulli}(p_l)$, where $s$ is the number of times the $l$-th arm has been pulled so far. We assume that the rewards are independent, both across arms and across time.
- In each round, the agent can alternatively choose to terminate and output an arm index $\widehat{l}$ believed to be $\epsilon$-optimal. The round at which this occurs is denoted by $T$, and is a random variable because it is allowed to depend on the rewards observed. We are interested in the expected number of arm pulls (also called the sample complexity) $\mathbb{E}_{\mathbf{p}}[T]$ for a given instance $\mathbf{p}$, which should ideally be as low as possible.
- An algorithm is said to be $(\epsilon,\delta)$-PAC (Probably Approximately Correct) if, for every bandit instance, it outputs an $\epsilon$-optimal arm with probability at least $1-\delta$ upon terminating at the stopping time $T$.
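As a concrete illustration of this setup, the following sketch (illustrative only, not an algorithm from this paper) implements the classical "naive" $(\epsilon,\delta)$-PAC strategy: pull every arm a fixed number of times dictated by Hoeffding's inequality and a union bound, then output the empirical best. The interface names (`pull`, `naive_pac_best_arm`) are hypothetical.

```python
import math
import random

def naive_pac_best_arm(pull, M, eps, delta):
    """Naive (eps, delta)-PAC strategy; returns (chosen arm index, total pulls).

    `pull(l)` returns one Bernoulli reward of arm l. By Hoeffding's inequality,
    n pulls of an arm give P(|p_hat - p| >= eps/2) <= 2 exp(-n eps^2 / 2); a union
    bound over the M arms makes every estimate (eps/2)-accurate with probability
    at least 1 - delta once n >= (2 / eps^2) ln(2 M / delta), in which case the
    empirically best arm is eps-optimal.
    """
    n = math.ceil((2.0 / eps**2) * math.log(2 * M / delta))
    means = [sum(pull(l) for _ in range(n)) / n for l in range(M)]
    best = max(range(M), key=lambda l: means[l])
    return best, n * M

# Example: a Bernoulli instance p = (0.9, 0.5, 0.45) with eps = 0.2, delta = 0.1.
rng = random.Random(0)
p = [0.9, 0.5, 0.45]
arm, cost = naive_pac_best_arm(lambda l: 1 if rng.random() < p[l] else 0,
                               len(p), 0.2, 0.1)
```

This strategy's sample complexity is $O\left(\frac{M}{\epsilon^2}\log\frac{M}{\delta}\right)$ regardless of the instance; the instance-dependent lower bounds studied in this paper characterize how far adaptive strategies can improve on such uniform sampling.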

#### 2.2. Existing Lower Bounds

#### 2.3. Our Result and Discussion

**Theorem 1.**

## 3. Proof of Theorem 1

**Lemma 1.**

**Proof.**

**Proposition 1.**

**Proof.**

**Lemma 2.**

**Proof.**

## 4. Conclusion

## Author Contributions

## Funding

## Conflicts of Interest

## Appendix A. Proof of Lemma 1 (Constant-Probability Event for Small Enough $\mathbb{E}_{\mathbf{1}}[G_{1,l}]$)

**Lemma A1.**

- (A30) uses the definitions of ${C}_{l}$ and ${G}_{2,l}$;
- (A33) uses the definitions of ${U}_{l}$ and ${A}_{l}$;
- (A34) follows from the definitions of ${U}_{l}$ and ${V}_{l}\left({t}_{l}\right)$ in (A23) and (A24) (which imply ${U}_{l}={V}_{l}\left({T}_{l}\right)$);
- (A35) follows from (A26);

- (A37) follows from (A29);
- (A38) follows from the definition of ${n}_{l}$;
- (A40) follows since the condition $\frac{\epsilon+\Delta_l}{p_l}\le\frac{1}{2}$ in $\mathcal{V}$ yields $\frac{\epsilon+p_*-p_l}{p_l}\le\frac{1}{2}$, which implies
$$p_l\ge\frac{2}{3}(p_*+\epsilon);$$
- (A42) follows from the definition of ${\nu}_{l}$ in (31);
- (A43) follows from the definition of $\xi $ in (15).
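As a sanity check of the algebraic implication used in (A40) above, the following snippet (illustrative, not part of the paper's analysis) verifies on a grid of valid instances that $\frac{\epsilon+p_*-p_l}{p_l}\le\frac{1}{2}$ indeed forces $p_l\ge\frac{2}{3}(p_*+\epsilon)$:

```python
import itertools

# Brute-force check of (A40):
#   (eps + p_star - p_l)/p_l <= 1/2   ==>   p_l >= (2/3)(p_star + eps).
# The implication is exact algebra; the 1e-12 slack only absorbs float rounding.
grid = [i / 100 for i in range(1, 100)]
violations = [
    (p_l, p_star, eps)
    for p_l, p_star, eps in itertools.product(grid, grid, grid)
    if p_l <= p_star and p_star + eps < 1            # a valid configuration
    and (eps + p_star - p_l) / p_l <= 0.5            # the condition in V
    and p_l < (2 / 3) * (p_star + eps) - 1e-12       # conclusion would fail
]
```

The list of violations comes back empty, matching the one-line derivation $\epsilon+p_*\le\frac{3}{2}p_l$.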

## Appendix B. Proof of Proposition 1 (Bounding a Likelihood Ratio)

**Lemma A2.**

**Case 1:** $\frac{\epsilon+\Delta_l}{p_l}>\frac{1}{2}$. In this case, recalling that $\Delta_l=p_*-p_l$, we have

$$\frac{\epsilon+p_*}{p_l}=\frac{\epsilon+\Delta_l}{p_l}+1>\frac{3}{2}>1.$$

On the other hand, since $\epsilon+p_l\le\epsilon+p_*<1$, we have

$$0<\frac{\epsilon+\Delta_l}{1-p_l}=\frac{\epsilon+p_*-p_l}{1-p_l}=1-\frac{1-(p_*+\epsilon)}{1-p_l}<1, \tag{A52–A53}$$

and hence, by Lemma A2,

$$\frac{1-\epsilon-p_*}{1-p_l}=1-\frac{\epsilon+\Delta_l}{1-p_l}\ge\exp\left[-\left(\sqrt{\frac{1-p_l}{1-(p_*+\epsilon)}}\right)\left(\frac{\epsilon+\Delta_l}{1-p_l}\right)\right]. \tag{A54–A55}$$

Moreover, by the definition of $\alpha_l$ in (25), we have

$$\alpha_l=\frac{\epsilon+\Delta_l}{(1-p_l)p_l}>\frac{1}{2(1-p_l)},$$

and hence

$$\alpha_l<2\alpha_l^2(1-p_l).$$

In addition, again using $\frac{\epsilon+\Delta_l}{p_l}>\frac{1}{2}$, we have

$$p_l<2(\epsilon+\Delta_l),$$

and hence

$$p_*+\epsilon=(\epsilon+\Delta_l)+p_l<3(\epsilon+\Delta_l). \tag{A60–A61}$$

We can now lower bound the likelihood ratio $L_l(W)$ as follows:

$$L_l(W)=\left(\frac{\epsilon+p_*}{p_l}\right)^{K_l}\left(\frac{1-\epsilon-p_*}{1-p_l}\right)^{T_l-K_l} \tag{A62}$$

$$\ge\exp\left(-\frac{\epsilon+\Delta_l}{\sqrt{(1-p_l)(1-(p_*+\epsilon))}}(T_l-K_l)\right) \tag{A63}$$

$$\ge\exp\left(-\frac{(p_*+\epsilon)(\epsilon+\Delta_l)}{(p_*+\epsilon)\sqrt{(1-p_l)(1-(p_*+\epsilon))}}T_l\right), \tag{A65}$$

where (A63) drops the first factor (which exceeds one) and applies (A55), and (A65) uses $T_l-K_l\le T_l$ and multiplies and divides by $p_*+\epsilon$.

**Case 2:** $0\le\frac{\epsilon+\Delta_l}{p_l}\le\frac{1}{2}$. For this case, we have

$$L_l(W)=\left(\frac{\epsilon+p_*}{p_l}\right)^{K_l}\left(\frac{1-\epsilon-p_*}{1-p_l}\right)^{T_l-K_l} \tag{A68}$$

$$=\left(1+\frac{\epsilon+\Delta_l}{p_l}\right)^{K_l}\left(1-\frac{\epsilon+\Delta_l}{1-p_l}\right)^{T_l-K_l} \tag{A69}$$

$$=\left(1-\left(\frac{\epsilon+\Delta_l}{p_l}\right)^{2}\right)^{K_l}\left(1-\frac{\epsilon+\Delta_l}{p_l}\right)^{-K_l}\left(1-\frac{\epsilon+\Delta_l}{1-p_l}\right)^{T_l-K_l} \tag{A70}$$

$$=\left(1-\left(\frac{\epsilon+\Delta_l}{p_l}\right)^{2}\right)^{K_l}\left(1-\frac{\epsilon+\Delta_l}{1-p_l}\right)^{(p_lT_l-K_l)/p_l}\left(1-\frac{\epsilon+\Delta_l}{p_l}\right)^{-K_l}\left(1-\frac{\epsilon+\Delta_l}{1-p_l}\right)^{K_l(1-p_l)/p_l}, \tag{A71}$$

where (A70) uses $1+x=\frac{1-x^2}{1-x}$, and (A71) splits the exponent $T_l-K_l=\frac{p_lT_l-K_l}{p_l}+\frac{K_l(1-p_l)}{p_l}$.

From (A53), we have

$$0<\left(\frac{\epsilon+\Delta_l}{1-p_l}\right)^{2}=\left(1-\frac{1-(p_*+\epsilon)}{1-p_l}\right)^{2}\le 1-\frac{1-(p_*+\epsilon)}{1-p_l}<1.$$

Hence, by Lemma A2, we have

$$1-\left(\frac{\epsilon+\Delta_l}{1-p_l}\right)^{2}\ge\exp\left[-\frac{1}{\sqrt{1-\left(\frac{\epsilon+\Delta_l}{1-p_l}\right)^{2}}}\left(\frac{\epsilon+\Delta_l}{1-p_l}\right)^{2}\right] \tag{A73}$$

$$\ge\exp\left[-\left(\frac{1-p_l}{\sqrt{(1-p_l)(1-p_*-\epsilon)}}\right)\left(\frac{\epsilon+\Delta_l}{1-p_l}\right)^{2}\right]. \tag{A75}$$

For the third and fourth terms in (A71), we proceed as follows. By Lemma A2,

$$\left(1-\frac{\epsilon+\Delta_l}{1-p_l}\right)^{K_l(1-p_l)/p_l}\ge\exp\left[-\left(\sqrt{\frac{1-p_l}{1-(p_*+\epsilon)}}\right)\left(\frac{\epsilon+\Delta_l}{1-p_l}\right)\frac{K_l(1-p_l)}{p_l}\right].$$

On the other hand, since $(1-x)^{-1}\ge e^{x}$ for $x\in[0,1)$, observe that

$$\left(1-\frac{\epsilon+\Delta_l}{p_l}\right)^{-K_l}\ge\exp\left[\left(\frac{\epsilon+\Delta_l}{p_l}\right)K_l\right].$$

Combining these two bounds (and using $K_l\le T_l$ together with $p_l\ge\frac{2}{3}(p_*+\epsilon)$ from (A40)), we obtain

$$\left(1-\frac{\epsilon+\Delta_l}{p_l}\right)^{-K_l}\left(1-\frac{\epsilon+\Delta_l}{1-p_l}\right)^{K_l(1-p_l)/p_l}\ge\exp\left[-\frac{3\alpha_l^{2}p_l^{2}(1-p_l)^{2}}{2(p_*+\epsilon)\sqrt{(1-p_l)(1-(p_*+\epsilon))}}T_l\right],$$

and hence

$$L_l(W)\ge\left(1-\left(\frac{\epsilon+\Delta_l}{p_l}\right)^{2}\right)^{K_l}\left(1-\frac{\epsilon+\Delta_l}{1-p_l}\right)^{(p_lT_l-K_l)/p_l}\times\exp\left[-\frac{3}{2(p_*+\epsilon)\sqrt{(1-p_l)(1-(p_*+\epsilon))}}\alpha_l^{2}p_l^{2}(1-p_l)^{2}T_l\right]. \tag{A86}$$

Now, since $0<\left(\frac{\epsilon+\Delta_l}{p_l}\right)^{2}\le\frac{1}{4}=1-\frac{3}{4}$ (since we are in the case $0\le\frac{\epsilon+\Delta_l}{p_l}\le\frac{1}{2}$), by Lemma A2, we have

$$\left(1-\left(\frac{\epsilon+\Delta_l}{p_l}\right)^{2}\right)^{K_l}\ge\exp\left[-\sqrt{\frac{4}{3}}\left(\frac{\epsilon+\Delta_l}{p_l}\right)^{2}K_l\right] \tag{A87}$$

$$\ge\exp\left[-\frac{4}{3}\left(\frac{\epsilon+\Delta_l}{p_l}\right)^{2}K_l\right] \tag{A88}$$

$$=\exp\left[-\frac{4}{3}\alpha_l^{2}(1-p_l)^{2}K_l\right], \tag{A90}$$

where (A90) uses $\frac{\epsilon+\Delta_l}{p_l}=\alpha_l(1-p_l)$.

We now consider two further sub-cases:

- (i) If $p_lT_l>K_l$, then we have
$$\left(1-\frac{\epsilon+\Delta_l}{1-p_l}\right)^{(p_lT_l-K_l)/p_l}\ge\exp\left(-\left(\sqrt{\frac{1-p_l}{1-(p_*+\epsilon)}}\right)\left(\frac{\epsilon+\Delta_l}{1-p_l}\right)\left(\frac{p_lT_l-K_l}{p_l}\right)\right) \tag{A91}$$
$$=\exp\left(-\left(\sqrt{\frac{1-p_l}{1-(p_*+\epsilon)}}\right)\alpha_l\left(p_lT_l-K_l\right)\right) \tag{A92}$$
$$=\exp\left(-\beta_l(p_lT_l-K_l)\right), \tag{A93}$$
where (A93) uses the definition of $\beta_l$.
- (ii) If $p_lT_l\le K_l$, then the exponent $(p_lT_l-K_l)/p_l$ is non-positive, and using $(1-x)^{-1}\ge e^{x}$ we have
$$\left(1-\frac{\epsilon+\Delta_l}{1-p_l}\right)^{(p_lT_l-K_l)/p_l}\ge\exp\left[-\left(\frac{\epsilon+\Delta_l}{1-p_l}\right)\left(\frac{p_lT_l-K_l}{p_l}\right)\right] \tag{A94}$$
$$=\exp\left[-\alpha_l(p_lT_l-K_l)\right]. \tag{A95}$$

From (A93) and (A95), we obtain

$$\left(1-\frac{\epsilon+\Delta_l}{1-p_l}\right)^{(p_lT_l-K_l)/p_l}\ge\exp\left[-\left(\beta_l(p_lT_l-K_l)\mathbf{1}\{p_lT_l>K_l\}+\alpha_l(p_lT_l-K_l)\mathbf{1}\{p_lT_l\le K_l\}\right)\right]. \tag{A96}$$

Now, from (A86), (A90), and (A96), writing $-K_l=(p_lT_l-K_l)-p_lT_l$ in (A90), we have

$$L_l(W)\ge\exp\left[\frac{4}{3}\alpha_l^{2}(1-p_l)^{2}(p_lT_l-K_l)\right]\exp\left[-\frac{4}{3}\alpha_l^{2}p_l(1-p_l)^{2}T_l\right]$$
$$\quad\times\exp\left[-\left(\beta_l(p_lT_l-K_l)\mathbf{1}\{p_lT_l>K_l\}+\alpha_l(p_lT_l-K_l)\mathbf{1}\{p_lT_l\le K_l\}\right)\right]$$
$$\quad\times\exp\left[-\frac{3}{2(p_*+\epsilon)\sqrt{(1-p_l)(1-(p_*+\epsilon))}}\alpha_l^{2}p_l^{2}(1-p_l)^{2}T_l\right].$$
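The factorization underlying (A69)–(A71) is a purely algebraic identity, and can be checked numerically. The snippet below (with illustrative parameter values, not from the paper) confirms that the product of the four factors reproduces $\left(1+\frac{\epsilon+\Delta_l}{p_l}\right)^{K_l}\left(1-\frac{\epsilon+\Delta_l}{1-p_l}\right)^{T_l-K_l}$:

```python
import math

def check_decomposition(p_l, p_star, eps, T, K):
    # a = (eps + Delta_l)/p_l and b = (eps + Delta_l)/(1 - p_l), with Delta_l = p_star - p_l.
    d = eps + (p_star - p_l)
    a, b = d / p_l, d / (1 - p_l)
    lhs = (1 + a) ** K * (1 - b) ** (T - K)
    rhs = (
        (1 - a * a) ** K                     # (1 - a^2)^K
        * (1 - b) ** ((p_l * T - K) / p_l)   # (1 - b)^{(p T - K)/p}
        * (1 - a) ** (-K)                    # (1 - a)^{-K}
        * (1 - b) ** (K * (1 - p_l) / p_l)   # (1 - b)^{K(1-p)/p}
    )
    # Identity holds since (1-a^2)(1-a)^{-1} = 1+a and the b-exponents sum to T-K.
    return math.isclose(lhs, rhs, rel_tol=1e-9)
```

The check passes for any valid configuration ($p_l\le p_*$, $p_*+\epsilon<1$), since $(1-a^2)(1-a)^{-1}=1+a$ and $\frac{p_lT_l-K_l}{p_l}+\frac{K_l(1-p_l)}{p_l}=T_l-K_l$ are exact identities.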

## Appendix C. Differences in Analysis Techniques

- We remove the restriction ${p}_{l}\ge \frac{\epsilon +{p}_{*}}{1+\sqrt{\frac{1}{2}}}$ (or $\frac{\epsilon +{\Delta}_{l}}{{p}_{l}}\le \frac{1}{\sqrt{2}}$) used in the subsets $\mathcal{M}(\mathbf{p},\epsilon )$ and $\mathcal{N}(\mathbf{p},\epsilon )$ in (Equations (4) and (5) [14]), so that our lower bound depends on all of the arms. To achieve this, our analysis frequently needs to handle the cases $\frac{\epsilon +{\Delta}_{l}}{{p}_{l}}>\frac{1}{2}$ and $\frac{\epsilon +{\Delta}_{l}}{{p}_{l}}\le \frac{1}{2}$ separately (e.g., see the proof of Proposition 1).
- The preceding separation into two cases also introduces further difficulties. For example, our definition of ${G}_{2,l}$ in (30) is modified to contain different constants for the cases ${p}_{l}{T}_{l}>{K}_{l}$ and ${p}_{l}{T}_{l}\le {K}_{l}$, which is not the case in (Lemma 2 [14]). Accordingly, the quantities ${\tilde{\alpha}}_{l}$ in (27) and ${\tilde{\beta}}_{l}$ in (28) appear in our proof but not in [14].
- We replace the inequality ${(1-x)}^{y}\ge {e}^{-1.78xy}$ (for $x\in\left(0,\frac{1}{\sqrt{2}}\right)$ and $y\ge 0$) of (Lemma 3 [14]) by Lemma A2. By using this stronger inequality, we can improve the constant term ${c}_{1}$ from $O\left({\underline{p}}^{2}\right)$ to ${({p}^{*}+\epsilon )}^{2}$. In addition, Lemma A2 does not require the assumption $x\le \frac{1}{\sqrt{2}}$ as in (Lemma 3 [14]), so we can use it for the case ${p}_{*}>\frac{1}{2}$, which required a separate analysis in [14].
- To further reduce the constant term from ${({p}^{*}+\epsilon )}^{2}$ to $({p}^{*}+\epsilon )$ (see Theorem 1), we also need to use other mathematical tricks to sharpen certain inequalities, such as (A83).
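Although the statement of Lemma A2 is not reproduced above, its uses in (A55), (A73), and (A87) are all consistent with the form $(1-x)^y\ge\exp\left(-\frac{xy}{\sqrt{1-x}}\right)$ for $x\in[0,1)$ and $y\ge 0$. The snippet below (a numerical sanity check, not from the paper) verifies this inferred form, which indeed needs no restriction of the kind $x\le\frac{1}{\sqrt{2}}$, on a fine grid of $x$ values:

```python
import math

# Check the inferred form of Lemma A2: for x in [0, 1) and y >= 0,
#   (1 - x)^y >= exp(-x * y / sqrt(1 - x)),
# which (taking logs and dividing by y > 0) is equivalent to
#   log(1 - x) >= -x / sqrt(1 - x).
def lemma_a2_margin(x):
    return math.log(1 - x) + x / math.sqrt(1 - x)

# The margin is zero at x = 0 and strictly positive on (0, 1): its derivative
# is (1 - x/2)/(1-x)^{3/2} - 1/(1-x) >= 0 since (1 - x/2)^2 >= 1 - x.
margins = [lemma_a2_margin(i / 1000) for i in range(1, 1000)]
```

All margins come out positive, matching the monotonicity argument in the comment; the bound degrades gracefully as $x\to 1$, where the $e^{-1.78xy}$ bound of (Lemma 3 [14]) is unavailable.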

## References

- Lattimore, T.; Szepesvári, C. Bandit Algorithms; Cambridge University Press: Cambridge, UK, to appear.
- Villar, S.S.; Bowden, J.; Wason, J. Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges. Stat. Sci.
**2015**, 30, 199–215. [Google Scholar] [CrossRef] [PubMed] - Li, L.; Chu, W.; Langford, J.; Schapire, R.E. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010. [Google Scholar]
- Awerbuch, B.; Kleinberg, R.D. Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In Proceedings of the Symposium of Theory of Computing (STOC04), Chicago, IL, USA, 5–8 June 2004. [Google Scholar]
- Shen, W.; Wang, J.; Jiang, Y.G.; Zha, H. Portfolio Choices with Orthogonal Bandit Learning. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI-15), Buenos Aires, Argentina, 25–31 July 2015. [Google Scholar]
- Bechhofer, R.E. A sequential multiple-decision procedure for selecting the best one of several normal populations with a common unknown variance, and its use with various experimental designs. Biometrics
**1958**, 14, 408–429. [Google Scholar] [CrossRef] - Paulson, E. A sequential procedure for selecting the population with the largest mean from k normal populations. Ann. Math. Stat.
**1964**, 35, 174–180. [Google Scholar] [CrossRef] - Even-Dar, E.; Mannor, S.; Mansour, Y. PAC bounds for multi-armed bandit and Markov decision processes. In Proceedings of the Fifteenth Annual Conference on Computational Learning Theory, Sydney, Australia, 8–10 July 2002. [Google Scholar]
- Kalyanakrishnan, S.; Tewari, A.; Auer, P.; Stone, P. PAC subset selection in stochastic multi-armed bandits. In Proceedings of the International Conference on Machine Learning, Edinburgh, UK, 26 June–1 July 2012. [Google Scholar]
- Gabillon, V.; Ghavamzadeh, M.; Lazaric, A. Best arm identification: A unified approach to fixed budget and fixed confidence. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012. [Google Scholar]
- Jamieson, K.; Malloy, M.; Nowak, R.; Bubeck, S. On finding the largest mean among many. arXiv
**2013**, arXiv:1306.3917. [Google Scholar] - Karnin, Z.; Koren, T.; Somekh, O. Almost optimal exploration in multi-armed bandits. In Proceedings of the 30th International Conference on Machine Learning (ICML 2013), Atlanta, GA, USA, 16–21 June 2013. [Google Scholar]
- Jamieson, K.; Malloy, M.; Nowak, R.; Bubeck, S. lil’UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits. arXiv
**2013**, arXiv:1312.7308. [Google Scholar] - Mannor, S.; Tsitsiklis, J.N. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem. J. Mach. Learn. Res.
**2004**, 5, 623–648. [Google Scholar] - Kaufmann, E.; Cappé, O.; Garivier, A. On the Complexity of Best-arm Identification in Multi-armed Bandit Models. J. Mach. Learn. Res.
**2016**, 17, 1–42. [Google Scholar] - Carpentier, A.; Locatelli, A. Tight (Lower) Bounds for the Fixed Budget Best Arm Identification Bandit Problem. In Proceedings of the Conference On Learning Theory, New York, NY, USA, 23–26 June 2016. [Google Scholar]
- Chen, L.; Li, J.; Qiao, M. Nearly Instance Optimal Sample Complexity Bounds for Top-k Arm Selection. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017), Fort Lauderdale, FL, USA, 20–22 April 2017. [Google Scholar]
- Simchowitz, M.; Jamieson, K.G.; Recht, B. The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime. arXiv
**2017**, arXiv:1702.05186. [Google Scholar] - Bubeck, S.; Cesa-Bianchi, N. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. In Foundations and Trends in Machine Learning; Now Publishers Inc.: Hanover, MA, USA, 2012; Volume 5. [Google Scholar]
- Royden, H.; Fitzpatrick, P. Real Analysis, 4th ed.; Pearson: New York, NY, USA, 2010. [Google Scholar]
- Katariya, S.; Jain, L.; Sengupta, N.; Evans, J.; Nowak, R. Adaptive Sampling for Coarse Ranking. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018), Lanzarote, Spain, 9–11 April 2018. [Google Scholar]
- Billingsley, P. Probability and Measure, 3rd ed.; Wiley-Interscience: Hoboken, NJ, USA, 1995. [Google Scholar]

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Truong, L.V.; Scarlett, J.
On Gap-Based Lower Bounding Techniques for Best-Arm Identification. *Entropy* **2020**, *22*, 788.
https://doi.org/10.3390/e22070788
