A Fast-Pivoting Algorithm for Whittle’s Restless Bandit Index
Abstract
1. Introduction
Structure of the Paper
2. Review of Related Literature
3. Review of Finite-State Restless Bandit Whittle Indexation via PCL-Indexability
3.1. SMDP Restless Bandits and Their Discrete-Stage Reformulation
3.2. Indexability, Whittle Index, and the Achievable Resource–Reward Performance Region
3.3. Indexability
3.4. PCL-Indexability and Adaptive-Greedy Algorithm
- (i)
- For every nonempty , is nonempty.
- (ii)
- For every , is nonempty.
Algorithm 1: Adaptive-greedy algorithm: top-down version.
Output: S0 := ∅ to n do pick
; end { for }
Algorithm 2: Adaptive-greedy algorithm: bottom-up version.
Output: to n do pick
; end { for }
- (i)
- for every active set , for each ; and
- (ii)
- algorithm computes a monotone non-increasing index sequence ; or algorithm computes a monotone non-decreasing index sequence .
Algorithm 3: Reformulated adaptive-greedy index algorithm: top-down version.
Output: S0 := ∅ to n do pick
; end { for }
Algorithm 4: Reformulated adaptive-greedy index algorithm: bottom-up version.
Output: to n do pick
; end { for }
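The top-down adaptive-greedy scheme above can be sketched in a few lines for a simplified setting. The following is a minimal sketch, not the paper's implementation: it assumes a discrete-time discounted project with unit resource consumption when active and none when passive, and it takes the index candidate of state i under active set S to be the ratio of marginal reward to marginal work. The names `P`, `R`, `beta`, `solve`, `evaluate`, and `adaptive_greedy_top_down` are hypothetical. Each step re-solves two linear systems from scratch, so this is the conventional, slow way to run the scheme, which is what a pivoting implementation is meant to avoid.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def evaluate(S, P, c, beta):
    """Discounted total of the per-stage quantity c under the policy that is
    active exactly on the states in S. P[a][i][j] are transition
    probabilities and c[a][i] the per-stage quantity for action a."""
    n = len(P[0])
    act = [1 if i in S else 0 for i in range(n)]
    A = [[(1.0 if i == j else 0.0) - beta * P[act[i]][i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, [c[act[i]][i] for i in range(n)])


def adaptive_greedy_top_down(P, R, beta):
    """Return (order, index, ok). ok is True when every marginal work seen is
    positive and the index sequence is non-increasing (assumed conditions)."""
    n = len(R[0])
    work = [[0.0] * n, [1.0] * n]  # assumption: unit work when active
    S, order, index, ok, prev = set(), [], {}, True, None
    for _ in range(n):
        f = evaluate(S, P, R, beta)     # reward measure under active set S
        g = evaluate(S, P, work, beta)  # work measure under active set S
        best = None
        for i in range(n):
            d = [P[1][i][j] - P[0][i][j] for j in range(n)]
            mw = 1.0 + beta * sum(dj * gj for dj, gj in zip(d, g))
            if mw <= 0.0:
                ok = False
                continue
            if i in S:
                continue
            mr = R[1][i] - R[0][i] + beta * sum(dj * fj for dj, fj in zip(d, f))
            if best is None or mr / mw > best[0]:
                best = (mr / mw, i)
        if best is None:  # no admissible candidate: stop early
            return order, index, False
        lam, i = best
        if prev is not None and lam > prev + 1e-12:
            ok = False
        prev = lam
        index[i] = lam
        order.append(i)
        S.add(i)
    return order, index, ok
```

As a sanity check under these assumptions, a classic (non-restless) two-state project with frozen passive dynamics, active rewards 1 and 2, and discount factor 0.5 yields index values 1.5 and 2, which agree with a hand computation of the Gittins index for that instance.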
4. Laying the Groundwork for an Efficient Implementation of the Adaptive-Greedy Algorithm
4.1. Optimality Equations and Parametric LP Formulation
4.2. Bases, Basic Feasible Solutions, and Reduced Costs
- (a)
- and .
- (b)
- .
- (c)
- .
- (d)
- .
- (e)
- .
- (a)
- .
- (b)
- .
- (c)
- .
- (a)
- For , and .
- (b)
- For , and .
5. A Fast-Pivoting Index Algorithm for PCL-Indexable Projects
Algorithm 5: The fast-pivoting adaptive-greedy index algorithm.
Output: solve ; to n do pick if then ; end { if } ; end { for }
5.1. Computing the Initial Tableau
6. Extension to the Average Criterion
7. Numerical Experiments
Comparing Runtimes of Index Algorithms
8. Discussion
9. Conclusions
- A new algorithm is presented to compute the Whittle index of a general n-state semi-Markov restless bandit; it can also be used to test numerically for indexability. After an initialization step, the algorithm computes the n index values in an n-step loop with a complexity of arithmetic operations.
- The algorithm extends to Whittle’s index the fast-pivoting algorithm introduced by the author in [30] for the Gittins index of classic (non-restless) bandits, which also has the lowest complexity to date.
- The proposed algorithm has substantially better complexity than alternative algorithms proposed in the literature.
- The algorithm will be especially useful for computing the Whittle index in large-scale multi-dimensional models where the index cannot be derived in closed form and alternative algorithms would incur prohibitive computation times.
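On small instances, index values produced by any algorithm can be cross-checked directly against the definition of the Whittle index. The following is a minimal sketch of such a check and is not the algorithm of this paper: it assumes a discrete-time discounted project that is indexable, charges a price `lam` per active stage, solves the resulting problem by value iteration, and bisects on `lam` until the two actions are equally good in the state of interest. The names `active_advantage`, `whittle_index_bisect`, `P`, `R`, `beta`, `lo`, `hi`, and `sweeps` are hypothetical, and the bracket `[lo, hi]` is assumed to contain the index.

```python
def active_advantage(i, lam, P, R, beta, sweeps=500):
    """Value of acting minus value of resting in state i when each active
    stage is charged lam. P[a][s][j] are transition probabilities and
    R[a][s] rewards for action a (0 = passive, 1 = active)."""
    n = len(R[0])
    V = [0.0] * n

    def q(a, s):
        charge = lam if a == 1 else 0.0
        return R[a][s] - charge + beta * sum(P[a][s][j] * V[j] for j in range(n))

    # Plain value iteration; sweeps must be large enough for the chosen beta.
    for _ in range(sweeps):
        V = [max(q(0, s), q(1, s)) for s in range(n)]
    return q(1, i) - q(0, i)


def whittle_index_bisect(i, P, R, beta, lo, hi, tol=1e-9):
    """Bisect on the charge lam for the point where state i is indifferent.
    Assumes indexability, so the advantage changes sign once on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if active_advantage(i, mid, P, R, beta) > 0.0:
            lo = mid  # acting is still strictly better: index lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

This check costs one full dynamic-programming solve per bisection step and per state, so it is only practical as a validation tool for small n, not as a substitute for a dedicated index algorithm.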
Funding
Acknowledgments
Conflicts of Interest
References
- Puterman, M.L. Markov Decision Processes: Discrete Stochastic Dynamic Programming; Wiley: New York, NY, USA, 1994.
- Whittle, P. Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 1988, 25A, 287–298.
- Niño-Mora, J. Dynamic allocation indices for restless projects and queueing admission control: A polyhedral approach. Math. Program. 2002, 93, 361–413.
- Niño-Mora, J. Restless bandit marginal productivity indices, diminishing returns and optimal control of make-to-order/make-to-stock M/G/1 queues. Math. Oper. Res. 2006, 31, 50–84.
- Fu, J.; Moran, B.; Guo, J.; Wong, E.W.M.; Zukerman, M. Asymptotically optimal job assignment for energy-efficient processor-sharing server farms. IEEE J. Sel. Areas Commun. 2016, 34, 4008–4023.
- Niño-Mora, J. Computing an index policy for bandits with switching penalties. In Proceedings of the 2nd International Conference on Performance Evaluation Methodologies and Tools, Nantes, France, 22–27 October 2007.
- Qian, Y.; Zhang, C.; Krishnamachari, B.; Tambe, M. Restless poachers: Handling exploration-exploitation tradeoffs in security domains. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, Singapore, 9–13 May 2016; pp. 123–131.
- Borkar, V.S.; Pattathil, S. Whittle indexability in egalitarian processor sharing systems. Ann. Oper. Res. 2017, 1–21.
- Borkar, V.S.; Ravikumar, K.; Saboo, K. An index policy for dynamic pricing in cloud computing under price commitments. Appl. Math. 2017, 44, 215–245.
- Borkar, V.S.; Kasbekar, G.S.; Pattathil, S.; Shetty, P.Y. Opportunistic scheduling as restless bandits. IEEE Trans. Control Netw. Syst. 2018, 5, 1952–1961.
- Gerum, P.C.L.; Altay, A.; Baykal-Gursoy, M. Data-driven predictive maintenance scheduling policies for railways. Transp. Res. Part C Emerg. Technol. 2019, 107, 137–154.
- Abbou, A.; Makis, V. Group maintenance: A restless bandits approach. INFORMS J. Comput. 2019, 31, 719–731.
- Ayer, T.; Zhang, C.; Bonifonte, A.; Spaulding, A.C.; Chhatwal, J. Prioritizing hepatitis C treatment in US prisons. Oper. Res. 2019, 67, 853–873.
- Niño-Mora, J. Resource allocation and routing in parallel multi-server queues with abandonments for cloud profit maximization. Comput. Oper. Res. 2019, 103, 221–236.
- Fu, J.; Moran, B. Energy-efficient job-assignment policy with asymptotically guaranteed performance deviation. IEEE/ACM Trans. Netw. 2020, 28, 1325–1338.
- Hsu, Y.P.; Modiano, E.; Duan, L.J. Scheduling algorithms for minimizing age of information in wireless broadcast networks with random arrivals. IEEE Trans. Mob. Comput. 2020, 19, 2903–2915.
- Sun, J.Z.; Jiang, Z.Y.; Krishnamachari, B.; Zhou, S.; Niu, Z.S. Closed-form Whittle’s index-enabled random access for timely status update. IEEE Trans. Commun. 2020, 68, 1538–1551.
- Li, D.; Ding, L.; Connor, S. When to switch? Index policies for resource scheduling in emergency response. Prod. Oper. Manag. 2020, 29, 241–262.
- Papadimitriou, C.H.; Tsitsiklis, J.N. The complexity of optimal queuing network control. Math. Oper. Res. 1999, 24, 293–305.
- Weber, R.R.; Weiss, G. On an index policy for restless bandits. J. Appl. Probab. 1990, 27, 637–648.
- Weber, R.R.; Weiss, G. Addendum to ‘On an index policy for restless bandits’. Adv. Appl. Probab. 1991, 23, 429–430.
- Ouyang, W.Z.; Eryilmaz, A.; Shroff, N.B. Downlink scheduling over Markovian fading channels. IEEE/ACM Trans. Netw. 2016, 24, 1801–1812.
- Verloop, I.M. Asymptotically optimal priority policies for indexable and nonindexable restless bandits. Ann. Appl. Probab. 2016, 26, 1947–1995.
- Niño-Mora, J. Restless bandits, partial conservation laws and indexability. Adv. Appl. Probab. 2001, 33, 76–98.
- Niño-Mora, J. A verification theorem for threshold-indexability of real-state discounted restless bandits. Math. Oper. Res. 2020, 45, 465–496.
- Niño-Mora, J. Marginal productivity index policies for scheduling a multiclass delay-/loss-sensitive queue. Queueing Syst. 2006, 54, 281–312.
- Niño-Mora, J. Dynamic priority allocation via restless bandit marginal productivity indices. Top 2007, 15, 161–198.
- Cao, J.; Nyberg, C. Linear programming relaxations and marginal productivity index policies for the buffer sharing problem. Queueing Syst. 2008, 60, 247–269.
- Huberman, B.A.; Wu, F. The economics of attention: Maximizing user value in information-rich environments. Adv. Complex Syst. 2008, 11, 487–496.
- Niño-Mora, J. A faster index algorithm and a computational study for bandits with switching costs. INFORMS J. Comput. 2008, 20, 255–269.
- Niño-Mora, J. Admission and routing of soft real-time jobs to multiclusters: Design and comparison of index policies. Comput. Oper. Res. 2012, 39, 3431–3444.
- Niño-Mora, J. Towards minimum loss job routing to parallel heterogeneous multiserver queues via index policies. Eur. J. Oper. Res. 2012, 220, 705–715.
- He, T.; Chen, S.; Kim, H.; Tong, L.; Lee, K.W. Scheduling parallel tasks onto opportunistically available cloud resources. In Proceedings of the 2012 IEEE Fifth International Conference on Cloud Computing, Honolulu, HI, USA, 24–29 June 2012; pp. 180–187.
- Menner, M.; Zeilinger, M.N. A user comfort model and index policy for personalizing discrete controller decisions. In Proceedings of the 2018 European Control Conference (ECC), Limassol, Cyprus, 12–15 June 2018; pp. 1759–1765.
- Dance, C.R.; Silander, T. Optimal policies for observing time series and related restless bandit problems. J. Mach. Learn. Res. 2019, 20, 35.
- Klimov, G.P. Time-sharing service systems. I. Theory Probab. Appl. 1974, 19, 532–551.
- Bertsimas, D.; Niño-Mora, J. Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems. Math. Oper. Res. 1996, 21, 257–306.
- Niño-Mora, J. Conservation laws and related applications. In Wiley Encyclopedia of Operations Research and Management Science; Wiley: New York, NY, USA, 2011.
- Niño-Mora, J. Klimov’s model. In Wiley Encyclopedia of Operations Research and Management Science; Cochran, J.J., Ed.; Wiley: New York, NY, USA, 2011.
- Niño-Mora, J. A (2/3)n^3 fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain. INFORMS J. Comput. 2007, 19, 596–606.
- Ayesta, U.; Gupta, M.K.; Verloop, I.M. On the computation of Whittle’s index for Markovian restless bandits. Math. Methods Oper. Res. 2020, 1–30.
- Zhao, Q. Multi-Armed Bandits: Theory and Applications to Online Learning in Networks; Morgan & Claypool: San Rafael, CA, USA, 2020.
- Varaiya, P.P.; Walrand, J.C.; Buyukkoc, C. Extensions of the multiarmed bandit problem: The discounted case. IEEE Trans. Automat. Control 1985, 30, 426–439.
- Chen, Y.R.; Katehakis, M.N. Linear programming for finite state multi-armed bandit problems. Math. Oper. Res. 1986, 11, 180–183.
- Kallenberg, L.C.M. Computation of the Gittins index. Math. Oper. Res. 1986, 11, 184–186.
- Katehakis, M.N.; Veinott, A.F., Jr. The multi-armed bandit problem: Decomposition and computation. Math. Oper. Res. 1987, 12, 262–268.
- Katta, A.K.; Sethuraman, J. A note on bandits with a twist. SIAM J. Discret. Math. 2004, 18, 110–113.
- Sonin, I.M. A generalized Gittins index for a Markov chain and its recursive calculation. Stat. Probab. Lett. 2008, 78, 1526–1533.
- Gittins, J.C. Bandit processes and dynamic allocation indices. J. R. Stat. Soc. Ser. B 1979, 41, 148–177.
- Larrañaga, M.; Ayesta, U.; Verloop, I.M. Dynamic control of birth-and-death restless bandits: Application to resource-allocation problems. IEEE/ACM Trans. Netw. 2016, 24, 3812–3825.
- Kallenberg, L.C.M. Finite state and action MDPs. In Handbook of Markov Decision Processes; Kluwer: Boston, MA, USA, 2002; pp. 21–87.
- Gass, S.; Saaty, T. The computational algorithm for the parametric objective function. Nav. Res. Logist. Q. 1955, 2, 39–45.
- Niño-Mora, J. Characterization and computation of restless bandit marginal productivity indices. In SMCtools ’07: Proceedings from the Second International Workshop on Tools for Solving Structured Markov Chains; ICST: Brussels, Belgium, 2007.
- Niño-Mora, J. Characterization and computation of restless bandit marginal productivity indices. Universidad Carlos III de Madrid (UC3M) Working Papers, Statistics and Econometrics 07–11. 2007. Available online: http://hdl.handle.net/10016/796 (accessed on 15 November 2020).
n | FP | RP | CP |
---|---|---|---|
1000 | |||
2000 | |||
3000 | |||
4000 | |||
5000 | |||
6000 | |||
7000 | |||
8000 | |||
9000 | |||
10,000 | |||
11,000 | |||
12,000 | |||
13,000 | |||
14,000 | |||
15,000 |
© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Niño-Mora, J. A Fast-Pivoting Algorithm for Whittle’s Restless Bandit Index. Mathematics 2020, 8, 2226. https://doi.org/10.3390/math8122226