Outsmarting Human Design in Airline Revenue Management
Abstract
1. Introduction
2. Heuristic Methods for Earning While Learning
3. Background
3.1. The Single-Leg Problem
3.2. Reinforcement Learning
4. Revisiting Earning While Learning through Reinforcement Learning
Algorithm 1: Actor–critic for EWL.
5. Experiments
5.1. Estimating Only the Price Sensitivity under Unconstrained Capacity
5.2. Estimating the Price Sensitivity and Arrival Rate under Constrained Capacity
6. Discussion
7. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Fiig, T.; Weatherford, L.R.; Wittman, M.D. Can demand forecast accuracy be linked to airline revenue? J. Revenue Pricing Manag. 2019, 18, 291–305. [Google Scholar] [CrossRef]
- Den Boer, A.V.; Zwart, B. Simultaneously learning and optimizing using controlled variance pricing. Manag. Sci. 2014, 60, 770–783. [Google Scholar] [CrossRef] [Green Version]
- Elreedy, D.; Atiya, A.F.; Shaheen, S.I. Novel pricing strategies for revenue maximization and demand learning using an exploration–exploitation framework. Soft Comput. 2021, 25, 11711–11733. [Google Scholar] [CrossRef] [PubMed]
- Keskin, N.B.; Zeevi, A. Chasing demand: Learning and earning in a changing environment. Math. Oper. Res. 2017, 42, 277–307. [Google Scholar] [CrossRef] [Green Version]
- Kumar, R.; Li, A.; Wang, W. Learning and optimizing through dynamic pricing. J. Revenue Pricing Manag. 2018, 17, 63–77. [Google Scholar] [CrossRef]
- Olsson, F. A Literature Survey of Active Machine Learning in the Context of Natural Language Processing; Swedish Institute of Computer Science: Kista, Sweden, 2009. [Google Scholar]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
- Ferreira, K.J.; Simchi-Levi, D.; Wang, H. Online network revenue management using Thompson sampling. Oper. Res. 2018, 66, 1586–1602. [Google Scholar] [CrossRef]
- Trovo, F.; Paladino, S.; Restelli, M.; Gatti, N. Multi-armed bandit for pricing. In Proceedings of the 12th European Workshop on Reinforcement Learning, Lille, France, 10–11 July 2015; pp. 1–9. [Google Scholar]
- Degrave, J.; Felici, F.; Buchli, J.; Neunert, M.; Tracey, B.; Carpanese, F.; Ewalds, T.; Hafner, R.; Abdolmaleki, A.; de Las Casas, D.; et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature 2022, 602, 414–419. [Google Scholar] [CrossRef]
- Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589. [Google Scholar] [CrossRef]
- Vinyals, O.; Babuschkin, I.; Czarnecki, W.M.; Mathieu, M.; Dudzik, A.; Chung, J.; Choi, D.H.; Powell, R.; Ewalds, T.; Georgiev, P.; et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 2019, 575, 350–354. [Google Scholar] [CrossRef]
- Silver, D.; Singh, S.; Precup, D.; Sutton, R.S. Reward is enough. Artif. Intell. 2021, 299, 103535. [Google Scholar] [CrossRef]
- Bondoux, N.; Nguyen, A.Q.; Fiig, T.; Acuna-Agost, R. Reinforcement learning applied to airline revenue management. J. Revenue Pricing Manag. 2020, 19, 332–348. [Google Scholar] [CrossRef]
- Kastius, A.; Schlosser, R. Dynamic pricing under competition using reinforcement learning. J. Revenue Pricing Manag. 2021, 21, 50–63. [Google Scholar] [CrossRef]
- Shihab, S.A.; Wei, P. A deep reinforcement learning approach to seat inventory control for airline revenue management. J. Revenue Pricing Manag. 2021, 21, 183–199. [Google Scholar] [CrossRef]
- Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of Go without human knowledge. Nature 2017, 550, 354–359. [Google Scholar] [CrossRef]
- Hansen, B. Report of the Uppsala Meeting, August 2–4, 1954. Econometrica 1955, 23, 198–216. [Google Scholar]
- Hawkins, E.R. Methods of estimating demand. J. Mark. 1957, 21, 428–438. [Google Scholar] [CrossRef]
- Lobo, M.S.; Boyd, S. Pricing and learning with uncertain demand. In Proceedings of the INFORMS Revenue Management Conference, Honolulu, HI, USA, 2–5 June 2003. [Google Scholar]
- Chhabra, M.; Das, S. Learning the demand curve in posted-price digital goods auctions. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems, Taipei, Taiwan, 2–6 May 2011; Volume 1, pp. 63–70. [Google Scholar]
- Kwon, H.D.; Lippman, S.A.; Tang, C.S. Optimal markdown pricing strategy with demand learning. Probab. Eng. Inf. Sci. 2012, 26, 77–104. [Google Scholar] [CrossRef]
- Besbes, O.; Zeevi, A. On the minimax complexity of pricing in a changing environment. Oper. Res. 2011, 59, 66–79. [Google Scholar] [CrossRef] [Green Version]
- Keskin, N.B.; Zeevi, A. Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Oper. Res. 2014, 62, 1142–1167. [Google Scholar] [CrossRef] [Green Version]
- Chen, N.; Gallego, G. Nonparametric pricing analytics with customer covariates. Oper. Res. 2021, 69, 974–984. [Google Scholar] [CrossRef]
- Chen, N.; Gallego, G. A Primal–Dual Learning Algorithm for Personalized Dynamic Pricing with an Inventory Constraint. Available online: https://pubsonline.informs.org/doi/abs/10.1287/moor.2021.1220 (accessed on 10 February 2022).
- Aoki, M. On a dual control approach to the pricing policies of a trading specialist. In Proceedings of the IFIP Technical Conference on Optimization Techniques, Rome, Italy, 7–11 May 1973; Springer: Berlin/Heidelberg, Germany, 1973; pp. 272–282. [Google Scholar]
- Chong, C.Y.; Cheng, D. Multistage pricing under uncertain demand. In Annals of Economic and Social Measurement; NBER: Cambridge, MA, USA, 1975; Volume 4, Number 2; pp. 311–323. [Google Scholar]
- McLennan, A. Price dispersion and incomplete learning in the long run. J. Econ. Dyn. Control 1984, 7, 331–347. [Google Scholar] [CrossRef]
- Rothschild, M. A two-armed bandit theory of market pricing. J. Econ. Theory 1974, 9, 185–202. [Google Scholar] [CrossRef]
- Besbes, O.; Zeevi, A. Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 2009, 57, 1407–1420. [Google Scholar] [CrossRef] [Green Version]
- Den Boer, A.V.; Zwart, B. Dynamic pricing and learning with finite inventories. Oper. Res. 2015, 63, 965–978. [Google Scholar] [CrossRef] [Green Version]
- Gatti Pinheiro, G.; Defoin-Platel, M.; Regin, J.C. Optimizing revenue maximization and demand learning in airline revenue management. arXiv 2022, arXiv:2203.11065. [Google Scholar]
- Aviv, Y.; Pazgal, A. Dynamic Pricing of Short Life-Cycle Products through Active Learning; Olin School Business, Washington University: St. Louis, MO, USA, 2005. [Google Scholar]
- Cope, E. Bayesian strategies for dynamic pricing in e-commerce. Nav. Res. Logist. (NRL) 2007, 54, 265–281. [Google Scholar] [CrossRef]
- Xia, C.H.; Dube, P. Dynamic pricing in e-services under demand uncertainty. Prod. Oper. Manag. 2007, 16, 701–712. [Google Scholar] [CrossRef]
- Thompson, W.R. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 1933, 25, 285–294. [Google Scholar] [CrossRef]
- Auer, P.; Cesa-Bianchi, N.; Fischer, P. Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 2002, 47, 235–256. [Google Scholar] [CrossRef]
- Brafman, R.I.; Tennenholtz, M. R-max: A general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res. 2002, 3, 213–231. [Google Scholar]
- Kearns, M.; Singh, S. Near-optimal reinforcement learning in polynomial time. Mach. Learn. 2002, 49, 209–232. [Google Scholar] [CrossRef] [Green Version]
- Pathak, D.; Agrawal, P.; Efros, A.A.; Darrell, T. Curiosity-driven exploration by self-supervised prediction. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 2778–2787. [Google Scholar]
- Gallego, G.; Van Ryzin, G. Optimal dynamic pricing of inventories with stochastic demand over finite horizons. Manag. Sci. 1994, 40, 999–1020. [Google Scholar] [CrossRef] [Green Version]
- Newman, J.P.; Ferguson, M.E.; Garrow, L.A.; Jacobs, T.L. Estimation of choice-based models using sales data from a single firm. Manuf. Serv. Oper. Manag. 2014, 16, 184–197. [Google Scholar] [CrossRef] [Green Version]
- Talluri, K.T.; Van Ryzin, G. The Theory and Practice of Revenue Management; Springer: Berlin/Heidelberg, Germany, 2004; Volume 1. [Google Scholar]
- Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic policy gradient algorithms. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 387–395. [Google Scholar]
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 1928–1937. [Google Scholar]
- Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; Moritz, P. Trust region policy optimization. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1889–1897. [Google Scholar]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
- Wan, Y.; Naik, A.; Sutton, R.S. Learning and planning in average-reward Markov decision processes. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 10653–10662. [Google Scholar]
- Zhang, S.; Wan, Y.; Sutton, R.S.; Whiteson, S. Average-Reward Off-Policy Policy Evaluation with Function Approximation. arXiv 2021, arXiv:2101.02808. [Google Scholar]
- Degris, T.; White, M.; Sutton, R.S. Off-policy actor–critic. arXiv 2012, arXiv:1205.4839. [Google Scholar]
- Schaul, T.; Horgan, D.; Gregor, K.; Silver, D. Universal value function approximators. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1312–1320. [Google Scholar]
- Kaelbling, L.P.; Littman, M.L.; Cassandra, A.R. Planning and acting in partially observable stochastic domains. Artif. Intell. 1998, 101, 99–134. [Google Scholar] [CrossRef] [Green Version]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
- Luong, M.T.; Pham, H.; Manning, C.D. Effective approaches to attention-based neural machine translation. arXiv 2015, arXiv:1508.04025. [Google Scholar]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef]
- Belobaba, P.P.; Hopperstad, C. Algorithms for revenue management in unrestricted fare markets. In Proceedings of the Meeting of the INFORMS Section on Revenue Management; Massachusetts Institute of Technology: Cambridge, MA, USA, 2004. [Google Scholar]
- Liang, E.; Liaw, R.; Nishihara, R.; Moritz, P.; Fox, R.; Goldberg, K.; Gonzalez, J.; Jordan, M.; Stoica, I. RLlib: Abstractions for distributed reinforcement learning. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 3053–3062. [Google Scholar]
- Schulman, J.; Moritz, P.; Levine, S.; Jordan, M.; Abbeel, P. High-dimensional continuous control using generalized advantage estimation. arXiv 2015, arXiv:1506.02438. [Google Scholar]
| Study | Method | Demand Model | Capacity Constraining | Multi-Flight |
|---|---|---|---|---|
| CEP [27] | heuristic | Bayesian, non-parametric, parametric | ✓ | ✓ |
| CVP [2] | heuristic | parametric | | |
| (Ningyuan and Gallego) [26] | heuristic | non-parametric | ✓ | |
| (Elreedy et al.), see Equation (1) [3] | heuristic | parametric | | |
| (Gatti Pinheiro et al.) [33] | heuristic | parametric | ✓ | |
| This work | RL | parametric | ✓ | ✓ |
| Experiment | Hyperparameter | Value |
|---|---|---|
| unconstrained capacity | train batch size | 1,032,240 |
| | learning rate | |
| | entropy coefficient (see [48]) | 0.005 |
| | value function clip (see [48]) | 30 |
| | eligibility trace (see [59]) | 0.15 |
| constrained capacity | train batch size | 1,771,000 |
| | learning rate | |
| | entropy coefficient | 0.015 |
| | value function clip | 30 |
| | eligibility trace | 0.1 |
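The hyperparameters above can be collected into a training configuration. A minimal sketch in Python, assuming RLlib-style PPO key names (`train_batch_size`, `lr`, `entropy_coeff`, `vf_clip_param`, `lambda`), which are an assumption based on the RLlib library cited in the references; the learning-rate entries are left as `None` because their values are not given in this excerpt:

```python
# Hypothetical mapping of the paper's tuned hyperparameters onto
# RLlib-style PPO configuration dictionaries (key names assumed).
ppo_configs = {
    "unconstrained_capacity": {
        "train_batch_size": 1_032_240,
        "lr": None,            # learning rate value not stated here
        "entropy_coeff": 0.005,
        "vf_clip_param": 30,   # value function clip
        "lambda": 0.15,        # eligibility trace (GAE lambda)
    },
    "constrained_capacity": {
        "train_batch_size": 1_771_000,
        "lr": None,            # learning rate value not stated here
        "entropy_coeff": 0.015,
        "vf_clip_param": 30,
        "lambda": 0.1,
    },
}
```

Keeping both experiments in one dictionary makes it easy to see that only the batch size, entropy coefficient, and eligibility trace differ between the two settings.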
| Experiment | Method | Average Normalized Revenue | Price Sensitivity MSE | Arrival Rate MSE |
|---|---|---|---|---|
| Unconstrained capacity | RMS (CEP) | 0.702 ± 0.007 | 0.0401 ± 0.0026 | — |
| | heuristic [33] | 0.783 ± 0.003 | 0.0148 ± 0.0008 | — |
| | RL | 0.868 ± 0.002 | 0.0109 ± 0.0005 | — |
| Constrained capacity | RMS | 0.862 ± 0.003 | 0.0072 ± 0.0002 | 0.162 ± 0.003 |
| | RL | 0.912 ± 0.002 | 0.0090 ± 0.0003 | 0.219 ± 0.004 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Gatti Pinheiro, G.; Defoin-Platel, M.; Regin, J.-C. Outsmarting Human Design in Airline Revenue Management. Algorithms 2022, 15, 142. https://doi.org/10.3390/a15050142