Cascading Multi-Agent Policy Optimization for Demand Forecasting
Abstract
1. Introduction
- RQ1: Can reinforcement learning be applied effectively to demand forecasting problems?
- RQ2: How does the forecasting performance of an RL-based system compare with that of state-of-the-art gradient boosting models?
- We propose a novel multi-agent DRL architecture for the real-world problem of multi-store demand forecasting, in which agents interact and share knowledge in a cascading fashion. To the best of our knowledge, this is the first such study.
- We highlight the potential of DRL methods in forecasting, an area that remains under-explored.
- We conduct extensive experiments to validate the model's rationale and practicality and to compare its performance against that of existing methods.
2. The Literature Review
3. The Problem Statement and the Proposed Solution
3.1. The Multi-Agent Environment
- $s_t^i$ denotes the state representation for agent $i$ at step $t$; it consists of the item embedding, the state feature vector, and the action information of the other agents. This representation is elaborated in the following sections, and a minimal environment sketch is given after this list.
- Joint actions are continuous non-negative vectors of dimension $N$; $a_t^i$ denotes the action (predicted sales) produced by agent $i$ at step $t$.
- At each step $t$, agent $i$ observes state $s_t^i$ and takes action $a_t^i$. The corresponding reward $r_t^i$ is computed from the difference between its action $a_t^i$ (predicted sales) and the target $y_t^i$ (actual sales), and is therefore unique to each agent.
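To make these definitions concrete, here is a minimal environment sketch. The concatenation order of the state components and the use of the negative absolute error as the reward are assumptions made for illustration; the paper only states that the reward is derived from the difference between predicted and actual sales.

```python
import numpy as np

class MultiStoreForecastEnv:
    """Minimal sketch of the multi-agent demand forecasting environment.
    The reward shape (negative absolute error) and the state layout are
    assumptions, not the paper's confirmed design."""

    def __init__(self, item_embeddings, features, sales):
        self.item_embeddings = item_embeddings  # (T, d_item) item embedding per step
        self.features = features                # (T, d_feat) state feature vectors
        self.sales = sales                      # (T, N) actual sales, one column per store
        self.n_agents = sales.shape[1]
        self.t = 0

    def reset(self):
        self.t = 0

    def state(self, i, actions):
        """State of agent i: item embedding + features + other agents' actions."""
        others = np.delete(actions, i)          # drop agent i's own action
        return np.concatenate([self.item_embeddings[self.t],
                               self.features[self.t],
                               others])

    def step(self, actions):
        """actions: non-negative vector of length N (predicted sales per store)."""
        rewards = -np.abs(actions - self.sales[self.t])  # per-agent reward
        self.t += 1
        return rewards, self.t >= len(self.sales)
```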
3.2. Cascading Agents
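As a speculative sketch consistent with the definitions of Section 3.1, the rollout below shows one plausible cascading scheme: agents predict in a fixed order, and each agent's state already contains the actions taken so far at the current step, so later agents can condition on earlier agents' predictions. It reuses the environment sketch above and the shared-policy sketch in Section 3.3 below; the buffer layout and the clamping of sampled actions are illustrative choices, not the paper's confirmed procedure.

```python
import torch

def rollout_cascade(env, policy, n_agents):
    """Collect one episode. Agents act in a fixed order; agent i's state
    contains the current-step actions of agents 1..i-1 and the previous-step
    actions of the rest (the 'cascade')."""
    buffers = [{"s": [], "a": [], "logp": [], "r": []} for _ in range(n_agents)]
    actions = torch.zeros(n_agents)
    env.reset()
    done = False
    while not done:
        for i in range(n_agents):                      # cascading order 1..N
            s = torch.as_tensor(env.state(i, actions.numpy()),
                                dtype=torch.float32).unsqueeze(0)
            dist = policy(s, torch.tensor([i]))
            a = dist.sample().clamp(min=0.0)           # non-negative predicted sales
            buffers[i]["s"].append(s.squeeze(0))
            buffers[i]["a"].append(a.squeeze(0))
            buffers[i]["logp"].append(dist.log_prob(a).sum().detach())
            actions[i] = a.item()
        rewards, done = env.step(actions.numpy())
        for i in range(n_agents):
            buffers[i]["r"].append(float(rewards[i]))
    return buffers
```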
3.3. Parameter Sharing
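The snippet below sketches the most common realization of parameter sharing in multi-agent RL: a single policy network reused by every agent and conditioned on a one-hot agent identifier. This is one plausible reading of the section title rather than the paper's confirmed design; the Gaussian head and the softplus non-negativity constraint are likewise assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedGaussianPolicy(nn.Module):
    """A single network shared by all agents. A one-hot agent ID is appended
    to the state so the shared parameters can still yield agent-specific
    behavior. A generic pattern, not necessarily the paper's exact scheme."""

    def __init__(self, state_dim, n_agents, hidden=128):
        super().__init__()
        self.n_agents = n_agents
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_agents, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mu = nn.Linear(hidden, 1)               # mean of predicted sales
        self.log_std = nn.Parameter(torch.zeros(1))  # shared log standard deviation

    def forward(self, state, agent_id):
        # state: (B, state_dim); agent_id: LongTensor of shape (B,) or (1,)
        one_hot = F.one_hot(agent_id, self.n_agents).float()
        h = self.net(torch.cat([state, one_hot], dim=-1))
        mu = F.softplus(self.mu(h))                  # keep the mean non-negative
        return torch.distributions.Normal(mu, self.log_std.exp())
```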
3.4. The Objective Function
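Since Algorithm 1 names the method "cascading multi-agent PPO", the per-agent objective presumably builds on the standard clipped surrogate, written below with $\hat{A}_t^i$ an advantage estimate and $\epsilon$ the clipping threshold. The paper's own regularization term (referenced in the hyperparameter table of Section 4) is not reproduced here.

```latex
L_i^{\mathrm{CLIP}}(\theta) \;=\;
\mathbb{E}_t\!\left[\,
  \min\!\Big(\rho_t^i(\theta)\,\hat{A}_t^i,\;
             \operatorname{clip}\!\big(\rho_t^i(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t^i\Big)
\right],
\qquad
\rho_t^i(\theta) \;=\; \frac{\pi_\theta\!\left(a_t^i \mid s_t^i\right)}
                            {\pi_{\theta_{\text{old}}}\!\left(a_t^i \mid s_t^i\right)}
```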
Algorithm 1 Cascading multi-agent PPO
Input: initial policy parameters and training hyperparameters (including k)
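As a concrete companion to the listing, the sketch below wires the earlier pieces together: one rollout of the cascade (Section 3.2), then k epochs of the clipped surrogate (Section 3.4) per agent on the shared policy (Section 3.3). The mean-baseline advantage, the per-step reward standing in for the return (rewards are immediate in this environment), and the omission of the paper's regularization term are all simplifications of the published algorithm.

```python
import torch

def ppo_update(policy, optimizer, buf, agent_idx, k=4, eps=0.2):
    """k epochs of the clipped surrogate on one agent's episode buffer.
    Illustrative only; the paper's exact update is not reproduced here."""
    states = torch.stack(buf["s"])                 # (T, state_dim)
    actions = torch.stack(buf["a"])                # (T, 1)
    old_logp = torch.stack(buf["logp"])            # (T,)
    rewards = torch.tensor(buf["r"])               # (T,) immediate rewards
    # Each prediction is rewarded immediately, so the per-step reward can
    # stand in for the return; a mean baseline gives a crude advantage.
    adv = rewards - rewards.mean()
    adv = adv / (adv.std() + 1e-8)
    ids = torch.full((states.size(0),), agent_idx, dtype=torch.long)
    for _ in range(k):                             # k passes over the batch
        dist = policy(states, ids)
        logp = dist.log_prob(actions).squeeze(-1)  # (T,)
        ratio = torch.exp(logp - old_logp)         # rho_t(theta) from the objective
        clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
        loss = -torch.min(ratio * adv, clipped * adv).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# One training iteration over all agents (illustrative wiring):
# buffers = rollout_cascade(env, policy, n_agents)
# for i in range(n_agents):
#     ppo_update(policy, optimizer, buffers[i], i)
```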
4. Experiments
4.1. Data and Methods
4.2. The Experimental Setup
4.3. Measuring the Forecasting Performance
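The results tables below report the root-mean-square error; for $T$ evaluation points with predicted sales $\hat{y}_t$ and actual sales $y_t$, this is the standard

```latex
\mathrm{RMSE} \;=\; \sqrt{\frac{1}{T}\sum_{t=1}^{T}\bigl(\hat{y}_t - y_t\bigr)^2}
```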
5. Results and Discussion
6. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
| Method | Parameter | Line/Grid Values |
|---|---|---|
| lstm | hidden layers | |
| lstm | hidden units | |
| lstm | batch size | |
| lgbm | number of estimators | |
| lgbm | learning rate | |
| lgbm | max depth | |
| lgbm | min child samples | |
| ours | regularization | , step of |
| ours | batch size | |
| n. | Index | mlr | lstm | lgbm | ours |
|---|---|---|---|---|---|
| 50 | RMSE | | | | |
| 300 | RMSE | | | | |
| 700 | RMSE | | | | |

| n. | Index | mlr | lstm | lgbm | ours |
|---|---|---|---|---|---|
| 50 | RMSE | | | | |
| 300 | RMSE | | | | |
| 700 | RMSE | | | | |