Interpretable Reinforcement Learning for Sequential Strategy Prediction in Language-Based Games
Abstract
1. Introduction
- We propose a unified framework that combines natural language processing (NLP), reinforcement learning (RL), and interpretable AI (I-AI) to predict user attempt distributions in Wordle, achieving both high predictive performance and transparency.
- A feature-driven prediction model is constructed from key linguistic indicators (e.g., letter frequency, word frequency, and letter-repetition pattern); a minimal feature-extraction sketch follows this list.
- A DDPG-based RL algorithm dynamically optimizes the prediction policy through interaction with the environment (a DDPG training-loop sketch accompanies Algorithm 1 in Section 3.2).
- SHAP is introduced to explain the model’s predictions, uncovering how different features influence attempt classifications under varying conditions; see the SHAP sketch after this list.
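To make the linguistic indicators concrete, here is a minimal sketch of how such features could be extracted from a five-letter Wordle word. The `word_freq` lookup, the `LETTER_FREQ` table, and the `extract_features` helper are illustrative assumptions, not the authors’ implementation; in practice the word frequencies would come from a corpus-based frequency list.

```python
from collections import Counter

# Hypothetical corpus frequencies; a real pipeline would load a word-frequency list.
word_freq = {"eerie": 1.2e-6, "slate": 3.4e-6}

# Approximate English letter frequencies (%), truncated here for brevity.
LETTER_FREQ = {"e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0,
               "n": 6.7, "s": 6.3, "h": 6.1, "r": 6.0, "l": 4.0}

def extract_features(word: str) -> dict:
    """Compute the three linguistic indicators named above for one word."""
    word = word.lower()
    counts = Counter(word)
    return {
        # Mean letter frequency across the word's letters (default 1.0 if unlisted).
        "letter_frequency": sum(LETTER_FREQ.get(c, 1.0) for c in word) / len(word),
        # Corpus frequency of the whole word (0.0 if unseen).
        "word_frequency": word_freq.get(word, 0.0),
        # Repetition pattern: how many distinct letters occur more than once.
        "repeated_letters": sum(1 for c in counts.values() if c > 1),
    }

print(extract_features("eerie"))  # 'e' repeats, so repeated_letters == 1
```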
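Since SHAP drives the interpretability analysis in Section 5.2, the following is a minimal sketch of producing feature attributions for a tree-based regressor. The model and synthetic data are placeholders; the paper applies SHAP to its own trained predictor.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Placeholder data: rows of [letter_frequency, word_frequency, repeated_letters].
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = 0.5 * X[:, 0] - 0.3 * X[:, 2] + rng.normal(0, 0.05, 200)  # synthetic target

model = RandomForestRegressor(n_estimators=100).fit(X, y)

# TreeExplainer computes exact Shapley values for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global importance: mean absolute SHAP value per feature.
print(np.abs(shap_values).mean(axis=0))
```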
2. Related Works
3. Methods
3.1. Definition
3.2. Model Structure
Algorithm 1: The proposed Enhanced-DDPG algorithm.
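The pseudocode of Algorithm 1 is not reproduced in this excerpt; the following is a minimal sketch of the standard DDPG update on which Enhanced-DDPG builds (Lillicrap et al., 2015): a TD critic update, a deterministic policy-gradient actor update, and Polyak-averaged target networks. The network interfaces and hyperparameters are illustrative assumptions, not the authors’ exact settings.

```python
import torch
import torch.nn as nn

def soft_update(target: nn.Module, source: nn.Module, tau: float = 0.005):
    """Polyak-average the target network toward the online network."""
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.mul_(1.0 - tau).add_(tau * s.data)

def ddpg_step(actor, critic, actor_targ, critic_targ,
              actor_opt, critic_opt, batch, gamma: float = 0.99):
    """One DDPG update. `actor(s)` maps states to actions; `critic(s, a)` scores pairs.
    `batch` holds replay-buffer tensors (states, actions, rewards, next states, dones)."""
    s, a, r, s2, done = batch

    # Critic: regress Q(s, a) toward the bootstrapped target r + gamma * Q'(s', mu'(s')).
    with torch.no_grad():
        target_q = r + gamma * (1.0 - done) * critic_targ(s2, actor_targ(s2))
    critic_loss = nn.functional.mse_loss(critic(s, a), target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, i.e. ascend Q(s, mu(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Slowly track the online networks with the target networks.
    soft_update(actor_targ, actor)
    soft_update(critic_targ, critic)
```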
3.3. Decision Logic and Reward Function
4. Experiments
4.1. Data Preparation and Preprocessing
4.2. The Model Training
5. Results
5.1. Robustness Analysis
5.2. Interpretability Analysis
5.3. Model Comparison
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Hyperparameter settings of all compared models:

| Model | Hyperparameter | Value |
|---|---|---|
| RFR | max_depth | 10 |
| | min_samples_leaf | 4 |
| | min_samples_split | 10 |
| | n_estimators | 100 |
| XGBoost | max_depth | 5 |
| | learning_rate | 0.01 |
| | n_estimators | 100 |
| | subsample | 0.5 |
| | colsample_bytree | 0.5 |
| LightGBM | max_depth | 3 |
| | learning_rate | 0.01 |
| | n_estimators | 100 |
| | num_leaves | 20 |
| METRA | learning_rate | 3 × 10⁻⁴ |
| | discount_factor | 0.99 |
| | latent_dimension | 32 |
| | random_exploration_steps | - |
| SQIRL | learning_rate | 1 × 10⁻³ |
| | discount_factor | 0.95 |
| | latent_dimension | - |
| | random_exploration_steps | 10,000 |
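The tree-based baselines map directly onto their library constructors; below is a minimal sketch using the table’s values, assuming scikit-learn, XGBoost, and LightGBM are installed. The RL baselines METRA and SQIRL have no comparable one-line constructors and are omitted.

```python
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

# Hyperparameters taken verbatim from the table above.
rfr = RandomForestRegressor(max_depth=10, min_samples_leaf=4,
                            min_samples_split=10, n_estimators=100)
xgb = XGBRegressor(max_depth=5, learning_rate=0.01, n_estimators=100,
                   subsample=0.5, colsample_bytree=0.5)
lgbm = LGBMRegressor(max_depth=3, learning_rate=0.01,
                     n_estimators=100, num_leaves=20)
```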
Prediction performance (MSE and R²) of all models on raw and Gaussian-noise-perturbed data:

| Model | MSE | R² |
|---|---|---|
| Enhanced-DDPG (Raw data) | 0.0134 | 0.8439 |
| RFR (Raw data) | 0.0425 | 0.6554 |
| Enhanced-DDPG (Gaussian noise data) | 0.0241 | 0.7853 |
| RFR (Gaussian noise data) | 0.0512 | 0.6133 |
| XGBoost (Raw data) | 0.0276 | 0.7960 |
| LightGBM (Raw data) | 0.0221 | 0.8342 |
| XGBoost (Gaussian noise data) | 0.0314 | 0.7215 |
| LightGBM (Gaussian noise data) | 0.0325 | 0.7749 |
| METRA (Raw data) | 0.0242 | 0.6977 |
| SQIRL (Raw data) | 0.0371 | 0.7153 |
| METRA (Gaussian noise data) | 0.0211 | 0.6991 |
| SQIRL (Gaussian noise data) | 0.0281 | 0.7186 |
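MSE and R² in the tables are the standard regression metrics; a minimal sketch of computing them with scikit-learn on placeholder predictions and targets:

```python
from sklearn.metrics import mean_squared_error, r2_score

y_true = [0.12, 0.30, 0.25, 0.08]  # placeholder attempt-distribution targets
y_pred = [0.10, 0.28, 0.27, 0.09]  # placeholder model predictions

print(f"MSE = {mean_squared_error(y_true, y_pred):.4f}")
print(f"R²  = {r2_score(y_true, y_pred):.4f}")
```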
Ablation study of the Enhanced-DDPG design:

| Model | MSE | R² |
|---|---|---|
| Enhanced-DDPG (ours) | 0.0134 | 0.8439 |
| Enhanced-DDPG (without repetition features) | 0.0517 | 0.5230 |
| DDPG (original algorithm) | 0.0298 | 0.6821 |
Citation: Zhao, J.; Ji, J.; Yasrab, R.; Wang, S.; Yu, L.; Zhao, L. Interpretable Reinforcement Learning for Sequential Strategy Prediction in Language-Based Games. Algorithms 2025, 18, 427. https://doi.org/10.3390/a18070427