This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

Integrating Contextual Causal Deep Networks and LLM-Guided Policies for Sequential Decision-Making

1 Statistics Discipline, Division of Science and Mathematics, University of Minnesota-Morris, Morris, MN 56267, USA
2 EGADE Business School, Tecnológico de Monterrey, Ave. Rufino Tamayo, Monterrey 66269, Mexico
Mathematics 2026, 14(2), 269; https://doi.org/10.3390/math14020269
Submission received: 12 December 2025 / Revised: 27 December 2025 / Accepted: 9 January 2026 / Published: 10 January 2026
(This article belongs to the Special Issue Computational Methods and Machine Learning for Causal Inference)

Abstract

Sequential decision-making is critical for applications ranging from personalized recommendations to resource allocation. This study evaluates three decision policies within a contextual bandit framework: Greedy, Thompson Sampling (via Monte Carlo Dropout), and a zero-shot Large Language Model (LLM)-guided policy (Gemini-1.5-Pro). To address covariate shift and assess subpopulation performance, we use a Collective Conditional Diffusion Network (CCDN) in which covariates are partitioned into B = 10 homogeneous blocks. We evaluate the policies over a high-dimensional treatment space (K = 5, resulting in 2^5 = 32 actions) in a simulated environment and on three benchmark datasets: Boston Housing, Wine Quality, and Adult Income. Our results show that the Greedy strategy achieves the highest Model-Relative Optimal (MRO) coverage, reaching 1.00 on the Wine Quality and Adult Income datasets, although its coverage drops sharply to 0.05 on Boston Housing. Thompson Sampling maintains competitive regret and, on the Boston Housing dataset, marginally outperforms Greedy in action selection precision. In contrast, the zero-shot LLM-guided policy consistently underperforms in numerical tabular settings, exhibiting the highest median regret and near-zero MRO coverage across most tasks. Furthermore, Wilcoxon tests reveal that differences in empirical outcomes between policies are often not statistically significant (ns), suggesting an optimization ceiling in zero-shot tabular settings. These findings indicate that, while traditional model-driven policies are robust, LLM-guided approaches currently lack the numerical precision required for high-dimensional sequential decision-making without further calibration or hybrid integration.
Keywords: contextual bandits; sequential decision-making; Thompson Sampling; greedy policy; LLM-guided policy; experimental design; blocking; K-means; clustering; policy evaluation
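
To make the action space and the two model-driven policies concrete, the following is a minimal sketch rather than the article's implementation. It assumes each of the K = 5 treatment components is a binary on/off choice (which yields the 2^5 = 32 actions), uses hypothetical toy reward predictions in place of a trained network, and treats each row of `reward_draws` as a stand-in for one Monte Carlo Dropout forward pass. All function and variable names are illustrative.

```python
import itertools
import random

K = 5  # number of treatment components, as stated in the abstract

# Assumption: each component is binary, so there are 2^K = 32 combinations.
ACTIONS = list(itertools.product([0, 1], repeat=K))


def greedy_action(mean_rewards):
    """Return the index of the action with the highest predicted mean reward."""
    return max(range(len(mean_rewards)), key=lambda a: mean_rewards[a])


def thompson_action(reward_draws, rng):
    """Return the best action under one randomly chosen stochastic prediction.

    reward_draws[s][a] is the predicted reward of action a in pass s; each
    pass stands in for one Monte Carlo Dropout forward pass (hypothetical).
    """
    draw = rng.choice(reward_draws)
    return max(range(len(draw)), key=lambda a: draw[a])


def regret(mean_rewards, chosen):
    """Model-relative regret: best predicted reward minus the chosen one."""
    return max(mean_rewards) - mean_rewards[chosen]


# Hypothetical toy predictions: reward grows with the number of active treatments.
rng = random.Random(0)
mean_rewards = [float(sum(a)) for a in ACTIONS]
reward_draws = [[m + rng.gauss(0.0, 0.5) for m in mean_rewards] for _ in range(20)]

g = greedy_action(mean_rewards)
t = thompson_action(reward_draws, rng)
print(len(ACTIONS), ACTIONS[g], regret(mean_rewards, g), regret(mean_rewards, t))
```

In this toy setup the Greedy choice has zero model-relative regret by construction, while the Thompson-style choice can incur positive regret because it follows a single noisy draw; that is the exploration cost the regret comparison in the abstract refers to.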

Share and Cite

MDPI and ACS Style

Kim, J.-M. Integrating Contextual Causal Deep Networks and LLM-Guided Policies for Sequential Decision-Making. Mathematics 2026, 14, 269. https://doi.org/10.3390/math14020269

