Human–AI Collaboration in Risk- and Uncertainty-Aware Portfolio Reinforcement Learning: A Critical Review
Abstract
1. Introduction
2. Human–AI Collaboration in Portfolio Reinforcement Learning
- Advisory Mode. The RL agent generates allocation recommendations or risk diagnostics subject to human approval prior to execution, acting primarily as an analytical assistant.
- Constraint-Guided Mode. Human-defined risk limits, leverage caps, or regulatory rules shape the feasible action space or objective function. While governance boundaries are explicit, constraint enforcement typically remains static and rarely adapts to model confidence.
- Uncertainty-Aware Escalation Mode. Systems report confidence estimates and trigger human intervention when uncertainty crosses predefined thresholds, although such mechanisms often remain heuristic and imperfectly calibrated.
- Shared-Control Mode. Strategic decisions—such as risk budgeting and allocation regimes—remain under human supervision, while tactical execution is delegated to RL agents. However, reliability signals are not consistently propagated across decision levels.
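As a rough illustration of the uncertainty-aware escalation mode listed above, the following Python sketch averages allocation proposals from an ensemble and flags the decision for human review when the members disagree strongly. The names `Decision` and `review_allocation`, the disagreement measure (mean per-asset standard deviation), and the 0.05 threshold are all assumptions made for illustration; they are not taken from any reviewed study, and the threshold is deliberately uncalibrated.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Decision:
    weights: list[float]   # proposed portfolio weights (ensemble mean)
    disagreement: float    # mean per-asset std. dev. across ensemble members
    escalate: bool         # True -> route to a human reviewer before execution


def review_allocation(ensemble_weights: list[list[float]],
                      threshold: float = 0.05) -> Decision:
    """Average ensemble proposals and flag high-disagreement cases.

    `ensemble_weights` holds one weight vector per ensemble member.
    `threshold` is a hypothetical, uncalibrated cut-off.
    """
    if not ensemble_weights:
        raise ValueError("at least one ensemble member is required")
    per_asset = list(zip(*ensemble_weights))   # regroup proposals by asset
    weights = [mean(w) for w in per_asset]
    disagreement = mean(pstdev(w) for w in per_asset)
    return Decision(weights, disagreement, disagreement > threshold)


# Hypothetical proposals from a three-member ensemble over two assets.
calm = review_allocation([[0.50, 0.50], [0.52, 0.48], [0.48, 0.52]])
split = review_allocation([[0.90, 0.10], [0.20, 0.80], [0.50, 0.50]])
print(calm.escalate, split.escalate)
```

A fixed threshold of this kind is exactly the heuristic the escalation mode is criticized for; a deployed system would presumably need to calibrate it against realized errors.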
3. Methodological Approach
3.1. Review Design and Research Questions
3.2. Search Strategy and Database Coverage
3.3. Inclusion and Exclusion Criteria
3.4. Selection Process and Study Flow
3.5. Data Extraction and Coding Framework
- RL paradigm—single-agent, deep RL, hierarchical RL, multi-agent RL, or modular/hybrid, based on the architectural description in the original paper.
- Risk modeling strategy—reward shaping, soft constraint, hard constraint (CMDP), CVaR/distributional, drawdown-based, or none; multiple codes allowed.
- Risk–uncertainty coupling level—the central variable for RQ2, coded as: Non-coupled (risk and uncertainty treated independently), Partial (uncertainty present but not operationalized in risk constraints), or Explicit (uncertainty directly conditions risk budgeting, constraint thresholds, or allocation intensity).
- Evaluation rigor—four sub-dimensions: transaction cost inclusion, regime-based evaluation, multi-seed reporting, and out-of-sample validation (each coded yes/no).
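The coding dimensions listed above can be pictured as a small record type. The sketch below is one possible encoding, not the authors' extraction tool: the class and field names (`StudyRecord`, `EvaluationRigor`, `Coupling`, `share_by_coupling`) are hypothetical, it covers only the four dimensions listed here, and the two example records are invented placeholders rather than actual codings.

```python
from dataclasses import dataclass
from enum import Enum


class Coupling(Enum):
    NON_COUPLED = "non-coupled"  # risk and uncertainty treated independently
    PARTIAL = "partial"          # uncertainty present, not used in risk constraints
    EXPLICIT = "explicit"        # uncertainty conditions budgets, thresholds, or sizing


@dataclass(frozen=True)
class EvaluationRigor:
    transaction_costs: bool
    regime_based: bool
    multi_seed: bool
    out_of_sample: bool

    def score(self) -> int:
        """Number of yes/no criteria met (0 to 4); an illustrative summary only."""
        return sum((self.transaction_costs, self.regime_based,
                    self.multi_seed, self.out_of_sample))


@dataclass(frozen=True)
class StudyRecord:
    ref: str
    paradigm: str                    # e.g. "deep RL", "hierarchical RL"
    risk_strategies: frozenset[str]  # multiple codes allowed
    coupling: Coupling
    rigor: EvaluationRigor


def share_by_coupling(records: list[StudyRecord]) -> dict[Coupling, float]:
    """Fraction of records at each coupling level (all zeros for an empty list)."""
    n = len(records)
    if n == 0:
        return {c: 0.0 for c in Coupling}
    return {c: sum(r.coupling is c for r in records) / n for c in Coupling}


# Placeholder records for illustration; not codings of real studies.
rigor = EvaluationRigor(transaction_costs=True, regime_based=False,
                        multi_seed=True, out_of_sample=True)
records = [
    StudyRecord("study-A", "deep RL", frozenset({"reward shaping"}),
                Coupling.NON_COUPLED, rigor),
    StudyRecord("study-B", "modular/hybrid",
                frozenset({"reward shaping", "CVaR/distributional"}),
                Coupling.EXPLICIT, rigor),
]
print(rigor.score(), share_by_coupling(records)[Coupling.EXPLICIT])
```

Keeping the coupling level as a closed enumeration makes the three categories mutually exclusive, while the risk strategy field stays a set because multiple codes are allowed.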
3.6. Positioning Relative to Existing Reviews
4. Architectures of Reinforcement Learning for Portfolio Optimization
4.1. Single-Agent Reinforcement Learning
4.2. Deep Reinforcement Learning (DRL)
4.3. Hierarchical Reinforcement Learning (HRL)
4.4. Multi-Agent Reinforcement Learning (MARL)
4.5. Modular and Hybrid Architectures
5. Risk-Aware Reinforcement Learning
5.1. Reward Shaping and Risk-Sensitive Objectives
5.2. Hard and Soft Constraints in Portfolio RL
5.3. Downside Risk Measures: CVaR and Drawdown-Aware RL
5.4. Distributional Reinforcement Learning
5.5. Structural Limitations of Current Risk-Aware RL Approaches
5.6. Formal Perspective on Risk-Uncertainty Decoupling
6. Uncertainty Modeling in Financial Reinforcement Learning
6.1. Aleatoric and Epistemic Uncertainty in Finance
6.2. Bayesian Approaches and Probabilistic Policies
6.3. Ensemble Methods and Approximate Epistemic Estimation
6.4. Uncertainty-Driven Exploration and Robustness
6.5. Structural Limitations of Current Uncertainty-Aware RL
7. Multi-Modal Signals and Modular Architectures for Portfolio Reinforcement Learning
7.1. Motivation for Multi-Modal Learning in Finance
7.2. Sentiment and Behavioral Signals
7.3. Volatility Modeling and Risk State Estimation
7.4. Structural Dependencies and Graph-Based Representations
7.5. Static Versus Dynamic Signal Fusion
7.6. Modular Architectures as Structural Unification
7.7. Structural Implications for Portfolio Reinforcement Learning
8. Evaluation Protocols and Deployment Pitfalls
8.1. Backtesting Bias and Data Leakage
8.2. Regime-Based and Stress-Oriented Evaluation
8.3. Transaction Costs, Turnover, and Market Frictions
8.4. Overfitting, Sample Inefficiency, and Selection Bias
8.5. Reproducibility and Benchmark Fragmentation
8.6. Bridging Research and Deployment
8.7. Deployment-Oriented Validation Framework
9. Open Challenges and Research Directions
9.1. Continual and Lifelong Reinforcement Learning Under Risk Constraints
9.2. Generalization Across Markets, Assets, and Temporal Horizons
9.3. Interpretability, Accountability, and Regulatory Integration
9.4. Standardized Benchmarks and Deployment-Oriented Evaluation
9.5. Toward Unified Risk-Uncertainty-Modularity Frameworks
9.6. Conceptual Governance-Aware Unified Architecture
9.7. Structural Failure Modes Across Integration Levels
- Overconfident policies that do not account for the model's own epistemic uncertainty;
- Static risk control unable to adapt to changing conditions;
- Limited use of model confidence in allocation and constraint decisions;
- Inconsistent signals across system components;
- Weak propagation of uncertainty information across decision layers.
9.8. Limitations of This Review
10. Conclusions
- Risk-aware RL architectures—particularly hierarchical and modular systems—offer measurable advantages over traditional Mean-Variance and Risk Parity approaches in volatile market regimes.
- Epistemic uncertainty estimation should be incorporated as an active control signal rather than a passive diagnostic tool, enabling dynamic adjustment of risk constraints in real time.
- Governance mechanisms—including human oversight triggers and escalation protocols—are essential for institutional deployment and should be explicitly designed into the system architecture.
- Evaluation of RL-based portfolio systems should systematically include multi-seed reporting, regime-based testing, and transaction cost modeling to ensure real-world validity.
- The explicit coupling of risk management and uncertainty quantification remains an open problem—only 9% of reviewed studies address it directly.
- Continual learning and regime-adaptive architectures represent a critical gap, as most systems assume stationary market dynamics.
- Standardized benchmarking protocols for portfolio RL—analogous to those established in robotics and game-playing RL—are urgently needed.
- The governance layer of human–AI collaborative systems in finance remains largely theoretical and requires empirical validation in institutional settings.
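To make the idea of uncertainty as an active control signal more concrete, the sketch below shrinks a risk budget as an epistemic uncertainty estimate grows and scales risky exposure accordingly. Everything in it is an assumption chosen for illustration: the functional form `1 / (1 + sensitivity * epistemic_std)`, the `sensitivity` and `floor` parameters, and the function names are hypothetical and do not reproduce any method from the reviewed studies.

```python
def adjusted_risk_budget(base_budget: float,
                         epistemic_std: float,
                         sensitivity: float = 5.0,
                         floor: float = 0.2) -> float:
    """Shrink a risk budget as epistemic uncertainty rises.

    budget = base_budget * max(floor, 1 / (1 + sensitivity * epistemic_std))

    `sensitivity` and `floor` are hypothetical tuning parameters; `floor`
    keeps the budget from collapsing to zero under extreme uncertainty.
    """
    if base_budget < 0 or epistemic_std < 0:
        raise ValueError("base_budget and epistemic_std must be non-negative")
    scale = 1.0 / (1.0 + sensitivity * epistemic_std)
    return base_budget * max(floor, scale)


def scale_exposure(weights: list[float],
                   budget: float,
                   base_budget: float) -> list[float]:
    """Scale risky weights by budget / base_budget; the remainder is held as cash."""
    factor = budget / base_budget if base_budget > 0 else 0.0
    return [w * factor for w in weights]


# Illustrative values: a 10% base risk limit under three uncertainty levels.
for std in (0.0, 0.2, 2.0):
    budget = adjusted_risk_budget(0.10, std)
    print(std, round(budget, 4), scale_exposure([0.6, 0.4], budget, 0.10))
```

In this form the constraint itself moves with model confidence, which is the difference between explicit coupling and a static limit applied to an uncertainty-unaware policy.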
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Ang, A.; Timmermann, A. Regime Changes and Financial Markets. Annu. Rev. Financ. Econ. 2012, 4, 313–337.
- DeMiguel, V.; Garlappi, L.; Uppal, R. Optimal Versus Naive Diversification: How Inefficient Is the 1/N Portfolio Strategy? Rev. Financ. Stud. 2009, 22, 1915–1953.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018.
- Hambly, B.; Xu, R.; Yang, H. Recent Advances in Reinforcement Learning in Finance. Math. Financ. 2023, 33, 437–503.
- Lopez de Prado, M. Advances in Financial Machine Learning; Wiley: Hoboken, NJ, USA, 2018.
- Fischer, T.G. Reinforcement Learning in Financial Markets - A Survey; FAU Discussion Papers in Economics, No. 12/2018; Friedrich-Alexander University Erlangen-Nürnberg: Erlangen, Germany, 2018; Available online: https://econstor.eu/bitstream/10419/183139/1/1032172355.pdf (accessed on 7 April 2026).
- Cummings, M.M. Man versus Machine or Man + Machine? IEEE Intell. Syst. 2014, 29, 62–69.
- Song, G.; Zhao, T.; Ma, X.; Lin, P.; Cui, C. Reinforcement Learning-Based Portfolio Optimization with Deterministic State Transition. Inf. Sci. 2025, 690, 121538.
- Aboussalah, A.M.; Lee, C.-G. Continuous Control with Stacked Deep Dynamic Recurrent Reinforcement Learning for Portfolio Optimization. Expert Syst. Appl. 2020, 140, 112891.
- Millea, A.; Edalat, A. Using Deep Reinforcement Learning with Hierarchical Risk Parity for Portfolio Optimization. Int. J. Financ. Stud. 2022, 11, 10.
- Jin, B. A Mean-VaR Based Deep Reinforcement Learning Framework for Practical Algorithmic Trading. IEEE Access 2023, 11, 28920–28933.
- Winkel, D.; Strauß, N.; Schubert, M.; Seidl, T. Risk-Aware Reinforcement Learning for Multi-Period Portfolio Selection. In Machine Learning and Knowledge Discovery in Databases; Amini, M.-R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G., Eds.; Springer Nature Switzerland: Cham, Switzerland, 2023; pp. 185–200.
- Jiang, Y.; Olmo, J.; Atwi, M. Deep Reinforcement Learning for Portfolio Selection. Glob. Financ. J. 2024, 62, 101016.
- Hao, Z.; Zhang, H.; Zhang, Y. Stock Portfolio Management by Using Fuzzy Ensemble Deep Reinforcement Learning Algorithm. J. Risk Financ. Manag. 2023, 16, 201.
- Li, Z.; Tam, V.; Yeung, K.L. Developing a Multi-Agent and Self-Adaptive Framework with Deep Reinforcement Learning for Dynamic Portfolio Risk Management. arXiv 2024, arXiv:2402.00515.
- Khemlichi, F.; Khamlichi, Y.I.; Ali, S.E.B. Modular Reinforcement Learning for Multi-Market Portfolio Optimization. Information 2025, 16, 961.
- Hêche, F.; Nigro, B.; Barakat, O.; Robert-Nicoud, S. Risk-Averse Policies for Natural Gas Futures Trading Using Distributional Reinforcement Learning. arXiv 2025, arXiv:2501.04421.
- Lina, J.; Banda, S.S.; Paib, H.-T.; Rawal, B.S. EUDRL: Explainable Uncertainty-Based Deep Reinforcement Learning for Portfolio Management. In Proceedings of the 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kamand, India, 24–28 June 2024; pp. 1–7.
- Hao, J.L.T.; Wang, L.R.; Liu, C.; Choi, C.; Liu, S.; Fan, X. Exploring Epistemic and Distributional Uncertainties in Algorithmic Trading Agents. In Proceedings of the 2024 IEEE International Conference on Agents (ICA), Wollongong, Australia, 4–6 December 2024; pp. 82–87.
- Park, K.; Jung, H.-G.; Eom, T.-S.; Lee, S.-W. Uncertainty-Aware Portfolio Management with Risk-Sensitive Multiagent Network. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 362–375.
- Khemlichi, F.; Khamlichi, Y.I.; Ali, S.E.B. Hierarchical Multi-Agent System with Bayesian Neural Networks for Portfolio Optimization. Math. Model. Eng. Probl. 2025, 12, 1257.
- Gal, Y.; Ghahramani, Z. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. Proc. Mach. Learn. Res. 2016, 48, 1050–1059.
- Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles. arXiv 2016, arXiv:1612.01474.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
- Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907.
- Benhamou, E. Can Deep Reinforcement Learning Solve the Portfolio Allocation Problem? Ph.D. Thesis, Université Paris Sciences et Lettres, Paris, France, 2023.
- Lim, Q.Y.E.; Cao, Q.; Quek, C. Dynamic Portfolio Rebalancing through Reinforcement Learning. Neural Comput. Appl. 2022, 34, 7125–7139.
- Charkhestani, A.; Esfahanipour, A. Behaviorally Informed Deep Reinforcement Learning for Portfolio Optimization with Loss Aversion and Overconfidence. Sci. Rep. 2026, 16, 6443.
- Mani Shankar, M.; Sweety, A.; Deepthi, D. Optimizing Algorithmic Trading Through DRL: A Comparative Analysis of Single-Agent and Multi-Agent Models. In Data Science and Applications; Nanda, S.J., Yadav, R.P., Prasad, M., Saraswat, M., Eds.; Springer Nature: Cham, Switzerland, 2026; pp. 1–15.
- Espiga-Fernández, F.; García-Sánchez, Á.; Ordieres-Meré, J. A Systematic Approach to Portfolio Optimization: A Comparative Study of Reinforcement Learning Agents, Market Signals, and Investment Horizons. Algorithms 2024, 17, 570.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-Level Control through Deep Reinforcement Learning. Nature 2015, 518, 529–533.
- Jiang, M.; Xu, Z.; Lin, Z. Dynamic Risk Control and Asset Allocation Using Q-Learning in Financial Markets. Trans. Comput. Sci. Methods 2024, 4.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous Control with Deep Reinforcement Learning. arXiv 2019, arXiv:1509.02971.
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. arXiv 2018, arXiv:1801.01290.
- Khemlichi, F.; Elfilali, H.E.; Chougrad, H.; Ben Ali, S.E.; Idrissi Khamlichi, Y. Actor-Critic Methods in Stock Trading: A Comparative Study. In Proceedings of the 2023 3rd International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), Online Part, 20–21 July 2023; pp. 1–5.
- Sun, Q.; Wei, X.; Yang, X. GraphSAGE with Deep Reinforcement Learning for Financial Portfolio Optimization. Expert Syst. Appl. 2024, 238, 122027.
- Li, X.; Li, Y.; Zhan, Y.; Liu, X.-Y. Optimistic Bull or Pessimistic Bear: Adaptive Deep Reinforcement Learning for Stock Portfolio Allocation. arXiv 2019, arXiv:1907.01503.
- Khemlichi, F.; Chougrad, H.; Khamlichi, Y.I.; el Boushaki, A.; Ben Ali, S.E. Deep Deterministic Policy Gradient for Portfolio Management. In Proceedings of the 2020 6th IEEE Congress on Information Science and Technology (CiSt), Essaouira, Morocco, 5–12 June 2020; pp. 424–429.
- Khemlichi, F.; Chougrad, H.; Ali, S.E.B.; Khamlichi, Y.I. Portfolio Optimization System (POS): A Deep Reinforcement Learning Approach for Market-Adaptive Investment Strategies. In Proceedings of the 2025 5th International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), Zanzibar, Tanzania, 16–19 October 2025; pp. 1–6.
- Khemlichi, F.; Chougrad, H.; Khamlichi, Y.I.; Elboushaki, A.; Ali, S.E.B. Deep Deterministic Policy Gradient Based Portfolio Management System. Int. J. Inf. Sci. Technol. 2022, 6, 29–39.
- Bai, Z.-L.; Zhao, Y.-N.; Zhou, Z.-G.; Li, W.-Q.; Gao, Y.-Y.; Tang, Y.; Dai, L.-Z.; Dong, Y.-Y. Mercury: A Deep Reinforcement Learning-Based Investment Portfolio Strategy for Risk-Return Balance. IEEE Access 2023, 11, 78353–78362.
- Khemlichi, F.; Chougrad, H.; Idrissi Khamlichi, Y.; El Boushaki, A.; El Haj Ben Ali, S. A Stock Trading Strategy Based on Deep Reinforcement Learning. In Advanced Intelligent Systems for Sustainable Development (AI2SD’2020); Kacprzyk, J., Balas, V.E., Ezziyyani, M., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 920–928.
- Yang, H.; Liu, X.-Y.; Zhong, S.; Walid, A. Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy. In Proceedings of the 1st ACM International Conference on AI in Finance (ICAIF’20), New York, NY, USA, 3–4 November 2020; Association for Computing Machinery: New York, NY, USA, 2021; pp. 1–8.
- Millea, A. Hierarchical Model-Based Deep Reinforcement Learning for Single-Asset Trading. Analytics 2023, 2, 560–576.
- Sun, R.; Xi, Y.; Stefanidis, A.; Jiang, Z.; Su, J. A Novel Multi-Agent Dynamic Portfolio Optimization Learning System Based on Hierarchical Deep Reinforcement Learning. Complex Intell. Syst. 2025, 11, 311.
- Cheridito, P.; Dupret, J.-L.; Wu, Z. ABIDES-MARL: A Multi-Agent Reinforcement Learning Environment for Endogenous Price Formation and Execution in a Limit Order Book. arXiv 2025, arXiv:2511.02016.
- Kumlungmak, K.; Vateekul, P. Multi-Agent Deep Reinforcement Learning With Progressive Negative Reward for Cryptocurrency Trading. IEEE Access 2023, 11, 66440–66455.
- Cheng, C.; Chen, B.; Xiao, Z.; Lee, R.S.T. Quantum Finance and Fuzzy Reinforcement Learning-Based Multi-Agent Trading System. Int. J. Fuzzy Syst. 2024, 26, 2224–2245.
- Ying, R.; Lyu, J.; Li, J. Dynamic Portfolio Optimization with Data-Aware Multi-Agent Reinforcement Learning and Adaptive Risk Control. In Proceedings of the 4th International Conference on Artificial Intelligence and Intelligent Information Processing (AIIP 2024), Wuhan, China, 22–24 November 2024; ACM: New York, NY, USA, 2025; pp. 912–918.
- Shavandi, A.; Khedmati, M. A Multi-Agent Deep Reinforcement Learning Framework for Algorithmic Trading in Financial Markets. Expert Syst. Appl. 2022, 208, 118124.
- Cheng, L.-C.; Sun, J.-S. Multiagent-Based Deep Reinforcement Learning Framework for Multi-Asset Adaptive Trading and Portfolio Management. Neurocomputing 2024, 594, 127800.
- Ma, C.; Zhang, J.; Li, Z.; Xu, S. Multi-Agent Deep Reinforcement Learning Algorithm with Trend Consistency Regularization for Portfolio Management. Neural Comput. Appl. 2022, 35, 6589–6601.
- Kim, S.-H.; Lee, K.-H. Multi-Asset Multi-Agent Reinforcement Learning for Portfolio Management. IEEE Access 2025, 13, 194456–194474.
- Li, H.; Hai, M. Deep Reinforcement Learning Model for Stock Portfolio Management Based on Data Fusion. Neural Process. Lett. 2024, 56, 108.
- Zhang, H.; Shi, Z.; Hu, Y.; Ding, W.; Kuruoglu, E.E.; Zhang, X.-P. Optimizing Trading Strategies in Quantitative Markets Using Multi-Agent Reinforcement Learning. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024), Seoul, Republic of Korea, 14–19 April 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 136–140.
- Xu, Z.; Bao, Q.; Wang, Y.; Feng, H.; Du, J.; Sha, Q. Reinforcement Learning in Finance: QTRAN for Portfolio Optimization. J. Comput. Technol. Softw. 2025, 4.
- Ram, K.S.R.; M, S.; I, N.; M, T.; M, K.; B, N. Enhanced Investment Decision Making with a Reinforcement Learning-Based Multi-Agent Portfolio Management System. In Proceedings of the 2024 International Conference on Data Science and Network Security (ICDSNS), Tiptur, India, 26–27 July 2024; pp. 1–6.
- Chen, M.-Y.; Chen, C.-T.; Huang, S.-H. Knowledge Distillation for Portfolio Management Using Multi-Agent Reinforcement Learning. Adv. Eng. Inf. 2023, 57, 102096.
- Khemlichi, F.; Khamlichi, Y.I.; Ali, S.E.B. MPLS: A Modular Portfolio Learning System for Adaptive Portfolio Optimization. Math. Model. Eng. Probl. 2025, 12, 1959–1970.
- Carta, S.; Corriga, A.; Ferreira, A.; Podda, A.S.; Recupero, D.R. A Multi-Layer and Multi-Ensemble Stock Trader Using Deep Learning and Deep Reinforcement Learning. Appl. Intell. 2021, 51, 889–905.
- Yu, X.; Wu, W.; Liao, X.; Han, Y. Dynamic Stock-Decision Ensemble Strategy Based on Deep Reinforcement Learning. Appl. Intell. 2023, 53, 2452–2470.
- Yang, M.; Hu, Y.; Wang, J. Risk-Averse Trader: A Deep Reinforcement Learning-Based Portfolio Optimization Method for Risk-Averse Investors. In Proceedings of the 2024 International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI), Kusatsu, Japan, 6–8 December 2024; pp. 160–165.
- Shen, S.; Ma, C.; Li, C.; Liu, W.; Fu, Y.; Mei, S.; Liu, X.; Wang, C. RiskQ: Risk-Sensitive Multi-Agent Reinforcement Learning Value Factorization. Adv. Neural Inf. Process. Syst. 2023, 36, 34791–34825.
- Garrido-Merchán, E.; Mora-Figueroa, S.; Coronado-Vaca, M. Multi-Objective Bayesian Optimization of Deep Reinforcement Learning for Environmental, Social, and Governance (ESG) Financial Portfolio Management. Intell. Syst. Account. Financ. Manag. 2025, 32, e70008.
- Sun, S.; Xue, W.; Wang, R.; He, X.; Zhu, J.; Li, J.; An, B. DeepScalper: A Risk-Aware Reinforcement Learning Framework to Capture Fleeting Intraday Trading Opportunities. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2022; pp. 1858–1867.
- Wang, X.; Liu, L. Risk-Sensitive Deep Reinforcement Learning for Portfolio Optimization. J. Risk Financ. Manag. 2025, 18, 347.
- Sattar, A.; Sarwar, A.; Gillani, S.; Bukhari, M.; Rho, S.; Faseeh, M. A Novel RMS-Driven Deep Reinforcement Learning for Optimized Portfolio Management in Stock Trading. IEEE Access 2025, 13, 42813–42835.
- Yang, M.; Wang, J.; Hu, Y. RiskawareTrader: A Reinforcement Learning Based Portfolio Optimization for Risk Averter. Int. J. Comput. Intell. Syst. 2025, 19, 25.
- Dong, S.C.; Finlay, J.R. Adaptive Insurance Reserving with CVaR-Constrained Reinforcement Learning under Macroeconomic Regimes. arXiv 2025, arXiv:2504.09396.
- Bellemare, M.G.; Dabney, W.; Munos, R. A Distributional Perspective on Reinforcement Learning. Proc. Mach. Learn. Res. 2017, 70, 449–458.
- Shah, M.I.A.; Barrett, E.; Mason, K. Uncertainty-Aware Knowledge Transformers for Peer-to-Peer Energy Trading with Multi-Agent Reinforcement Learning. arXiv 2025, arXiv:2507.16796.
- Carta, S.; Ferreira, A.; Podda, A.S.; Reforgiato Recupero, D.; Sanna, A. Multi-DQN: An Ensemble of Deep Q-Learning Agents for Stock Market Forecasting. Expert Syst. Appl. 2021, 164, 113820.
- Ansari, Y.; Gillani, S.; Bukhari, M.; Lee, B.; Maqsood, M.; Rho, S. A Multifaceted Approach to Stock Market Trading Using Reinforcement Learning. IEEE Access 2024, 12, 90041–90060.
- Ye, Y.; Pei, H.; Wang, B.; Chen, P.-Y.; Zhu, Y.; Xiao, J.; Li, B. Reinforcement-Learning Based Portfolio Management with Augmented Asset Movement Prediction States. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI 2020), New York, NY, USA, 7–12 February 2020; AAAI Press: Palo Alto, CA, USA, 2020; Volume 34, pp. 1112–1119.
- Huang, Z.; Tanaka, F. MSPM: A Modularized and Scalable Multi-Agent Reinforcement Learning-Based System for Financial Portfolio Management. PLoS ONE 2022, 17, e0263689.
- Liu, X.-Y.; Yang, H.; Chen, Q.; Zhang, R.; Yang, L.; Xiao, B.; Wang, C.D. FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance. arXiv 2020, arXiv:2011.09607.
- Maeda, I.; deGraw, D.; Kitano, M.; Matsushima, H.; Sakaji, H.; Izumi, K.; Kato, A. Deep Reinforcement Learning in Agent Based Financial Market Simulation. J. Risk Financ. Manag. 2020, 13, 71.
- Bailey, D.H.; Lopez de Prado, M. The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting and Non-Normality. J. Portf. Manag. 2014, 40, 94–107.
- Ndikum, P.; Ndikum, S. Advancing Investment Frontiers: Industry-Grade Deep Reinforcement Learning for Portfolio Optimization. arXiv 2024, arXiv:2403.07916.
- Barredo Arrieta, A.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI. Inf. Fusion 2020, 58, 82–115.
- Rockafellar, R.T.; Uryasev, S. Optimization of Conditional Value-at-Risk. J. Risk 2000, 2, 21–41.
- Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.



| Ref | Author(s), Year | RL Architecture | Risk Treatment | Uncertainty Modeled | Risk–Uncertainty Coupling | Key Performance Metrics & Regime Testing |
|---|---|---|---|---|---|---|
| GROUP A—Non-Coupled: Risk and Uncertainty Structurally Decoupled | ||||||
| [8] | Song et al., 2025 (Inf. Sci.) | Deep RL (deterministic state transition) | Reward shaping (return-based) | None—deterministic transitions explicitly assumed | Non-Coupled | Sharpe, cum. return. Backtesting on a single continuous evaluation setting. |
| [9] | Aboussalah & Lee, 2020 (Expert Syst. Appl.) | Deep RL—Recurrent (Stacked DDRL) | Reward shaping (return + turnover penalty) | None—no explicit uncertainty modeling | Non-Coupled | Sharpe, Sortino, max drawdown (OOS). Multiple market periods tested. |
| [10] | Millea & Edalat, 2022 (Int. J. Financ. Stud.) | Hierarchical RL (DRL + HRP) | Hierarchical Risk Parity (structural risk budgeting) | None—deterministic HRP/HERC allocation | Non-Coupled | Sharpe, Calmar, max drawdown (OOS). Multiple market periods. |
| [11] | Jin, 2023 (IEEE Access) | Deep RL (Mean-VaR framework) | Mean-VaR explicit tail-risk objective | None—VaR used as risk measure only | Non-Coupled | Sharpe, VaR, annualized return. Some stress periods included. |
| [12] | Winkel et al., 2023 (ECML PKDD) | Risk-aware RL | Variance-based Pareto risk optimization | None—uncertainty not modeled | Non-Coupled | Std. deviation, Sharpe, risk-return Pareto front (Nasdaq-100 OOS). Risk sensitivity analysis. |
| [13] | Jiang, Olmo & Atwi, 2024 (Glob. Financ. J.) | Deep RL (TD3 + CNN—RTC) | Risk + transaction cost sensitive reward | None—variance-based risk modeling | Non-Coupled | Sharpe, Calmar, max drawdown (OOS) vs. MV & Sharpe benchmarks. |
| GROUP B—Partial Coupling: Uncertainty Present but Not Operationalized in Risk Constraints | ||||||
| [14] | Hao et al., 2023 (J. Risk Financ. Manag.) | Modular/Hybrid (Fuzzy Ensemble DRL) | Reward shaping + fuzzy market-regime encoding | Implicit via ensemble disagreement | Partial | Sharpe, cum. return, max drawdown. S&P 100 market regimes tested. |
| [15] | Li, Tam & Yeung, 2024 (arXiv) | MARL (self-adaptive risk management) | Dynamic risk management via agent specialization | Partial—agent disagreement as uncertainty proxy | Partial | Sharpe, max drawdown (OOS). Multiple market regimes tested. |
| [16] | Khemlichi et al., 2025 (Information, MDPI) | Modular HRL-MARL (MPLS—multi-algorithm) | CVaR in reward + modular risk scaling | Partial—Bayesian VFM for volatility uncertainty | Partial | Sharpe +40–70% vs. MVP/RP; CVaR limited during COVID. S&P 500, DAX 30, FTSE 100. 4 regimes: stable, crisis, recovery, sideways. |
| [17] | Hêche et al., 2025 (arXiv) | Distributional RL (natural gas futures) | CVaR-based distributional RL objectives | Distributional aleatoric uncertainty only | Partial | CVaR, return metrics. Commodity stress testing. |
| GROUP C—Explicit Coupling: Uncertainty Directly Conditions Risk Decisions | ||||||
| [18] | Lina et al., 2024 (ICCCNT—IEEE) | Deep RL (EUDRL—explainable uncertainty-based) | Uncertainty-conditioned reward via local agents | Explicit—local agents provide uncertainty assessments integrated via SHAP explainability | Explicit | Portfolio performance + uncertainty metrics. Portfolio management scenarios. |
| [19] | Hao et al., 2024 (IEEE ICA) | Deep RL (epistemic + distributional uncertainty) | Uncertainty-informed risk decisions | Explicit—epistemic (MC Dropout) and aleatoric uncertainty modeled jointly | Explicit | Uncertainty metrics, trading performance. S&P 500 with recession periods. |
| [20] | Park et al., 2024 (IEEE TNNLS) | MARL (RSMAN—risk-sensitive multiagent) | Risk-sensitive decisions via uncertainty estimation (RSA + RAPG) | Explicit—market + parameter uncertainty directly condition risk-adaptive portfolio generation | Explicit | Sharpe, cum. return, risk-adjusted metrics. Multiple market periods tested. |
| [21] | Khemlichi et al., 2025 (Math. Model. Eng. Probl.) | Hierarchical MARL (BNN-based—uncertainty-aware) | Uncertainty-aware dynamic decisions (BNN + PPO) | Explicit—BNN uncertainty propagated to allocation decisions across agents | Explicit | Sharpe, return, uncertainty calibration metrics. Multi-sector portfolio. |

| Stage | Inclusion Criteria | Exclusion Criteria |
|---|---|---|
| Title/abstract screening | Addresses RL or DRL applied to portfolio management, trading, or asset allocation; published 2016–2025; English language; peer-reviewed or rigorous preprint | Focuses solely on prediction without sequential decision-making; non-English; duplicate; outside 2016–2025 |
| Full-text assessment | Addresses at least one of: RL architecture for portfolio optimization, risk modeling strategy (reward shaping, constraints, CVaR), uncertainty estimation (Bayesian, ensemble, MC Dropout); sufficient methodological transparency to extract data on ≥3 of the 6 coding dimensions | Prediction-only without RL decision layer; insufficient methodological detail; non-indexed or low-quality venue |
| Final quality check | Published in indexed journal or major peer-reviewed conference; clear empirical or methodological contribution; dataset and setup sufficiently described | Predatory venue; opinion-based without methodological substance; direct self-replication without new contribution |

| Architecture Type | Risk Modeling Strategy | Uncertainty Integration | Human–AI Interaction Mode | Deployment Robustness Level | Structural Limitations |
|---|---|---|---|---|---|
| Single-Agent RL | Reward shaping (volatility, turnover penalties) | Rare or implicit; typically deterministic | Advisory mode | Low | Conflation of signal processing and risk control; no explicit uncertainty conditioning |
| Deep RL (Centralized DRL) | Reward penalties or soft constraints | Occasional ensemble/dropout; rarely calibrated | Advisory/Constraint-guided | Low–Moderate | End-to-end opacity; weak interpretability; risk not conditioned on epistemic confidence |
| Hierarchical RL (HRL) | Risk budgeting at strategic layer; tactical reward shaping | Local uncertainty estimates; limited cross-layer propagation | Shared-control | Moderate | Poor uncertainty aggregation across layers; coordination complexity |
| Multi-Agent RL (MARL) | Agent-level constraints; partial global alignment | Implicit via agent disagreement; no formal aggregation | Shared-control/Constraint-guided | Moderate | Coordination instability; fragmented risk perception; endogenous non-stationarity |
| Modular & Hybrid Architectures | Dedicated risk modules; dynamic risk scaling | Module-specific uncertainty; partial fusion | Uncertainty-aware escalation/Shared-control | Moderate–High (theoretical) | Integration gap between uncertainty signals and capital allocation constraints |
| Unified Risk–Uncertainty–Modular Frameworks (Emerging) | Explicit constraint conditioning on epistemic confidence | Bayesian or calibrated probabilistic propagation across modules | Structured human-in-the-loop | High (conceptual) | Limited empirical validation; computational complexity; benchmark scarcity |

| Architecture Type | Typical Objective | Risk Treatment | Uncertainty Modeling | Signal Integration | Evaluation Rigor | Deployment Readiness | Structural Limitation |
|---|---|---|---|---|---|---|---|
| Single-Agent RL | Return/Sharpe maximization | Reward shaping (volatility penalties) | None or implicit | Monolithic (price-based) | Simple backtests | Prototype | Fragile under regime shifts; opaque decision logic |
| Deep RL | Return/risk-adjusted return | Reward shaping; soft constraints | Rare; mostly deterministic | End-to-end representation | Limited regime testing; few multi-seed reports | Low | Reward sensitivity; overfitting; weak uncertainty calibration |
| Hierarchical RL (HRL) | Strategic + tactical objectives | Sometimes constraint-based | Rarely propagated across layers | Structured but not modular | Partial regime analysis | Conceptual | Credit assignment complexity; weak uncertainty flow |
| Multi-Agent RL (MARL) | Global portfolio objective | Implicit risk sharing | Rare; mostly absent | Distributed asset-level policies | Heterogeneous; rarely standardized | Experimental | Coordination instability; non-stationarity between agents |
| Risk-Constrained RL (CMDP-based) | Return under explicit constraints (CVaR, drawdown) | Hard or soft constraints | Often absent | Monolithic | Improved robustness testing | Medium (sometimes cost-aware) | Risk handled separately from uncertainty |
| Distributional RL | Return distribution optimization | Quantile/CVaR-based | Distributional outcome modeling | Monolithic | Limited stress testing | Prototype | Distribution learning ≠ calibrated epistemic uncertainty |
| Bayesian/Ensemble RL | Confidence-aware optimization | Usually via reward penalties | Epistemic (approximate) | Monolithic | Rare calibration metrics | Low | Uncertainty rarely integrated into action constraints |
| Modular/Multi-Modal RL | Adaptive multi-signal optimization | Varies | Localized per module (sometimes) | Explicit modular fusion | Very heterogeneous | Exploratory | Lack of unified risk-uncertainty integration |
| Unified Risk–Uncertainty–Modular (Emerging) | Joint performance + risk + confidence | Integrated constraints | Propagated uncertainty | Dynamic fusion | Rare end-to-end evaluation | Not yet mature | Still largely conceptual; missing benchmark standardization |
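
The distinction drawn in the Distributional RL row (distribution learning is not the same as calibrated epistemic uncertainty) can be illustrated with a small sketch. It assumes, hypothetically, that several independently trained models each provide a sample of simulated portfolio returns: the CVaR of each sample describes tail risk in the outcome distribution, while the spread of those CVaR estimates across models gives a rough, uncalibrated indication of epistemic uncertainty. The function names and the ensemble setup are illustrative assumptions rather than a procedure from the reviewed literature.

```python
import math
from statistics import mean, pstdev


def empirical_cvar(returns, alpha=0.05):
    """Mean of the worst alpha-fraction of returns (at least one observation)."""
    ordered = sorted(returns)
    k = max(1, math.ceil(alpha * len(ordered)))
    return mean(ordered[:k])


def tail_risk_and_epistemic_spread(member_samples, alpha=0.05):
    """Return (average CVaR, spread of CVaR across ensemble members).

    member_samples: one list of simulated returns per (hypothetical) model.
    The first value summarises tail risk in the return distribution; the
    second is nonzero only when the models disagree about that tail risk.
    """
    cvars = [empirical_cvar(sample, alpha) for sample in member_samples]
    return mean(cvars), pstdev(cvars)
```

A single distributional model supplies only the first quantity. Without some second source of variation, such as an ensemble or a posterior over parameters, the spread term is unavailable, which is the gap the table attributes to distributional approaches.
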
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Khemlichi, F.; Idrissi Khamlichi, Y.; Elhaj Ben Ali, S. Human–AI Collaboration in Risk- and Uncertainty-Aware Portfolio Reinforcement Learning: A Critical Review. Information 2026, 17, 476. https://doi.org/10.3390/info17050476

