Deep Reinforcement Learning for Financial Trading: Enhanced by Cluster Embedding and Zero-Shot Prediction
Abstract
1. Introduction
- We propose a DRL framework for stock indices that integrates a time series prediction network with Cluster Embedding (CE) to forecast future financial price data, which improves prediction accuracy. During training, CE learns and distills generalized prototype embeddings from the training data. When the model is confronted with entirely new, unseen time series or environments, it therefore needs no additional labeled data or retraining. Instead, it calculates the matching probability between new samples and known clusters from the pre-trained prototype embeddings and transfers knowledge through a cluster-aware feed-forward mechanism, thereby rapidly generating accurate predictions.
- Within the proposed Data-Centric Artificial Intelligence (DCAI) framework, data augmentation is employed to generate new data from observed prices and predicted prices. Unlike traditional Open, High, Low, Close and Volume (OHLCV) data, this augmented data helps RL agents perceive more macro-scale patterns in stock prices.
- Three DRL algorithms, namely Double Deep Q-Network (DDQN), Advantage Actor–Critic (A2C), and Proximal Policy Optimization (PPO), are used to evaluate the proposed framework. Experiments conducted on five widely adopted datasets, the Dow Jones Industrial Average (DJI), NASDAQ 100, S&P 500 (SP500), Hang Seng Index (HSI), and Nikkei 225 (N225), demonstrate that the framework consistently outperforms various traditional methods and DRL-based approaches. These results highlight the framework’s potential for optimizing stock trading strategies and improving the reliability of algorithmic trading through the effective integration of DRL with predictions of future stock price movements.
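As a rough illustration of the augmentation idea in the second contribution, the sketch below joins an observed price window with a forecast window and appends two summary features. The function name `augment_state`, the two derived features, and the sample prices are hypothetical and are not taken from the paper; only the window lengths (20 observed steps, 5 predicted steps) follow the input and output lengths listed in Appendix A.2.

```python
from typing import List, Sequence


def augment_state(observed: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Join observed and predicted prices and append two summary features.

    The two features are illustrative assumptions, not the paper's definition:
      * expected return from the last observed price to the last predicted price
      * average predicted price change per forecast step
    """
    if not observed or not predicted:
        raise ValueError("both windows must be non-empty")
    last_price = observed[-1]
    expected_return = predicted[-1] / last_price - 1.0
    slope = (predicted[-1] - last_price) / len(predicted)
    return list(observed) + list(predicted) + [expected_return, slope]


# 20 observed closes followed by 5 forecast closes (made-up numbers).
observed = [100.0 + i for i in range(20)]
predicted = [120.0, 121.0, 122.0, 123.0, 124.0]
state = augment_state(observed, predicted)
print(len(state))
```

The point of the sketch is only that the agent's input can carry forward-looking information alongside raw prices; which features the framework actually derives is not specified in this section.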
2. Related Work
2.1. Time Series Forecasting
2.2. Deep Reinforcement Learning in Trading
3. Preliminaries
Diverse Strategies for Multivariate Time Series Forecasting
4. Method
- The Efficient Market Hypothesis (EMH) [54] posits that financial markets comprise a large cohort of investors whose core objective is to generate returns by harnessing available information. These competitive dynamics cause individual actions to offset one another and, coupled with the constraints imposed by liquidity restrictions and short-selling borrowing costs, make the impact of any single investor’s buy or sell orders on market prices negligible. Consequently, individual transactions are assumed to exert no material influence on price trends.
- In the theory of financial market microstructure, the idealized benchmark state typically assumes that the market is sufficiently liquid, is free from liquidity restrictions and short-selling borrowing costs, and involves no order execution slippage, so that orders execute exactly at the market quotes prevailing at the time of submission.
- Under the Adaptive Market Hypothesis (AMH) [55], investors who can learn and dynamically adjust their strategies can achieve periodic excess returns during market panics. They do so through continuous cognitive iteration, optimized decision-making patterns, and contrarian strategies such as value-anchored reverse operations.
4.1. Overview of the Proposed Framework
4.2. Generate Exclusive Weights for Clusters
4.2.1. Cross-Attention for Prototype Embedding Generation
4.2.2. Feed-Forward Layer
4.2.3. Transfer Prediction in Financial Markets
| Algorithm 1 The Process of Future Value Prediction |
| Input: Historical Financial Time Series |
| Output: Future time series |
| Initialize the weights of linear layers and initialize m cluster embeddings for |
| Compute Clustering Probability Matrix: |
| Sample Clustering Membership Matrix: |
| Update Cluster Embedding via Cross Attention: |
| for channel i in do |
| Weight Averaging and Projection: where |
| end for |
| Algorithm 2 The Prediction Process of Unseen via Pre-trained Models |
| Input: Historical Financial Time Series ; pre-trained Model F |
| Output: Future time series |
| Load K cluster embedding and weights of K linear layers from F |
| Compute Clustering Probability Matrix: |
| Sample Clustering Membership Matrix: |
| for channel i in do |
| Weight Averaging and Projection: where |
| end for |
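The prediction steps of Algorithm 2 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the formulas are missing from the listing above, so the cosine similarity, the softmax normalisation, and all function names here are assumptions, and the membership-sampling step is skipped in favour of using the soft probabilities directly for weight averaging.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b + 1e-12)


def cluster_probabilities(channel_emb: Vector, cluster_embs: List[Vector]) -> Vector:
    """Matching probability of one channel to each pre-trained cluster (assumed softmax)."""
    sims = [cosine(channel_emb, c) for c in cluster_embs]
    top = max(sims)
    exps = [math.exp(s - top) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def averaged_weights(probs: Vector, cluster_weights: List[Matrix]) -> Matrix:
    """Probability-weighted average of the per-cluster linear layers."""
    rows, cols = len(cluster_weights[0]), len(cluster_weights[0][0])
    return [
        [sum(p * w[r][c] for p, w in zip(probs, cluster_weights)) for c in range(cols)]
        for r in range(rows)
    ]


def zero_shot_forecast(history: Vector, channel_emb: Vector,
                       cluster_embs: List[Vector], cluster_weights: List[Matrix]) -> Vector:
    """Project the history with the averaged weights; no parameter is retrained."""
    probs = cluster_probabilities(channel_emb, cluster_embs)
    weights = averaged_weights(probs, cluster_weights)
    return [sum(w * x for w, x in zip(row, history)) for row in weights]


# Toy setup: 2 clusters, history length 3, forecast horizon 2 (made-up values).
cluster_embs = [[1.0, 0.0], [0.0, 1.0]]
cluster_weights = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]
forecast = zero_shot_forecast([1.0, 2.0, 3.0], [1.0, 1.0], cluster_embs, cluster_weights)
print(forecast)
```

In the toy call the channel embedding is equally similar to both clusters, so the two linear layers are averaged with equal weight before projection.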
4.3. Data-Centric Artificial Intelligence
4.4. Reinforcement Learning Framework
4.4.1. State
4.4.2. Action
4.4.3. Reward
4.5. Proximal Policy Optimization
CE-PPO Training
5. Experiments
5.1. Datasets and Experimental Setup
5.2. Evaluation Metrics
5.3. Baseline Methods
6. Results and Analysis
6.1. Time Series Forecasting Results
6.2. Validation of Transfer Prediction in Financial Markets
6.3. Ablation Studies
6.4. Implementation of Reinforcement Learning Agents in Trading
6.5. Zero-Shot Prediction Applied to Trading
6.6. Analysis of Model Robustness
6.7. Analysis of Trading Strategy
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Experimental
Appendix A.1. Datasets
| Financial Datasets | Start Date | End Date | Channel | Open Price | Frequency |
|---|---|---|---|---|---|
| N225 | 2007-1-1 | 2023-12-31 | 5 | 17,322.50 | 1 Day |
| SP500 | 2007-1-1 | 2023-12-31 | 5 | 1418.03 | 1 Day |
| DJI | 2007-1-1 | 2023-12-31 | 5 | 12,459.54 | 1 Day |
| NASDAQ | 2007-1-1 | 2023-12-31 | 5 | 1769.22 | 1 Day |
| HSI | 2007-1-1 | 2023-12-31 | 5 | 20,004.84 | 1 Day |
Appendix A.2. The Settings of Hyperparameters
| Datasets | # Clusters | | # Linear Layers in MLP | Hidden Dimension | # Layers (TSMixer) | # Layers (PatchTST) | # Layers (TimesNet) |
|---|---|---|---|---|---|---|---|
| N225 | 2 | 0.3 | 1 | 64 | 2 | 4 | 3 |
| SP500 | 2 | 0.3 | 1 | 64 | 2 | 4 | 3 |
| DJI | 2 | 0.3 | 1 | 64 | 2 | 4 | 3 |
| NASDAQ | 2 | 0.3 | 1 | 64 | 2 | 4 | 3 |
| HSI | 2 | 0.3 | 1 | 64 | 2 | 4 | 3 |
| Hyperparameters | CE-DDQN | CE-PPO | CE-A2C |
|---|---|---|---|
| Input Length | 20 | 20 | 20 |
| Output Length | 5 | 5 | 5 |
| Discount factor | 0.9 | 0.9 | 0.9 |
| Learning Rate | 0.001 | 0.001 | 0.001 |
| Batch Size | 32 | 32 | 32 |
| Episode | 100 | 100 | 100 |
| Replay Memory | 10,000 | 10,000 | 10,000 |
| Greedy | 0.9 | - | - |
| Clip | - | 0.1 | - |
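To make the Clip setting for CE-PPO concrete, the following sketch evaluates the clipped surrogate objective from the PPO paper cited in the references (Schulman et al.), using the clip value 0.1 from the table above. The function name and the sample ratios and advantages are illustrative assumptions; the paper's own training code is not shown in this section.

```python
from typing import Sequence


def ppo_clip_objective(ratios: Sequence[float], advantages: Sequence[float],
                       clip: float = 0.1) -> float:
    """Mean of min(r * A, clip(r, 1 - clip, 1 + clip) * A) over a batch.

    `ratios` are new-policy / old-policy action probabilities and
    `advantages` are the matching advantage estimates.
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equally long")
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
        terms.append(min(ratio * advantage, clipped_ratio * advantage))
    return sum(terms) / len(terms)


# Made-up batch: one ratio above the clip range, one below, one inside.
objective = ppo_clip_objective([1.3, 0.7, 1.0], [1.0, -1.0, 2.0])
print(round(objective, 4))
```

A smaller clip value such as 0.1 keeps each policy update closer to the previous policy than the 0.2 often used as a default.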
Appendix B. Financial Time Series Forecasting Results




References
- Sun, S.; Qin, M.; Wang, X.; An, B. PRUDEX-compass: Towards systematic evaluation of reinforcement learning in financial markets. arXiv 2023, arXiv:2302.00586.
- Achiam, J.; Held, D.; Tamar, A.; Abbeel, P. Constrained policy optimization. In Proceedings of the International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; PMLR: New York, NY, USA, 2017; pp. 22–31.
- Ghosh, S.; Laguna, S.; Lim, S.H.; Wynter, L.; Poonawala, H. A deep ensemble method for multi-agent reinforcement learning: A case study on air traffic control. In Proceedings of the International Conference on Automated Planning and Scheduling, Guangzhou, China, 2–13 August 2021; Volume 31, pp. 468–476.
- Buckman, J.; Hafner, D.; Tucker, G.; Brevdo, E.; Lee, H. Sample-efficient reinforcement learning with stochastic ensemble value expansion. Adv. Neural Inf. Process. Syst. 2018, 31.
- Peng, Z.; Chen, C.; Luo, R.; Zhang, J.; Cheng, H.; Ghosh, B.K. Learning-Based Tracking Control of Unknown Robot Systems with Online Parameter Estimation. In Proceedings of the 2024 American Control Conference (ACC), Toronto, ON, Canada, 10–12 July 2024; pp. 3768–3774.
- Zhasulanov, D.; Marat, B.; Erkin, K.; Omirgaliyev, R.; Kushekkaliyev, A.; Zhakiyev, N. Enhancing gameplay experience through reinforcement learning in games. In Proceedings of the 2024 IEEE 4th International Conference on Smart Information Systems and Technologies (SIST), Astana, Kazakhstan, 15–17 May 2024; pp. 175–180.
- Zheng, Y.; Xing, Z.; Zhang, Q.; Jin, B.; Li, P.; Zheng, Y.; Xia, Z.; Zhan, K.; Lang, X.; Chen, Y.; et al. Planagent: A multi-modal large language agent for closed-loop vehicle motion planning. arXiv 2024, arXiv:2406.01587.
- Pang, H.; Wang, Z.; Li, G. Large language model guided deep reinforcement learning for decision making in autonomous driving. arXiv 2024, arXiv:2412.18511.
- Ren, Y.; Sutherland, D.J. Learning dynamics of llm finetuning. arXiv 2024, arXiv:2407.10490.
- Zhang, Y.; Li, P.; Hong, J.; Li, J.; Zhang, Y.; Zheng, W.; Chen, P.Y.; Lee, J.D.; Yin, W.; Hong, M.; et al. Revisiting zeroth-order optimization for memory-efficient llm fine-tuning: A benchmark. arXiv 2024, arXiv:2402.11592.
- Zhou, C.; Huang, Y.; Kong, Y.; Lu, X. Enhancing trading strategies by combining incremental reinforcement learning and self-supervised prediction. Expert Syst. Appl. 2025, 289, 128297.
- Yu, S.; Xue, H.; Ao, X.; Pan, F.; He, J.; Tu, D.; He, Q. Generating synergistic formulaic alpha collections via reinforcement learning. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023; pp. 5476–5486.
- Gao, S.; Wang, Y.; Yang, X. StockFormer: Learning Hybrid Trading Machines with Predictive Coding. In Proceedings of the IJCAI, Macao, China, 19–25 August 2023; pp. 4766–4774.
- Zong, C.; Wang, C.; Qin, M.; Feng, L.; Wang, X.; An, B. Macrohft: Memory augmented context-aware reinforcement learning on high frequency trading. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 4712–4721.
- Zhao, L.; Kong, S.; Shen, Y. Doubleadapt: A meta-learning approach to incremental learning for stock trend forecasting. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023; pp. 3492–3503.
- Fang, Y.; Tang, Z.; Ren, K.; Liu, W.; Zhao, L.; Bian, J.; Li, D.; Zhang, W.; Yu, Y.; Liu, T.Y. Learning multi-agent intention-aware communication for optimal multi-order execution in finance. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023; pp. 4003–4012.
- Jin, W.; Wang, H.; Zha, D.; Tan, Q.; Ma, Y.; Li, S.; Lee, S.I. Dcai: Data-centric artificial intelligence. In Proceedings of the Companion Proceedings of the ACM Web Conference 2024, Singapore, 13–17 May 2024; pp. 1482–1485.
- Jarrahi, M.H.; Memariani, A.; Guha, S. The principles of data-centric AI (DCAI). arXiv 2022, arXiv:2211.14611.
- Zha, D.; Bhat, Z.P.; Lai, K.H.; Yang, F.; Hu, X. Data-centric ai: Perspectives and challenges. In Proceedings of the 2023 SIAM International Conference on Data Mining (SDM), Minneapolis, MN, USA, 27–29 April 2023; pp. 945–948.
- Kumar, S.; Sharma, R.; Singh, V.; Tiwari, S.; Singh, S.K.; Datta, S. Potential impact of data-centric AI on society. IEEE Technol. Soc. Mag. 2023, 42, 98–107.
- Nieberl, M.; Zeiser, A.; Timinger, H. A review of data-centric artificial intelligence (dcai) and its impact on manufacturing industry: Challenges, limitations, and future directions. In Proceedings of the 2024 IEEE Conference on Artificial Intelligence (CAI), Singapore, 25–27 June 2024; pp. 44–51.
- Tang, C.; Qendro, L.; Spathis, D.; Kawsar, F.; Mascolo, C.; Mathur, A. Kaizen: Practical self-supervised continual learning with continual fine-tuning. In Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Los Alamitos, CA, USA, 3–8 January 2024; pp. 2829–2838.
- Van Gansbeke, W.; Vandenhende, S.; Georgoulis, S.; Gool, L.V. Revisiting contrastive methods for unsupervised learning of visual representations. Adv. Neural Inf. Process. Syst. 2021, 34, 16238–16250.
- Wu, Z.; Wang, Q.; Yang, J. SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation. IEEE Internet Things J. 2025, 12, 13021–13032.
- Desplanques, B.; Thienpondt, J.; Demuynck, K. Ecapa-tdnn: Emphasized channel attention, propagation and aggregation in tdnn based speaker verification. arXiv 2020, arXiv:2005.07143.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
- Liu, M.; Zeng, A.; Chen, M.; Xu, Z.; Lai, Q.; Ma, L.; Xu, Q. Scinet: Time series modeling and forecasting with sample convolution and interaction. Adv. Neural Inf. Process. Syst. 2022, 35, 5816–5828.
- Hewamalage, H.; Bergmeir, C.; Bandara, K. Recurrent neural networks for time series forecasting: Current status and future directions. Int. J. Forecast. 2021, 37, 388–427.
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 19–21 May 2021; Volume 35, pp. 11106–11115.
- Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. Timesnet: Temporal 2d-variation modeling for general time series analysis. arXiv 2022, arXiv:2210.02186.
- Chen, S.A.; Li, C.L.; Yoder, N.; Arik, S.O.; Pfister, T. Tsmixer: An all-mlp architecture for time series forecasting. arXiv 2023, arXiv:2303.06053.
- Jin, M.; Wang, S.; Ma, L.; Chu, Z.; Zhang, J.Y.; Shi, X.; Chen, P.Y.; Liang, Y.; Li, Y.F.; Pan, S.; et al. Time-llm: Time series forecasting by reprogramming large language models. arXiv 2023, arXiv:2310.01728.
- Yuan, X.; Qiao, Y. Diffusion-ts: Interpretable diffusion for general time series generation. arXiv 2024, arXiv:2403.01742.
- Jeon, J.; Park, J.; Park, C.; Kang, U. Frequant: A reinforcement-learning based adaptive portfolio optimization with multi-frequency decomposition. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 1211–1221.
- Niu, H.; Li, S.; Zheng, J.; Lin, Z.; Li, J.; Guo, J.; An, B. Imm: An imitative reinforcement learning approach with predictive representation learning for automatic market making. arXiv 2023, arXiv:2308.08918.
- Niu, H.; Li, S.; Li, J. Macmic: Executing iceberg orders via hierarchical reinforcement learning. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, Jeju, Republic of Korea, 3–9 August 2024; pp. 6008–6016.
- Li, Z.; Jiang, J.; Cao, Y.; Cui, A.; Wu, B.; Li, B.; Liu, Y. Logic-guided Deep Reinforcement Learning for Stock Trading. In Proceedings of the Tiny Papers@ ICLR, Vienna, Austria, 11 May 2024.
- Lien, Y.H.; Li, Y.K.; Wang, Y.S. Contrastive learning and reward smoothing for deep portfolio management. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, Macao, China, 19–25 August 2023; pp. 3966–3974.
- Luo, B.; Liu, D.; Huang, T.; Wang, D. Model-free optimal tracking control via critic-only Q-learning. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 2134–2144.
- Dong, H.; Zhao, X.; Luo, B. Optimal tracking control for uncertain nonlinear systems with prescribed performance via critic-only ADP. IEEE Trans. Syst. Man Cybern. Syst. 2020, 52, 561–573.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602.
- Liu, W.; Gu, Y.; Ge, Y. Multi-factor stock trading strategy based on DQN with multi-BiGRU and multi-head ProbSparse self-attention. Appl. Intell. 2024, 54, 5417–5440.
- Sun, S.; Xue, W.; Wang, R.; He, X.; Zhu, J.; Li, J.; An, B. DeepScalper: A risk-aware reinforcement learning framework to capture fleeting intraday trading opportunities. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 1858–1867.
- Williams, R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 1992, 8, 229–256.
- Byun, W.J.; Choi, B.; Kim, S.; Jo, J. Practical application of deep reinforcement learning to optimal trade execution. FinTech 2023, 2, 414–429.
- Zhang, T.; Ke, Z.; Chen, L.; Qiao, K.; Jia, Y.; Zhang, Z. Trading Performance of an Improved PPO Algorithm in the Chinese Stock Market. In Proceedings of the 2023 4th International Conference on Big Data Economy and Information Management, Zhengzhou, China, 8–10 December 2023; pp. 709–713.
- Barto, A.G.; Sutton, R.S.; Anderson, C.W. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Syst. Man Cybern. 1983, SMC-13, 834–846.
- Sun, Q.; Si, Y.W. Supervised actor-critic reinforcement learning with action feedback for algorithmic trading. Appl. Intell. 2023, 53, 16875–16892.
- Niu, H.; Li, S.; Li, J. MetaTrader: An reinforcement learning approach integrating diverse policies for portfolio optimization. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 1573–1583.
- Zhou, T.; Ma, Z.; Wen, Q.; Sun, L.; Yao, T.; Yin, W.; Jin, R. Film: Frequency improved legendre memory model for long-term time series forecasting. Adv. Neural Inf. Process. Syst. 2022, 35, 12677–12690.
- Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. arXiv 2022, arXiv:2211.14730.
- Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. itransformer: Inverted transformers are effective for time series forecasting. arXiv 2023, arXiv:2310.06625.
- Bellman, R. Dynamic programming. Science 1966, 153, 34–37.
- Fama, E.F. Efficient capital markets: A review of theory and empirical work. J. Financ. 1970, 25, 383–417.
- Lo, A.W. The adaptive markets hypothesis: Market efficiency from an evolutionary perspective. J. Portf. Manag. Forthcom. 2004.
- Kim, T.; Kim, J.; Tae, Y.; Park, C.; Choi, J.H.; Choo, J. Reversible instance normalization for accurate time-series forecasting against distribution shift. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 4 May 2021.
- Chen, J.; Lenssen, J.E.; Feng, A.; Hu, W.; Fey, M.; Tassiulas, L.; Leskovec, J.; Ying, R. From similarity to superiority: Channel clustering for time series forecasting. Adv. Neural Inf. Process. Syst. 2024, 37, 130635–130663.
- Huang, Y.; Zhou, C.; Cui, K.; Lu, X. Improving algorithmic trading consistency via human alignment and imitation learning. Expert Syst. Appl. 2024, 253, 124350.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
- Suthaharan, S. Support vector machine. In Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning; Springer: Berlin/Heidelberg, Germany, 2016; pp. 207–235.
- Rigatti, S.J. Random forest. J. Insur. Med. 2017, 47, 31–39.
- Théate, T.; Ernst, D. An application of deep reinforcement learning to algorithmic trading. Expert Syst. Appl. 2021, 173, 114632.
- Cornalba, F.; Disselkamp, C.; Scassola, D.; Helf, C. Multi-objective reward generalization: Improving performance of Deep Reinforcement Learning for applications in single-asset trading. Neural Comput. Appl. 2024, 36, 619–637.












| Old Position | Signal | Actual Action | New Position | Description |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | Hold the cash. |
| 0 | −1 | −1 | −1 | Open a short position. |
| 0 | 1 | 1 | 1 | Open a long position. |
| 1 | 1 | 0 | 1 | Hold the long position. |
| 1 | 0 | 0 | 1 | Hold the long position. |
| −1 | 0 | 0 | −1 | Hold the short position. |
| −1 | −1 | 0 | −1 | Hold the short position. |
| 1 | −1 | −1 | 0 | Close the long position. |
| −1 | 1 | 1 | 0 | Close the short position. |
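The position rules described in this table (hold cash, open, hold, or close a long or short position) can be sketched as a small transition function. The encoding of short, cash, and long as −1, 0, and 1, and the rule that an opposing signal closes the position to cash rather than reversing it in one step, are read off the rows and descriptions above; the function name `apply_signal` is hypothetical.

```python
from typing import Tuple

VALID = (-1, 0, 1)  # short, cash, long (assumed encoding)


def apply_signal(old_position: int, signal: int) -> Tuple[int, int]:
    """Return (actual_action, new_position) for one trading step.

    A signal of 0, or a signal equal to the current position, leaves the
    position unchanged. Any other signal is executed as a single trade.
    """
    if old_position not in VALID or signal not in VALID:
        raise ValueError("position and signal must be -1, 0, or 1")
    action = signal if signal not in (0, old_position) else 0
    return action, old_position + action


# Walk through a few steps starting from cash.
position = 0
for signal in (1, 1, -1, -1, 0, 1):
    action, position = apply_signal(position, signal)
    print(signal, action, position)
```

Under this reading, moving from long to short takes two opposing signals: the first closes the long position and the second opens the short one.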
| Datasets | Metrics | PatchTST | +CE | TimesNet | +CE | TSMixer | +CE | DLinear | +CE | IMP (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| N225 | MAE | 0.055 | 0.051 | 0.053 | 0.397 | 0.275 | 0.423 | 0.072 | 32.57% | |
| MSE | 0.005 | 0.207 | 0.133 | 0.208 | 0.008 | 54.78% | ||||
| RMSE | 0.069 | 0.067 | 0.067 | 0.446 | 0.351 | 0.441 | 0.091 | 30.88% | ||
| MAPE | 0.018 | 0.017 | 0.017 | 0.130 | 0.092 | 0.135 | 0.024 | 33.70% | ||
| SP500 | MAE | 0.058 | 0.057 | 0.057 | 0.512 | 0.312 | 0.112 | 0.088 | 19.64% | |
| MSE | 0.006 | 0.317 | 0.163 | 0.019 | 0.013 | 24.25% | ||||
| RMSE | 0.074 | 0.073 | 0.073 | 0.550 | 0.391 | 0.132 | 0.109 | 12.21% | ||
| MAPE | 0.017 | 0.013 | 0.013 | 0.111 | 0.071 | 0.025 | 0.020 | 25.97% | ||
| DJI | MAE | 0.048 | 0.077 | 0.043 | 0.514 | 0.310 | 0.087 | 0.074 | 26.82% | |
| MSE | 0.010 | 0.003 | 0.309 | 0.166 | 0.012 | 0.009 | 35.3% | |||
| RMSE | 0.062 | 0.096 | 0.055 | 0.547 | 0.396 | 0.105 | 0.092 | 20.9% | ||
| MAPE | 0.013 | 0.020 | 0.011 | 0.130 | 0.082 | 0.023 | 0.019 | 26.7% | ||
| NASDAQ | MAE | 0.079 | 0.082 | 0.081 | 1.083 | 0.327 | 0.155 | 0.132 | 23.21% | |
| MSE | 0.011 | 0.011 | 0.011 | 1.303 | 0.175 | 0.038 | 0.029 | 29.64% | ||
| RMSE | 0.105 | 0.013 | 1.113 | 0.404 | 0.185 | 0.162 | 19.54% | |||
| MAPE | 0.014 | 0.172 | 0.055 | 0.026 | 0.023 | 21.63% | ||||
| HSI | MAE | 0.067 | 0.067 | 0.067 | 0.511 | 0.344 | 0.124 | 0.114 | 10.55% | |
| MSE | 0.178 | 0.389 | 0.025 | 0.022 | 16.56% | |||||
| RMSE | 0.085 | 0.084 | 0.084 | 0.569 | 0.401 | 0.151 | 0.140 | 9.72% | ||
| MAPE | 0.145 | 0.135 | 0.129 | 0.793 | 0.565 | 0.348 | 0.341 | 9.56% |
| Datasets | Metrics | PatchTST | +CE | TimesNet | +CE | TSMixer | +CE | DLinear | +CE | IMP (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| ① NASDAQ→DJI | MAE | 0.057 | 0.048 | 0.048 | 0.495 | 0.294 | 0.134 | 0.079 | 17.18% | |
| MSE | 0.005 | 0.004 | 0.004 | 0.299 | 0.147 | 0.027 | 0.010 | 39.70% | ||
| RMSE | 0.071 | 0.062 | 0.061 | 0.537 | 0.374 | 0.157 | 0.098 | 21.38% | ||
| MAPE | 0.015 | 0.013 | 0.175 | 0.125 | 0.037 | 0.021 | 17.73% | |||
| ② NASDAQ→SP500 | MAE | 0.069 | 0.058 | 0.056 | 0.632 | 0.286 | 0.165 | 0.094 | 28.88% | |
| MSE | 0.007 | 0.006 | 0.469 | 0.139 | 0.039 | 0.015 | 36.55% | |||
| RMSE | 0.085 | 0.074 | 0.670 | 0.361 | 0.192 | 0.116 | 25.01% | |||
| MAPE | 0.015 | 0.013 | 0.137 | 0.065 | 0.037 | 0.021 | 31.45% | |||
| ③ SP500→NASDAQ | MAE | 0.086 | 0.081 | 0.106 | 0.903 | 0.890 | 0.221 | 0.126 | 19.16% | |
| MSE | 0.012 | 0.011 | 0.020 | 0.948 | 0.894 | 0.071 | 0.026 | 31.85% | ||
| RMSE | 0.108 | 0.136 | 0.099 | 0.953 | 0.921 | 0.260 | 0.155 | 23.85% | ||
| MAPE | 0.014 | 0.017 | 0.144 | 0.141 | 0.036 | 0.021 | 18.61% | |||
| ④ SP500→DJI | MAE | 0.048 | 0.047 | 0.047 | 0.556 | 0.393 | 0.099 | 0.096 | 22.62% | |
| MSE | 0.004 | 0.004 | 0.369 | 0.194 | 0.015 | 0.015 | 31.96% | |||
| RMSE | 0.062 | 0.061 | 0.060 | 0.599 | 0.431 | 0.119 | 0.117 | 20.44% | ||
| MAPE | 0.012 | 0.012 | 0.012 | 0.141 | 0.099 | 0.026 | 0.025 | 22.99% | ||
| ⑤ DJI→SP500 | MAE | 0.058 | 0.057 | 0.094 | 0.651 | 0.304 | 0.117 | 0.113 | 31.92% | |
| MSE | 0.014 | 0.484 | 0.150 | 0.024 | 0.021 | 45.08% | ||||
| RMSE | 0.062 | 0.061 | 0.115 | 0.682 | 0.387 | 0.142 | 0.137 | 27.30% | ||
| MAPE | 0.012 | 0.012 | 0.021 | 0.141 | 0.069 | 0.026 | 0.025 | 29.99% | ||
| ⑥ DJI→NASDAQ | MAE | 0.083 | 0.081 | 0.083 | 1.170 | 1.106 | 0.146 | 0.124 | 6.68% | |
| MSE | 0.011 | 0.011 | 0.012 | 1.522 | 1.335 | 0.034 | 0.026 | 13.13% | ||
| RMSE | 0.105 | 0.103 | 0.105 | 1.210 | 1.129 | 0.176 | 0.154 | 6.46% | ||
| MAPE | 0.014 | 0.187 | 0.176 | 0.024 | 0.021 | 6.38% |
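For reference, the four error metrics reported in the forecasting tables can be computed as in the sketch below, using their standard definitions. MAPE is returned as a fraction rather than a percentage, which appears to match the magnitude of the reported values but is an assumption; the function name and the sample series are illustrative only.

```python
import math
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, MSE, RMSE, and MAPE (as a fraction) for two equal-length series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and equally long")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # MAPE divides by the actual value, so actual values must be non-zero.
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "MAPE": mape}


# Made-up values for illustration.
scores = forecast_errors([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
print(scores)
```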
| Datasets | Metrics | B&H | S&H | MR | SVM | Random Forest | TDQN | DQN-HER | CE DDQN | CE A2C | CE PPO |
|---|---|---|---|---|---|---|---|---|---|---|---|
| N225 | CR | 3.76% | −0.87% | 89.44% | 47.71% | 73.62% | −13.90% | 41.62% | 353.853% | 306.201% | 232.626% |
| | AR | 10.55% | −10.38% | 4.84% | 4.33% | 7.72% | −4.23% | 11.42% | 64.43% | 83.88% | 63.16% |
| | SR | 0.54 | −0.40 | 1.79 | 0.02 | 0.47 | −0.21 | 0.72 | 4.201 | 3.885 | 3.185 |
| | MDD | 20.23% | 2.58% | 27.32% | 23.78% | 34.77% | 31.67% | 15.96% | 25.46% | 30.54% | 20.98% |
| SP500 | CR | 2.354% | −0.461% | 52.77% | 72.71% | 95.96% | −36.76% | 90.21% | 137.224% | 353.595% | 654.27% |
| | AR | 0.32% | 0.03% | 2.54% | 3.28% | 17.57% | −21.59% | 20.37% | 52.18% | 84.08% | 78.61% |
| | SR | 0.31 | 0.60 | 0.05 | 0.88 | 0.38 | −1.09 | 1.15 | 2.937 | 3.266 | 4.83 |
| | MDD | 56.03% | 19.06% | 44.65% | 33.78% | 34.77% | 45.31% | 13.96% | 22.53% | 33.87% | 22.81% |
| DJI | CR | 4.12% | −0.58% | 34.12% | 36.22% | 16.73% | −45.79% | 74.83% | 445.528% | 152.272% | 177.683% |
| | AR | 10.55% | −0.03% | 4.84% | 2.84% | 0.94% | −24.34% | 18.29% | 92.74% | 55.54% | 60.74% |
| | SR | 0.54 | −8.24 | 0.27 | 1.09 | 0.28 | −0.91 | 1.47 | 4.489 | 2.020 | 2.088 |
| | MDD | 20.23% | 30.66% | 17.95% | 29.73% | 13.14% | 44.31% | 10.96% | 27.53% | 30.17% | 27.79% |
| NASDAQ | CR | 8.98% | 1.36% | 21.53% | 155.60% | 72.35% | 14.29% | 35.02% | 1334.781% | 1183.893% | 608.815% |
| | AR | 10.55% | 0.08% | 3.81% | 5.56% | 4.69% | 1.87% | 10.94% | 137.94% | 133.11% | 106.90% |
| | SR | 0.58 | −0.62 | 0.27 | 0.26 | 0.20 | 1.02 | 0.69 | 7.125 | 6.968 | 4.967 |
| | MDD | 58.23% | 50.27% | 17.95% | 36.55% | 23.53% | 31.57% | 12.73% | 6.02% | 3.05% | 8.03% |
| HSI | CR | −10.75% | −1.29% | 33.13% | 37.23% | 37.51% | 35.67% | 44.27% | 305.523% | 302.819% | 104.964% |
| | AR | 1.55% | −0.08% | 2.31% | 3.71% | 2.79% | 11.43% | 12.41% | 89.95% | 80.31% | 45.08% |
| | SR | −0.10 | −1.69 | 0.23 | 0.37 | 0.75 | 0.73 | 0.68 | 4.201 | 3.974 | 2.016 |
| | MDD | 55.86% | 31.35% | 55.13% | 29.32% | 39.72% | 31.28% | 17.64% | 25.46% | 35.08% | 24.02% |
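The trading tables report CR, AR, SR, and MDD, which are read here as cumulative return, annualized return, Sharpe ratio, and maximum drawdown. The sketch below computes them from a daily equity curve using common textbook definitions. The 252-day annualization factor, the zero risk-free rate, and the function name are assumptions, so the paper's exact formulas may differ.

```python
import math
import statistics
from typing import Dict, Sequence

TRADING_DAYS = 252  # assumed number of trading days per year


def trading_metrics(equity: Sequence[float]) -> Dict[str, float]:
    """Compute CR, AR, SR, and MDD from a positive daily equity curve."""
    if len(equity) < 3:
        raise ValueError("need at least three equity values")
    returns = [equity[i] / equity[i - 1] - 1.0 for i in range(1, len(equity))]
    cumulative = equity[-1] / equity[0] - 1.0
    annualized = (1.0 + cumulative) ** (TRADING_DAYS / len(returns)) - 1.0
    volatility = statistics.stdev(returns)
    # Sharpe ratio with a zero risk-free rate (assumption).
    sharpe = (statistics.mean(returns) / volatility * math.sqrt(TRADING_DAYS)
              if volatility > 0 else 0.0)
    peak, max_drawdown = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        max_drawdown = max(max_drawdown, (peak - value) / peak)
    return {"CR": cumulative, "AR": annualized, "SR": sharpe, "MDD": max_drawdown}


# Made-up four-day equity curve.
metrics = trading_metrics([100.0, 110.0, 99.0, 121.0])
print(metrics)
```

On very short curves the annualized figures become extreme, so the example is meant to show the calculation rather than realistic magnitudes.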
| Datasets | Metrics | A2C | DDQN | PPO | Datasets | Metrics | A2C | DDQN | PPO |
|---|---|---|---|---|---|---|---|---|---|
| ① NASDAQ→DJI | CR | 173.047% | 513.933% | 178.183% | ② NASDAQ→SP500 | CR | 244.48% | 239.642% | 178.19% |
| | AR | 59.58% | 98.06% | 60.83% | | AR | 70.98% | 70.32% | 60.29% |
| | SR | 2.17 | 3.74 | 2.09 | | SR | 2.68 | 2.24 | 2.27 |
| | MDD | 26.99% | 21.72% | 28.69% | | MDD | 31.54% | 36.54% | 34.83% |
| ③ SP500→NASDAQ | CR | 1032.89% | 1284.37% | 581.43% | ④ SP500→DJI | CR | 184.69% | 490.06% | 204.409% |
| | AR | 129.61% | 117.38% | 91.83% | | AR | 61.59% | 96.25% | 65.24% |
| | SR | 6.12 | 6.61 | 4.53 | | SR | 2.29 | 3.67 | 2.29 |
| | MDD | 9.81% | 7.36% | 8.53% | | MDD | 23.88% | 24.03% | 24.67% |
| ⑤ DJI→SP500 | CR | 264.11% | 258.53% | 190.94% | ⑥ DJI→NASDAQ | CR | 1046.37% | 1127.51% | 579.34% |
| | AR | 73.65% | 72.90% | 62.47% | | AR | 124.39% | 118.91% | 94.65% |
| | SR | 2.80 | 2.78 | 2.40 | | SR | 6.14 | 6.75 | 4.31 |
| | MDD | 30.65% | 30.79% | 31.41% | | MDD | 4.71% | 6.89% | 8.74% |
| Method | Datasets | CR | AR | SR | MDD | Method | Datasets | CR | AR | SR | MDD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DDQN with political events | N225 | 353.853% | 64.43% | 4.201 | 25.46% | DDQN without political events | N225 | 124.67% | 31.54% | 2.130 | 28.64% |
| | SP500 | 137.224% | 52.18% | 2.937 | 22.53% | | SP500 | 81.973% | 31.76% | 1.237 | 27.31% |
| | DJI | 445.528% | 92.74% | 4.489 | 27.53% | | DJI | 173.371% | 51.34% | 2.132 | 31.26% |
| | NASDAQ | 1334.78% | 137.94% | 7.12 | 6.02% | | NASDAQ | 486.314% | 42.93% | 3.36 | 14.26% |
| | HSI | 305.52% | 89.95% | 4.20 | 25.46% | | HSI | 119.31% | 41.62% | 2.91 | 29.31% |
| A2C with political events | N225 | 306.201% | 83.88% | 3.88 | 30.54% | A2C without political events | N225 | 196.351% | 43.13% | 2.74 | 35.08% |
| | SP500 | 353.595% | 84.08% | 3.266 | 33.87% | | SP500 | 129.372% | 36.98% | 2.43 | 35.92% |
| | DJI | 445.528% | 92.74% | 4.489 | 27.53% | | DJI | 374.346% | 41.34% | 2.93 | 30.71% |
| | NASDAQ | 1334.78% | 137.94% | 7.12 | 6.02% | | NASDAQ | 561.14% | 72.67% | 3.23 | 12.34% |
| | HSI | 305.523% | 89.95% | 4.20 | 25.46% | | HSI | 234.31% | 57.32% | 2.91 | 29.61% |
| PPO with political events | N225 | 232.626% | 63.16% | 3.185 | 20.98% | PPO without political events | N225 | 297.317% | 37.81% | 3.241 | 27.31% |
| | SP500 | 654.27% | 78.61% | 4.83 | 22.81% | | SP500 | 93.618% | 37.81% | 1.97 | 26.71% |
| | DJI | 177.683% | 60.74% | 2.088 | 27.79% | | DJI | 51.231% | 27.84% | 1.127 | 30.21% |
| | NASDAQ | 608.815% | 106.90% | 4.967 | 8.03% | | NASDAQ | 431.673% | 71.31% | 3.813 | 14.34% |
| | HSI | 104.964% | 45.08% | 2.016 | 24.02% | | HSI | 73.127% | 29.87% | 1.476 | 28.64% |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhang, H.; Li, X.; Wan, T.; Du, J. Deep Reinforcement Learning for Financial Trading: Enhanced by Cluster Embedding and Zero-Shot Prediction. Symmetry 2026, 18, 112. https://doi.org/10.3390/sym18010112

