A Novel AI-Based Trading Framework for Futures Markets: Evidence from the MTX Case Study
Abstract
1. Introduction
- Adaptive Trading Framework: This study proposes an adaptive trading architecture that integrates two deep reinforcement learning (DRL) agents specialized for bullish and bearish market conditions. Market regimes are implicitly characterized through predicted return–risk dynamics and volatility patterns, while an xLSTM-based trader selector dynamically allocates trading actions across regimes, enabling effective adaptation to non-stationary financial markets.
- Practical and Deployable System: An automated trading system is developed and deployed for Mini-TAIEX futures (MTX) on the Taiwan Futures Exchange, demonstrating the practical feasibility and robustness of the proposed framework under real-world trading constraints, including market volatility and transaction dynamics.
- Return-Oriented Performance Evaluation: Extensive backtesting is conducted across multiple market scenarios and exchange-traded funds (ETFs) using both return-based and risk-adjusted performance metrics, such as cumulative return, Sharpe ratio, and maximum drawdown. The experimental results show that the proposed framework achieves substantially higher cumulative returns compared with the MTX index and benchmark ETFs, albeit at the cost of increased return volatility, resulting in a moderate Sharpe ratio. This highlights a deliberate trade-off favoring return maximization in aggressive and speculative trading settings.
2. Background and Related Work
2.1. Background
2.1.1. Reinforcement Learning (RL)
2.1.2. Proximal Policy Optimization
2.1.3. Extended Long Short-Term Memory
2.2. Related Work
2.2.1. Financial Applications of Reinforcement Learning
2.2.2. Financial Applications of Deep Learning
2.2.3. Distinctiveness of the Proposed Approach
3. Methodology
3.1. Framework Overview
3.2. RL Agent
3.2.1. Futures Market Environment
- State Space (SS).
- Balance: The amount of capital available for investment at time t.
- Lots: The number of contracts held at time t.
- Close: The futures closing price.
- MACD: Measures momentum and trend direction, useful for detecting trend reversals.
- RSI: Identifies potential reversal points and overbought/oversold conditions.
- CCI: Captures deviations of price from its average, providing buy/sell signals.
- ADX: Evaluates the strength of a trend.
- Action Space (AS).
- Action Masking.
- Reward Function (RF).
- Terminal Condition.
- Margin and Transaction Fees.
- Initial Margin (IM): Minimum funds required to open a new position.
- Maintenance Margin (MM): Minimum balance to sustain an open position; if the account falls below this threshold, positions are liquidated.
- Settlement Margin (SM): A fee collected by the Taiwan Futures Exchange from brokers to mitigate default risks.
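As a rough illustration of how part of the state vector and the maintenance-margin rule listed above might be assembled, the sketch below computes MACD and RSI from a list of closing prices in plain Python and checks an account against a maintenance margin. The 12/26 EMA spans, the 14-bar simple-average RSI, the function names, and every numeric value are illustrative assumptions rather than settings taken from this study; CCI and ADX are omitted for brevity.

```python
from dataclasses import dataclass


def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd_line(closes, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA (spans are assumed defaults)."""
    return [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]


def rsi(closes, window=14):
    """RSI over the last `window` price changes (simple-average variant)."""
    changes = [b - a for a, b in zip(closes[:-1], closes[1:])][-window:]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if losses == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + gains / losses)


@dataclass
class Account:
    balance: float  # capital available at time t
    lots: int       # contracts held at time t (negative = short)


def build_state(account, closes):
    """Assemble a partial state vector: balance, lots, close, MACD, RSI."""
    return [account.balance, account.lots, closes[-1],
            macd_line(closes)[-1], rsi(closes)]


def must_liquidate(equity, lots, maintenance_margin_per_lot):
    """True when equity falls below the maintenance margin for open lots."""
    return lots != 0 and equity < abs(lots) * maintenance_margin_per_lot


# Toy data: steadily rising prices and a hypothetical account.
closes = [100.0 + i for i in range(30)]
state = build_state(Account(balance=500_000.0, lots=1), closes)
```

With steadily rising toy prices the RSI saturates at 100 and the MACD line is positive, which is a quick sanity check that the features respond to trend direction as the indicator descriptions suggest.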
3.2.2. Datasets
- Raw Data.
- Synthetic Data.
3.2.3. Data Preprocessing
3.3. Trader Selector
Data Splitting and Training Procedure
4. Experimental Results
4.1. Experiment Setup and Methods
4.1.1. Daily Trading Flow
| Algorithm 1: Trader Selector and Daily Trading Framework | |
| 1. | Begin daily trading session (night session starts) |
| 2. | Obtain historical daily data and perform denoising |
| 3. | Select a trader agent for the day using the Trader Selector |
| 4. | If the selected agent is the same as on the prior day: (a) retrieve historical 15 min K-line data; (b) the selected agent decides the trading action based on the data. Otherwise: (a) close the position currently held by the previous agent; (b) the selected agent decides the trading action. |
| 5. | Execute trading action |
| 6. | Update account and market status |
| 7. | Check whether today’s early trading session has ended: (a) if yes, loop back to step 1 for the next trading day; (b) if no, continue monitoring until the session ends. |
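The control flow of Algorithm 1 can be sketched in a few lines of Python. Everything below is a placeholder chosen for illustration: the moving-average denoiser, the rule-based `toy_selector`, and the two fixed-target agents stand in for the denoising step, the xLSTM-based trader selector, and the trained DRL agents, none of which are reproduced here.

```python
def moving_average(series, window=3):
    """Placeholder denoiser: trailing moving average (an assumption)."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def toy_selector(denoised_daily):
    """Placeholder selector: picks an agent from the latest smoothed move."""
    return "long" if denoised_daily[-1] >= denoised_daily[-2] else "short"


def long_agent(bars, position):
    """Placeholder long-side agent: order size needed to hold +1 contract."""
    return 1 - position


def short_agent(bars, position):
    """Placeholder short-side agent: order size needed to hold -1 contract."""
    return -1 - position


AGENTS = {"long": long_agent, "short": short_agent}


def run_trading_day(daily_closes, intraday_bars, state):
    """One pass through steps 2-6 of Algorithm 1 for a single trading day."""
    denoised = moving_average(daily_closes)        # step 2: denoise daily data
    chosen = toy_selector(denoised)                # step 3: select the agent
    if chosen != state["agent"]:                   # step 4, "otherwise" branch
        state["position"] = 0                      # close the prior agent's position
    order = AGENTS[chosen](intraday_bars, state["position"])  # agent decides
    state["position"] += order                     # steps 5-6: execute and update
    state["agent"] = chosen
    return state


state = {"agent": None, "position": 0}
state = run_trading_day([100, 101, 103], [103.0, 103.5], state)
day1 = dict(state)   # rising prices: the long agent is selected
state = run_trading_day([100, 101, 103, 99, 95], [95.0, 94.5], state)
day2 = dict(state)   # falling prices: switch to the short agent
```

Keeping the agent switch and the position close in one place makes the hand-over between agents explicit, which is the part of the algorithm most likely to incur extra transaction costs in practice.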
4.1.2. Experiment Settings
4.1.3. Evaluation Metrics
- Precision: Measures the proportion of correctly predicted positive instances among all instances predicted as positive. Precision = TP/(TP + FP)
- Accuracy: Represents the ratio of correctly predicted instances to the total number of predictions, reflecting overall model correctness. Accuracy = (TP + TN)/(TP + TN + FP + FN)
- Recall: Measures the ratio of true positive predictions to the total number of actual positive instances, indicating the model’s ability to capture positive cases. Recall = TP/(TP + FN)
- F1 Score: The harmonic mean of precision and recall, used to assess the balance between the two in classification performance. F1 = (2 × Precision × Recall)/(Precision + Recall)
- Return on Investment (ROI): ROI = (final asset − total cost)/total cost, where the total cost represents the cumulative cost over the entire evaluation period.
- Internal Rate of Return (IRR): Measures the annualized rate of growth of the investment, taking into account the timing and magnitude of cash flows. IRR is defined as the discount rate that sets the net present value (NPV) of all cash flows to zero: $\mathrm{NPV} = \sum_{t=0}^{T} \frac{CF_t}{(1 + \mathrm{IRR})^{t}} = 0$, where $CF_t$ is the net cash flow at time t and T is the final period. $CF_t$ is calculated as the monthly invested capital plus or minus realized profits/losses. In this study, a fixed capital of TWD 500,000 is invested each month. If the previous month did not incur a loss, no additional capital injection is needed. This rule ensures that IRR reflects both the effect of periodic investments and the loss-compensation mechanism, providing an annualized measure of the strategy’s growth under realistic trading conditions.
- Sharpe Ratio: Evaluates risk-adjusted performance by comparing excess returns to the standard deviation of returns, capturing the trade-off between return and volatility.
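The metrics listed above can be computed with a short, dependency-free sketch. The bisection search for IRR, the zero default risk-free rate, the use of the sample standard deviation, and the monthly-to-annual compounding in `annualize` are assumptions made for illustration; the study's exact conventions may differ.

```python
import statistics


def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, accuracy, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


def roi(final_asset, total_cost):
    """ROI = (final asset - total cost) / total cost."""
    return (final_asset - total_cost) / total_cost


def npv(rate, cash_flows):
    """Net present value of cash flows indexed by period t = 0, 1, ..."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, lo=-0.99, hi=10.0, tol=1e-9):
    """Per-period IRR by bisection; requires an NPV sign change on [lo, hi]."""
    f_lo = npv(lo, cash_flows)
    if f_lo * npv(hi, cash_flows) > 0:
        raise ValueError("NPV does not change sign on the search interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = npv(mid, cash_flows)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2


def annualize(monthly_rate):
    """Compound a monthly rate into an annual one (assumed convention)."""
    return (1 + monthly_rate) ** 12 - 1


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


metrics = classification_metrics(tp=8, tn=6, fp=2, fn=4)
one_period_irr = irr([-100.0, 110.0])   # invest 100, receive 110 one period later
toy_sharpe = sharpe_ratio([0.01, 0.03, 0.02])
```

In the toy cash-flow example a single outlay of 100 followed by an inflow of 110 gives a per-period IRR of 10%, which matches solving −100 + 110/(1 + r) = 0 by hand.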
4.1.4. Baselines and Comparison Methods
- Buy & Hold (long): purchasing the asset at the beginning of the period and holding it without active trading.
- Short & Hold (reported as “Sell and Hold” in the result tables): taking a short position at the beginning of the period and maintaining it throughout.
- Accumulating ETF: We use 00663L.TW and 00675L.TW as our benchmarks. These two ETFs track the TAIEX and implement their strategies through futures trading.
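For the two hold-type baselines, profit and loss reduces to the price change times the signed position. The sketch below is a simplification under stated assumptions: `point_value` is a hypothetical parameter for the contract multiplier, the index levels and capital are made-up numbers, and fees, margin calls, and contract rollover are ignored.

```python
def hold_pnl(entry_price, exit_price, lots, point_value):
    """P&L of holding `lots` contracts (negative = short) from entry to exit."""
    return (exit_price - entry_price) * lots * point_value


def hold_roi(entry_price, exit_price, lots, point_value, capital):
    """ROI of a hold strategy relative to the capital committed."""
    return hold_pnl(entry_price, exit_price, lots, point_value) / capital


# Hypothetical numbers: one contract, index rises from 17,000 to 17,200.
long_pnl = hold_pnl(17_000, 17_200, lots=1, point_value=50)
short_pnl = hold_pnl(17_000, 17_200, lots=-1, point_value=50)
```

The two baselines are mirror images of each other before costs, so any asymmetry between them in reported results comes from fees, margin effects, or the compounding convention rather than from the price path itself.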
4.2. Results and Discussion
4.2.1. Results of Individual Agents
4.2.2. Framework Performance Evaluation via Simulation
4.3. Practical Performance of the Proposed Framework
4.3.1. Trader Selector Evaluation
4.3.2. Framework Evaluation
4.3.3. Trading Behavior Visualization
4.4. Results Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

| Selector | Training | Application |
|---|---|---|
| 2021 H1 | January 2010~December 2020 | January 2021~June 2021 |
| 2021 H2 | July 2010~June 2021 | July 2021~December 2021 |
| 2022 H1 | January 2011~June 2021 | January 2022~June 2022 |
| 2022 H2 | July 2011~December 2021 | July 2022~December 2022 |
| 2023 H1 | January 2012~June 2022 | January 2023~June 2023 |
| 2023 H2 | July 2012~December 2022 | July 2023~December 2023 |
| 2024 H1 | January 2013~June 2023 | January 2024~June 2024 |
| 2024 H2 | July 2013~June 2024 | July 2024~December 2024 |
| 2025 H1 | January 2014~December 2024 | January 2025~June 2025 |
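
The half-yearly retraining schedule in the table above resembles a walk-forward split. The sketch below generates such windows under the assumption of a fixed 11-year (132-month) training window ending immediately before each 6-month application period; the windows listed in the table are not all of that length, so this is an idealized version rather than a reproduction of the table. The function names are illustrative.

```python
from datetime import date


def add_months(d, months):
    """Shift a first-of-month date by a whole number of months."""
    idx = d.year * 12 + (d.month - 1) + months
    return date(idx // 12, idx % 12 + 1, 1)


def walk_forward_windows(first_apply, n_windows, train_months=132, apply_months=6):
    """Return (train_start, train_end, apply_start, apply_end) tuples.

    End dates are exclusive, so train_end equals apply_start and no
    application data leaks into training.
    """
    windows = []
    for k in range(n_windows):
        apply_start = add_months(first_apply, k * apply_months)
        apply_end = add_months(apply_start, apply_months)
        train_start = add_months(apply_start, -train_months)
        windows.append((train_start, apply_start, apply_start, apply_end))
    return windows


# Nine selectors, from 2021 H1 through 2025 H1.
windows = walk_forward_windows(date(2021, 1, 1), n_windows=9)
```

Using exclusive end dates keeps the training and application periods disjoint by construction, which guards against look-ahead when the selector is re-trained every half year.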

| Trader | ROI (%) | IRR (%) | Sharpe Ratio |
|---|---|---|---|
| Long | 458.00 | 53.00 | 0.29 |
| Short | −89.00 | −42.00 | −0.32 |
| Buy and Hold | 845.00 | 46.20 | 0.52 |
| Sell and Hold | −75.00 | −29.00 | −0.28 |
| Dual Thrust | 100.00 | 18.00 | 0.23 |
| 00675L.TW | 569.60 | 39.60 | N/A |
| 00663L.TW | 581.70 | 39.80 | N/A |

| Accuracy (%) | ROI (%) | IRR (%) | Sharpe Ratio |
|---|---|---|---|
| 50 | 41.46 | −1.50 | −0.04 |
| 60 | 1111.72 | 81.10 | 0.28 |
| 70 | 3477.17 | 141.80 | 0.35 |
| 80 | 6762.11 | 186.00 | 0.36 |
| 90 | 10,726.46 | 222.50 | 0.27 |
| 100 | 16,010.76 | 256.00 | 0.31 |

| Selector | Accuracy (%) | F1 Score | Precision | Recall | Profit without slippage price setting (thousand TWD) | Profit with slippage price setting (thousand TWD) |
|---|---|---|---|---|---|---|
| 2021 H1 | 73.75 | 0.784 | 0.731 | 0.844 | 2510 | 2410 |
| 2021 H2 | 65.66 | 0.738 | 0.584 | 1.000 | 1254 | 1154 |
| 2022 H1 | 65.26 | 0.613 | 0.582 | 0.646 | 1215 | 1225 |
| 2022 H2 | 66.66 | 0.515 | 0.531 | 0.500 | 1410 | 1513 |
| 2023 H1 | 68.75 | 0.704 | 0.673 | 0.737 | 1619 | 961 |
| 2023 H2 | 66.66 | 0.515 | 0.531 | 0.500 | 1197 | 809 |
| 2024 H1 | 68.75 | 0.704 | 0.673 | 0.737 | 1490 | 1093 |
| 2024 H2 | 57.01 | 0.673 | 0.507 | 1.000 | 510 | 648 |
| 2025 H1 | 55.56 | 0.560 | 0.583 | 0.538 | 395 | 479 |
| Average | 65.73 | 0.645 | 0.600 | 0.722 | 1300 | 1144 |

| Method | ROI (%) | IRR (%) | Sharpe Ratio |
|---|---|---|---|
| Buy and Hold | 845.00 | 46.20 | 0.52 |
| Sell and Hold | −75.00 | −29.00 | −0.28 |
| Dual Thrust | 100.00 | 18.00 | 0.23 |
| 00663L.TW | 569.60 | 39.60 | N/A |
| 00675L.TW | 581.70 | 39.80 | N/A |
| Proposed Framework without slippage setting | 2240.00 | 109.00 | 0.31 |
| Proposed Framework with slippage setting | 2058.00 | 71.00 | 0.35 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Hsieh, Y.-H.; Lai, C.-H.; Yuan, S.-M. A Novel AI-Based Trading Framework for Futures Markets: Evidence from the MTX Case Study. Int. J. Financial Stud. 2026, 14, 67. https://doi.org/10.3390/ijfs14030067

