A Comparative Study of Transformer-Based and Classical Models for Financial Time-Series Forecasting
Abstract
1. Introduction
- A leakage-aware walk-forward evaluation of classical and deep learning models for next-day log-return forecasting across six U.S.-listed equities (2014–2024).
- A feature construction pipeline that respects information timing (lagged macro variables; technical indicators computed with rolling windows ending at day t).
- A comparison that reports forecasting metrics alongside a simple rule-based SMA(10/50) benchmark, evaluated net of transaction costs, as a contextual reference.
- A transparent tuning protocol with a pre-specified search budget for each model: all hyperparameters (including ARIMA order and deep-model settings) are selected by minimum validation RMSE within each walk-forward fold. The search spaces are reported in Table 1, and the tuning budget is audited in Tables A1 and A2 (Appendix A).
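
The walk-forward protocol described in the contributions above can be illustrated with a minimal sketch. The rolling-window layout, the block sizes, and the function names `walk_forward_folds` and `select_config` are assumptions made for illustration; the fold lengths actually used in the study are not stated in this list.

```python
from typing import Dict, Iterator, Tuple


def walk_forward_folds(
    n_obs: int, train_size: int, val_size: int, test_size: int
) -> Iterator[Tuple[range, range, range]]:
    """Yield (train, val, test) index ranges that only ever move forward in time.

    Assumption: a rolling window that advances by one test block per fold.
    """
    start = 0
    while start + train_size + val_size + test_size <= n_obs:
        train_end = start + train_size
        val_end = train_end + val_size
        test_end = val_end + test_size
        yield (
            range(start, train_end),
            range(train_end, val_end),
            range(val_end, test_end),
        )
        start += test_size


def select_config(val_rmse_by_config: Dict[str, float]) -> str:
    """Pick the configuration with the lowest validation RMSE (test data is never consulted)."""
    return min(val_rmse_by_config, key=val_rmse_by_config.get)


# Hypothetical sizes, chosen only to make the example small.
folds = list(walk_forward_folds(n_obs=1000, train_size=500, val_size=100, test_size=100))
for train, val, test in folds:
    # Every training index precedes every validation index, which precedes every test index.
    assert train[-1] < val[0] and val[-1] < test[0]
```

Because each validation block sits strictly between its training and test blocks, model selection within a fold cannot see the observations it is later scored on.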
2. Data Collection and Pre-Processing
2.1. Data Preparation
2.2. Experimental Setup
2.3. Macroeconomic Indicators
- Consumer Price Index (CPI; CPIAUCSL) as an inflation proxy;
- Short-term policy interest rate (Federal Funds Rate; FEDFUNDS);
- Gross Domestic Product (real GDP; GDPC1) as an economic activity proxy.
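
Because these macro series are published at a lower frequency than daily prices, they have to be aligned to trading days without looking ahead. The following is a minimal sketch of such an "as-of" alignment with an extra lag. The helper name `asof_lagged`, the lag length, and the toy release dates and values are all hypothetical and are not taken from the study.

```python
from bisect import bisect_right
from datetime import date, timedelta
from typing import List, Optional, Sequence


def asof_lagged(
    trading_days: Sequence[date],
    release_dates: Sequence[date],
    values: Sequence[float],
    lag_days: int = 0,
) -> List[Optional[float]]:
    """For each trading day, return the latest value already available on that day.

    A value becomes available `lag_days` after its release date.
    `release_dates` must be sorted in ascending order.
    """
    available_from = [d + timedelta(days=lag_days) for d in release_dates]
    aligned: List[Optional[float]] = []
    for day in trading_days:
        i = bisect_right(available_from, day)  # number of releases usable on `day`
        aligned.append(values[i - 1] if i > 0 else None)
    return aligned


# Toy inputs (made up): two releases and four trading days.
releases = [date(2024, 1, 11), date(2024, 2, 13)]
levels = [100.0, 101.0]
days = [date(2024, 1, 10), date(2024, 1, 11), date(2024, 2, 12), date(2024, 2, 13)]

same_day = asof_lagged(days, releases, levels, lag_days=0)
one_day_lag = asof_lagged(days, releases, levels, lag_days=1)
```

With a one-day lag, a figure released on day t first enters the feature set on day t+1, which is one conservative way to keep the features consistent with information timing.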
2.4. Feature Engineering
- SMA(50): 50-day simple moving average of the adjusted close, computed from the 50 most recent observations up to and including day t.
- RSI(14): 14-day Relative Strength Index computed from trailing gains and losses over the 14-day window ending at day t.
- MACD(12,26,9): MACD line defined as $\mathrm{EMA}_{12} - \mathrm{EMA}_{26}$, with a 9-day EMA signal line (all computed using information up to day t).
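
The three indicators listed above can be sketched with trailing-only computations. This is an illustrative implementation under stated assumptions: the EMA is seeded with the first observation, and the RSI uses simple averages of gains and losses rather than Wilder smoothing. The source does not say which conventions were used.

```python
from typing import List, Optional, Sequence, Tuple


def sma(prices: Sequence[float], window: int) -> List[Optional[float]]:
    """Simple moving average over the `window` prices ending at each day t."""
    out: List[Optional[float]] = [None] * len(prices)
    for t in range(window - 1, len(prices)):
        out[t] = sum(prices[t - window + 1 : t + 1]) / window
    return out


def ema(values: Sequence[float], span: int) -> List[float]:
    """Recursive EMA with alpha = 2 / (span + 1), seeded with the first value (assumption)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(
    prices: Sequence[float], fast: int = 12, slow: int = 26, signal: int = 9
) -> Tuple[List[float], List[float]]:
    """Return (MACD line, signal line); the line is EMA(fast) minus EMA(slow)."""
    line = [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]
    return line, ema(line, signal)


def rsi(prices: Sequence[float], window: int = 14) -> List[Optional[float]]:
    """RSI from the `window` most recent price changes (simple-average variant)."""
    out: List[Optional[float]] = [None] * len(prices)
    for t in range(window, len(prices)):
        changes = [prices[i] - prices[i - 1] for i in range(t - window + 1, t + 1)]
        gain = sum(c for c in changes if c > 0)
        loss = -sum(c for c in changes if c < 0)
        # With no losses in the window, RSI is set to 100 by convention.
        out[t] = 100.0 if loss == 0 else 100.0 - 100.0 / (1.0 + gain / loss)
    return out
```

Each value at index t depends only on prices at indices up to t, so changing a later price leaves all earlier indicator values unchanged.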
3. Transformer-Based Model for Forecasting
3.1. Input Embedding and Positional Encoding
3.2. Transformer Encoder and Prediction Head
3.3. Training Objective
3.4. Data Preparation
3.5. Model Architecture
3.6. Training and Optimization
4. Comparison Models
4.1. ARIMA (Autoregressive Integrated Moving Average)
- nonlinearities and volatility clustering in financial returns;
- longer-range dependencies beyond its chosen lag order;
- structural changes (time-varying dynamics) across market regimes.
4.2. LSTM (Long Short-Term Memory)
4.3. RNN (Recurrent Neural Network)
4.4. CNN (Convolutional Neural Network)
4.5. Random Forest
5. Model Evaluation and Performance Analysis
5.1. Error Analysis
5.2. Model Correlation Heatmap
6. Backtesting Strategy and Trading Performance
6.1. Trading Strategy Design
- Long: if , hold a 100% long position.
- Cash: if , hold cash (0% equity).
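
A long/cash rule of this kind can be sketched as follows. The entry condition is not recoverable from the text above, so the `threshold` parameter (defaulting to zero), the proportional `cost_per_turnover`, and both function names are hypothetical placeholders rather than the settings used in the study.

```python
from typing import List, Sequence


def long_cash_positions(forecasts: Sequence[float], threshold: float = 0.0) -> List[float]:
    """Map next-day return forecasts to positions: 1.0 (fully long) or 0.0 (cash).

    Assumption: go long when the forecast exceeds `threshold`, otherwise hold cash.
    """
    return [1.0 if f > threshold else 0.0 for f in forecasts]


def net_strategy_returns(
    positions: Sequence[float],
    realized_returns: Sequence[float],
    cost_per_turnover: float = 0.0,
) -> List[float]:
    """Strategy returns net of a proportional cost charged whenever the position changes.

    positions[t] is set at the close of day t and earns realized_returns[t],
    the simple return from day t to day t+1. The strategy starts in cash.
    """
    net: List[float] = []
    previous = 0.0
    for position, r in zip(positions, realized_returns):
        turnover = abs(position - previous)
        net.append(position * r - cost_per_turnover * turnover)
        previous = position
    return net


# Toy numbers (made up) to show the mechanics.
positions = long_cash_positions([0.01, -0.02, 0.0, 0.03])
net = net_strategy_returns(positions, [0.02, -0.01, 0.01, -0.03], cost_per_turnover=0.001)
```

In this toy run the cost is charged on the first, second, and fourth days, when the position switches between cash and long, and not on the third day, when the strategy stays in cash.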
6.2. Backtest Setup
6.3. Performance Metrics
6.4. Stock-Level Trading Performance
7. Limitations
8. Conclusions and Future Work
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Hyperparameter Tuning Evidence
| Model | Configs Per Ticker Per Fold | Configs Per Fold (6 Tickers) | Selection Rule |
|---|---|---|---|
| ARIMA | 72 | 432 | Min validation RMSE |
| Random Forest | 36 | 216 | Min validation RMSE |
| RNN | 162 | 972 | Min validation RMSE (early stop on val loss) |
| LSTM | 162 | 972 | Min validation RMSE (early stop on val loss) |
| CNN | 162 | 972 | Min validation RMSE (early stop on val loss) |
| Transformer | 729 | 4374 | Min validation RMSE (early stop on val loss) |
| Total | 1323 | 7938 | – |
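
The totals in the table above follow directly from the per-ticker counts and the six tickers. A short check of that arithmetic, using only the figures reported in the table:

```python
# Configurations evaluated per ticker per fold, as reported in the table.
CONFIGS_PER_TICKER_PER_FOLD = {
    "ARIMA": 72,
    "Random Forest": 36,
    "RNN": 162,
    "LSTM": 162,
    "CNN": 162,
    "Transformer": 729,
}
N_TICKERS = 6

# Scale each model's budget to all six tickers, then sum both columns.
configs_per_fold = {m: n * N_TICKERS for m, n in CONFIGS_PER_TICKER_PER_FOLD.items()}
total_per_ticker = sum(CONFIGS_PER_TICKER_PER_FOLD.values())
total_per_fold = sum(configs_per_fold.values())

print(total_per_ticker, total_per_fold)
```

Both sums agree with the "Total" row, so the reported budget is internally consistent.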






| Model | Search Space/Selection Rule |
|---|---|
| ARIMA | Grid: , , ; select by minimum validation RMSE (test set never used). |
| Random Forest | Grid: ; select by minimum validation RMSE. |
| RNN/LSTM | Grid: ; ; ; ; ; early stopping on validation loss (patience = 3), select by minimum validation RMSE. |
| CNN | Grid: ; ; ; ; ; early stopping on validation loss (patience = 3), select by minimum validation RMSE. |
| Transformer | Grid: ; ; ; ; ; ; early stopping on validation loss (patience = 3), select by minimum validation RMSE. |
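
The selection rule shared by every row above (exhaustive grid, minimum validation RMSE, test set untouched) can be sketched generically. The grid values are not recoverable from the table, so the hyperparameter names `level` and `scale`, their values, and the toy `fit_predict` function are purely hypothetical.

```python
from itertools import product
from math import sqrt
from typing import Callable, Dict, List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def grid_search(
    grid: Dict[str, List[float]],
    fit_predict: Callable[[Dict[str, float]], Sequence[float]],
    y_val: Sequence[float],
) -> Tuple[Dict[str, float], float, int]:
    """Evaluate every grid point on the validation block and keep the lowest RMSE.

    Returns (best configuration, its validation RMSE, number of configurations tried).
    """
    best_cfg: Dict[str, float] = {}
    best_score = float("inf")
    n_tried = 0
    for combo in product(*grid.values()):
        cfg = dict(zip(grid.keys(), combo))
        score = rmse(y_val, fit_predict(cfg))
        n_tried += 1
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score, n_tried


# Toy setup: the "model" forecasts a constant equal to level * scale.
y_val = [0.01, 0.01, 0.01]
toy_grid = {"level": [0.0, 0.01, 0.02], "scale": [1.0, 2.0]}


def toy_fit_predict(cfg: Dict[str, float]) -> List[float]:
    return [cfg["level"] * cfg["scale"]] * len(y_val)


best_cfg, best_score, n_tried = grid_search(toy_grid, toy_fit_predict, y_val)
```

The number of configurations tried equals the product of the grid sizes, which is how per-model budgets such as those in Appendix A arise from the search spaces.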

| Ticker | Company |
|---|---|
| NVDA | NVIDIA Corporation, Santa Clara, CA, USA |
| TSLA | Tesla, Inc., Austin, TX, USA |
| SMCI | Super Micro Computer, Inc., San Jose, CA, USA |
| GOOGL | Alphabet Inc., Mountain View, CA, USA |
| PYPL | PayPal Holdings, Inc., San Jose, CA, USA |
| SNAP | Snap Inc., Santa Monica, CA, USA |

| Parameter | Value |
|---|---|
| Model Dimension ($d_{\text{model}}$) | 64 |
| Attention Heads (h) | 4 |
| Key Dimension per Head ($d_k$) | 16 |
| Encoder Layers (N) | 2 |
| Lookback Window (L) | 60 |
| Optimizer | Adam |
| Learning Rate | 0.001 |
| Loss Function | Huber () |
| Dropout Rate | 0 (selected; i.e., no dropout) |
| Max Epochs | 10 |
| Early stopping (patience) | 3 |
| Batch Size | 16 |
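
Two of the settings above interact in a simple way: training runs for at most 10 epochs and stops earlier once the validation loss has failed to improve for 3 consecutive epochs. The sketch below replays that logic on a list of validation losses. The function name and the loss values are made up for illustration, and restoring the best checkpoint is assumed rather than stated in the table.

```python
from typing import Sequence, Tuple

# Values taken from the table; the dictionary itself is only an illustrative container.
CONFIG = {"d_model": 64, "heads": 4, "d_k": 16, "max_epochs": 10, "patience": 3}

# With standard multi-head attention, the model dimension splits evenly across heads.
assert CONFIG["d_model"] == CONFIG["heads"] * CONFIG["d_k"]


def early_stopping_trace(
    val_losses: Sequence[float], max_epochs: int = 10, patience: int = 3
) -> Tuple[int, float, int]:
    """Return (best_epoch, best_loss, epochs_run) for a sequence of per-epoch validation losses.

    Epochs are 0-indexed. Training stops after `patience` consecutive epochs
    without improvement, or after `max_epochs`, whichever comes first.
    """
    best_epoch, best_loss = -1, float("inf")
    stale, epochs_run = 0, 0
    for epoch, loss in enumerate(val_losses):
        if epoch >= max_epochs:
            break
        epochs_run += 1
        if loss < best_loss:
            best_epoch, best_loss, stale = epoch, loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, best_loss, epochs_run


# Made-up loss curves: one that stalls early, one that keeps improving.
stalled = early_stopping_trace([0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5])
improving = early_stopping_trace([1.0 / (k + 1) for k in range(15)])
```

In the stalled curve the later loss of 0.5 is never reached, because training halts after three epochs without improvement on the 0.6 seen at epoch 2.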

| Stock | Model | Huber | MAE | MSE | RMSE |
|---|---|---|---|---|---|
| GOOGL | ARIMA | 0.000131 | 0.011833 | 0.000263 | 0.016202 |
| GOOGL | CNN | 0.002094 | 0.057280 | 0.004187 | 0.064708 |
| GOOGL | LSTM | 0.003538 | 0.066024 | 0.007075 | 0.084114 |
| GOOGL | RNN | 0.004559 | 0.070944 | 0.009118 | 0.095486 |
| GOOGL | RandomForest | 0.000491 | 0.027137 | 0.000981 | 0.031328 |
| GOOGL | Transformer | 0.003505 | 0.061703 | 0.007010 | 0.083728 |
| NVDA | ARIMA | 0.000335 | 0.019488 | 0.000669 | 0.025874 |
| NVDA | CNN | 0.037359 | 0.232881 | 0.074717 | 0.273344 |
| NVDA | LSTM | 0.006591 | 0.087193 | 0.013182 | 0.114814 |
| NVDA | RNN | 0.037315 | 0.215479 | 0.074629 | 0.273184 |
| NVDA | RandomForest | 0.001203 | 0.042178 | 0.002407 | 0.049058 |
| NVDA | Transformer | 0.009002 | 0.087683 | 0.018004 | 0.134178 |
| PYPL | ARIMA | 0.000618 | 0.025103 | 0.001236 | 0.035163 |
| PYPL | CNN | 0.032029 | 0.162290 | 0.064058 | 0.253097 |
| PYPL | LSTM | 0.003227 | 0.067139 | 0.006455 | 0.080341 |
| PYPL | RNN | 0.088035 | 0.325262 | 0.176160 | 0.419715 |
| PYPL | RandomForest | 0.001184 | 0.034084 | 0.002369 | 0.048669 |
| PYPL | Transformer | 0.008718 | 0.108786 | 0.017437 | 0.132049 |
| SMCI | ARIMA | 0.000300 | 0.016225 | 0.000600 | 0.024488 |
| SMCI | CNN | 0.005180 | 0.061963 | 0.010360 | 0.101784 |
| SMCI | LSTM | 0.002008 | 0.051037 | 0.004016 | 0.063369 |
| SMCI | RNN | 0.003722 | 0.067456 | 0.007444 | 0.086278 |
| SMCI | RandomForest | 0.001654 | 0.043116 | 0.003307 | 0.057507 |
| SMCI | Transformer | 0.001969 | 0.053435 | 0.003937 | 0.062746 |
| SNAP | ARIMA | 0.000949 | 0.024465 | 0.001897 | 0.043556 |
| SNAP | CNN | 0.024802 | 0.133296 | 0.049605 | 0.222722 |
| SNAP | LSTM | 0.001103 | 0.029171 | 0.002205 | 0.046960 |
| SNAP | RNN | 0.003288 | 0.061114 | 0.006577 | 0.081097 |
| SNAP | RandomForest | 0.002139 | 0.043993 | 0.004277 | 0.065400 |
| SNAP | Transformer | 0.002864 | 0.061151 | 0.005727 | 0.075677 |
| TSLA | ARIMA | 0.000856 | 0.029449 | 0.001712 | 0.041381 |
| TSLA | CNN | 0.102500 | 0.355587 | 0.206653 | 0.454591 |
| TSLA | LSTM | 0.091144 | 0.256797 | 0.182694 | 0.427427 |
| TSLA | RNN | 0.457674 | 0.884034 | 0.948471 | 0.973895 |
| TSLA | RandomForest | 0.002338 | 0.051717 | 0.004676 | 0.068380 |
| TSLA | Transformer | 0.103461 | 0.282166 | 0.207227 | 0.455222 |
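
The four error columns above are linked by their definitions: RMSE is the square root of MSE, and the Huber loss equals half the MSE whenever every absolute error is within the threshold δ. In most rows the Huber column is indeed close to MSE/2 (for GOOGL ARIMA, 0.000131 against 0.000263/2). The sketch below computes all four metrics; the value δ = 1.0 is an assumption, since the threshold is not recoverable from the hyperparameter table.

```python
from math import sqrt
from typing import Dict, Sequence


def huber(errors: Sequence[float], delta: float = 1.0) -> float:
    """Mean Huber loss: quadratic for |e| <= delta, linear beyond it."""
    total = 0.0
    for e in errors:
        a = abs(e)
        total += 0.5 * e * e if a <= delta else delta * (a - 0.5 * delta)
    return total / len(errors)


def error_metrics(
    y_true: Sequence[float], y_pred: Sequence[float], delta: float = 1.0
) -> Dict[str, float]:
    """Return Huber, MAE, MSE and RMSE for one forecast series."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    return {
        "Huber": huber(errors, delta),
        "MAE": sum(abs(e) for e in errors) / len(errors),
        "MSE": mse,
        "RMSE": sqrt(mse),
    }


# Toy log returns and forecasts (made up), with all errors far below delta.
m = error_metrics([0.00, 0.01, -0.01], [0.01, -0.01, 0.02])
```

A Huber value noticeably below MSE/2, as in the TSLA RNN row (0.457674 against 0.948471/2), would be consistent with some errors exceeding δ and falling in the linear regime.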

| Metric | Transformer Strategy (Net) | SMA(10/50) Strategy (Net) | SPY Benchmark |
|---|---|---|---|
| Annualized mean return (%) | 12.13 | 21.00 | 15.76 |
| CAGR (%) | 4.97 | 19.95 | 14.91 |
| Final wealth (×) | 1.20 | 1.98 | 1.68 |
| Sharpe ratio | 0.32 | 0.94 | 0.92 |
| Sortino ratio | 0.38 | 1.30 | 1.29 |
| Maximum drawdown (%) | −65.69 | −27.15 | −24.50 |
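
The metrics in the table above can be computed from a series of daily strategy returns as in the sketch below. It assumes simple (not log) daily returns, 252 trading days per year, a zero risk-free rate, and a downside deviation measured against zero; the paper's exact conventions are not shown here. The helper `implied_years` backs out the horizon implied by a final wealth and a CAGR, which for the reported figures comes to roughly 3.7 to 3.8 years in all three columns.

```python
from math import log, sqrt
from statistics import mean, stdev
from typing import List, Sequence

TRADING_DAYS = 252  # assumed annualization factor


def wealth_path(returns: Sequence[float]) -> List[float]:
    """Cumulative wealth of 1 unit invested, compounding simple returns."""
    path, wealth = [], 1.0
    for r in returns:
        wealth *= 1.0 + r
        path.append(wealth)
    return path


def cagr(returns: Sequence[float], periods_per_year: int = TRADING_DAYS) -> float:
    years = len(returns) / periods_per_year
    return wealth_path(returns)[-1] ** (1.0 / years) - 1.0


def sharpe(returns: Sequence[float], periods_per_year: int = TRADING_DAYS) -> float:
    return mean(returns) / stdev(returns) * sqrt(periods_per_year)


def sortino(returns: Sequence[float], periods_per_year: int = TRADING_DAYS) -> float:
    downside = sqrt(mean(min(r, 0.0) ** 2 for r in returns))
    return mean(returns) / downside * sqrt(periods_per_year)


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss, as a negative fraction of the running peak."""
    peak, worst = 1.0, 0.0
    for wealth in wealth_path(returns):
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst


def implied_years(final_wealth: float, cagr_value: float) -> float:
    """Horizon T that satisfies final_wealth = (1 + CAGR) ** T."""
    return log(final_wealth) / log(1.0 + cagr_value)


# Reported pairs from the table: (final wealth, CAGR).
horizons = [implied_years(1.20, 0.0497), implied_years(1.98, 0.1995), implied_years(1.68, 0.1491)]
```

The gap between annualized mean return and CAGR in the Transformer column (12.13% against 4.97%) is the kind of difference that arises when returns are volatile, since the arithmetic mean ignores compounding losses.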
© 2026 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Liu, T. A Comparative Study of Transformer-Based and Classical Models for Financial Time-Series Forecasting. J. Risk Financial Manag. 2026, 19, 203. https://doi.org/10.3390/jrfm19030203
