GRU-Based Stock Price Forecasting with the Itô-RMSProp Optimizers
Abstract
1. Introduction
- We propose a stochastic differential equation (SDE)-inspired variant of RMSProp, termed Itô-RMSProp, specifically designed for training GRU networks in the context of stock price forecasting.
- We implement the Itô-RMSProp-GRU model and apply it to forecast the stock prices of well-known companies using real-world financial time series data.
- We perform a comprehensive empirical comparison between the proposed Itô-RMSProp-GRU and the classical RMSProp-GRU, showing that Itô-RMSProp improves predictive accuracy and generalization, especially in volatile market conditions.
2. Standard Tools
2.1. RMSProp Optimizer
- Under assumptions of smoothness and bounded gradients, variants of RMSProp converge to stationary points [14].
- RMSProp can be viewed as a special case of adaptive gradient methods (including Adam [12]) with momentum-like effects on the squared gradient accumulation.
- Recent work [15] analyzes RMSProp’s implicit bias and step size adaptation, providing partial guarantees on convergence rates under specific conditions.
- No universal guarantees exist for global optimality in deep learning, due to non-convex loss surfaces.
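For reference, and using generic notation rather than the paper's exact symbols, the classical RMSProp recursion keeps an exponential moving average of squared gradients and normalizes each step by its square root (the placement of the stability constant $\epsilon$ inside or outside the root varies across implementations):

$$
v_t = \beta\, v_{t-1} + (1-\beta)\, g_t^{2}, \qquad \theta_{t+1} = \theta_t - \frac{\eta\, g_t}{\sqrt{v_t + \epsilon}},
$$

where $g_t$ is the current mini-batch gradient, $\beta$ the decay rate, and $\eta$ the learning rate; all operations are applied element-wise.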
Mathematical Formulation and Forecast Horizon Definition
2.2. GRU and Loss Function for Time Series Forecasting
- Mean Squared Error (MSE): $\mathcal{L}_{\mathrm{MSE}}(\theta)=\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T_i}\sum_{t=1}^{T_i}\bigl(\hat{y}^{(i)}_{t}-y^{(i)}_{t}\bigr)^{2}$, where $N$ is the number of training sequences, $T_i$ is the length of the $i$-th sequence, and $\theta$ is the full set of parameters (GRU weights and output layer weights).
- Mean Absolute Error (MAE): $\mathcal{L}_{\mathrm{MAE}}(\theta)=\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T_i}\sum_{t=1}^{T_i}\bigl|\hat{y}^{(i)}_{t}-y^{(i)}_{t}\bigr|$, defined over the same sequences and parameters (a small computational sketch follows below).
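As a quick illustration of how these per-sequence losses are computed, here is a minimal NumPy sketch; the function and variable names are ours, not the paper's, and it assumes predictions and targets are stored as lists of per-sequence arrays:

```python
import numpy as np

def sequence_mse_mae(y_pred_seqs, y_true_seqs):
    """Average per-sequence MSE and MAE over N training sequences.

    Each element of y_pred_seqs / y_true_seqs is a 1-D array of length T_i.
    """
    mse_terms, mae_terms = [], []
    for y_hat, y in zip(y_pred_seqs, y_true_seqs):
        err = np.asarray(y_hat, dtype=float) - np.asarray(y, dtype=float)
        mse_terms.append(np.mean(err ** 2))      # (1/T_i) * sum of squared errors
        mae_terms.append(np.mean(np.abs(err)))   # (1/T_i) * sum of absolute errors
    # average over the N sequences
    return float(np.mean(mse_terms)), float(np.mean(mae_terms))

# toy usage with two short sequences
mse, mae = sequence_mse_mae([[1.0, 2.0], [3.0]], [[1.5, 2.0], [2.0]])
```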
3. Itô-RMSProp Optimizer for GRUs
3.1. Itô Calculus and the Itô Derivative
3.2. Itô-RMSProp: SDE-Inspired Variant of RMSProp
- $\eta$ is the base learning rate;
- $\epsilon$ is a small stability constant;
- $\sigma$ controls the amplitude of the injected noise;
- $\xi_t \sim \mathcal{N}(0, I)$ is a standard Gaussian noise vector sampled independently at each iteration;
- the division by $\sqrt{v_t + \epsilon}$ preserves per-parameter adaptive scaling analogous to RMSProp.
- When $\sigma = 0$, Itô-RMSProp reduces exactly to classical RMSProp.
- The adaptive noise scaling ensures per-parameter normalization, preserving RMSProp's stabilization advantages.
- Practical tuning of $\sigma$ is essential; it is typically started at a small value and possibly annealed during training (a minimal code sketch of the update follows below).
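The following NumPy sketch shows one parameter update consistent with the components listed above; it is an illustration of the idea under our own naming and default values, not the authors' reference implementation:

```python
import numpy as np

def ito_rmsprop_step(theta, grad, v, lr=1e-3, beta=0.9, eps=1e-8, sigma=1e-3, rng=None):
    """One Itô-RMSProp update: an RMSProp step plus adaptively scaled Gaussian noise.

    theta, grad, v are NumPy arrays of the same shape.
    Setting sigma = 0 recovers the classical RMSProp update exactly.
    """
    rng = rng or np.random.default_rng()
    v = beta * v + (1.0 - beta) * grad ** 2        # running average of squared gradients
    scale = np.sqrt(v + eps)                       # per-parameter adaptive scale
    xi = rng.standard_normal(theta.shape)          # fresh standard Gaussian noise vector
    theta = theta - lr * grad / scale + sigma * xi / scale
    return theta, v
```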
3.3. Connection to Itô SDEs
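Under the usual small-step heuristic, the noisy update above can be read as an Euler–Maruyama discretization of a preconditioned Itô SDE of roughly the following form (a heuristic sketch in our notation, not the paper's exact statement):

$$
d\theta_t = -\,\frac{\eta\,\nabla L(\theta_t)}{\sqrt{v_t + \epsilon}}\,dt \;+\; \frac{\sigma}{\sqrt{v_t + \epsilon}}\,dW_t,
$$

where $W_t$ is a standard Wiener process and $v_t$ is the slowly varying squared-gradient average; the diffusion term vanishes as $\sigma \to 0$, recovering a deterministic RMSProp flow, while a constant preconditioner would reduce the dynamics to Langevin-type updates in the spirit of stochastic gradient Langevin dynamics (SGLD).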
4. Experimental Setup
4.1. Data Collection and Preprocessing
4.2. Algorithms and Model Configuration
- RMSProp: A widely used optimizer in deep learning, particularly effective for recurrent neural networks.
- Itô-RMSProp: A modified version of RMSProp incorporating stochastic calculus (Itô’s lemma) to enhance adaptability in non-stationary environments.
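To make the experimental comparison concrete, one way to wire both optimizers into a GRU forecaster is sketched below in PyTorch; the class name, variable names, and hyperparameter values are ours and serve only as placeholders, not as the paper's configuration:

```python
import torch
import torch.nn as nn

class ItoRMSProp(torch.optim.Optimizer):
    """RMSProp with adaptively scaled Gaussian noise; sigma=0 gives plain RMSProp."""

    def __init__(self, params, lr=1e-3, beta=0.9, eps=1e-8, sigma=1e-3):
        super().__init__(params, dict(lr=lr, beta=beta, eps=eps, sigma=sigma))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "v" not in state:
                    state["v"] = torch.zeros_like(p)
                v = state["v"]
                v.mul_(group["beta"]).addcmul_(p.grad, p.grad, value=1 - group["beta"])
                scale = (v + group["eps"]).sqrt()              # per-parameter adaptive scale
                noise = group["sigma"] * torch.randn_like(p) / scale
                p.add_(-group["lr"] * p.grad / scale + noise)

# usage sketch: a small GRU encoder with a linear head
gru = nn.GRU(input_size=1, hidden_size=32, batch_first=True)
head = nn.Linear(32, 1)
optimizer = ItoRMSProp(list(gru.parameters()) + list(head.parameters()), lr=1e-3, sigma=1e-3)
# the RMSProp baseline simply swaps in torch.optim.RMSprop with the same learning rate
```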
4.3. Hyperparameter Settings
4.4. RMSProp-GRUs vs. Itô-RMSProp-GRUs
4.5. Relevance of Directional Accuracy and Sharpe Ratio in Financial Time Series Evaluation
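As a reminder, the standard definitions of the two metrics are roughly as follows (our notation; the exact conventions used in the paper, e.g., Sharpe-ratio annualization or the handling of a risk-free rate, may differ):

$$
\mathrm{DA} = \frac{100}{T-1}\sum_{t=1}^{T-1}\mathbb{1}\!\left[\operatorname{sign}\!\bigl(\hat{y}_{t+1}-y_{t}\bigr)=\operatorname{sign}\!\bigl(y_{t+1}-y_{t}\bigr)\right], \qquad \mathrm{SR}=\frac{\bar{r}}{s_{r}},
$$

where $\hat{y}_{t+1}$ is the forecast price, $y_t$ the observed price, and $\bar{r}$, $s_{r}$ the mean and standard deviation of the returns of the evaluated trading signal.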
4.6. Itô-RMSProp-GRUs Sensitivity
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| RMSProp | Root Mean Square Propagation (optimizer) |
| Itô-RMSProp | SDE-based modification of RMSProp with Itô noise scaling |
| SDE | Stochastic Differential Equation |
| SGLD | Stochastic Gradient Langevin Dynamics |
| GRU | Gated Recurrent Unit |
| LSTM | Long Short-Term Memory network |
| SGD | Stochastic Gradient Descent |
| MAE | Mean Absolute Error |
| RMSE | Root Mean Square Error |
| R² | Coefficient of Determination |
| CI | Confidence Interval |
| DA | Directional Accuracy (percentage of correctly predicted price movements) |
| SR | Sharpe Ratio (risk-adjusted return measure) |
References
- Elman, J.L. Finding Structure in Time. Cogn. Sci. 1990, 14, 179–211. [Google Scholar] [CrossRef]
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning Representations by Back-Propagating Errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
- Bengio, Y.; Simard, P.; Frasconi, P. Learning Long-Term Dependencies with Gradient Descent is Difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Cho, K.; Van Merriënboer, B.; Bahdanau, D.; Bengio, Y. Learning Phrase Representations Using RNN Encoder–Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
- Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to Forget: Continual Prediction with LSTM. Neural Comput. 2000, 12, 2451–2471. [Google Scholar] [CrossRef] [PubMed]
- Sundermeyer, M.; Schlüter, R.; Ney, H. LSTM Neural Networks for Language Modeling. In Proceedings of the Interspeech 2012, Portland, OR, USA, 9–13 September 2012; pp. 194–197. [Google Scholar]
- Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A Search Space Odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 2222–2232. [Google Scholar] [CrossRef]
- Bottou, L. Large-Scale Machine Learning with Stochastic Gradient Descent. In Proceedings of the COMPSTAT 2010, Paris, France, 22–27 August 2010; Kropf, S., Fried, R., Hothorn, T., Eds.; Physica-Verlag: Heidelberg, Germany, 2010; pp. 177–186. [Google Scholar] [CrossRef]
- Tieleman, T.; Hinton, G. Lecture 6.5—RMSProp: Divide the Gradient by a Running Average of Its Recent Magnitude. Coursera Neural Netw. Mach. Learn. 2012, 4, 26. Available online: http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf (accessed on 1 January 2025).
- Duchi, J.; Hazan, E.; Singer, Y. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. J. Mach. Learn. Res. 2011, 12, 2121–2159. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; Available online: https://arxiv.org/abs/1412.6980 (accessed on 1 January 2025).
- Dozat, T. Incorporating Nesterov Momentum into Adam. In Proceedings of the ICLR Workshop, San Juan, Puerto Rico, 2–4 May 2016; Available online: https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ (accessed on 1 January 2025).
- Reddi, S.J.; Kale, S.; Kumar, S. On the Convergence of Adam and Beyond. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019; Available online: https://openreview.net/forum?id=ryQu7f-RZ (accessed on 1 January 2025).
- Ward, R.; Wu, X.; Bottou, L. Adagrad Stepsizes: Sharp Convergence over Nonconvex Landscapes. In Proceedings of the International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; pp. 6676–6685. Available online: https://proceedings.mlr.press/v97/ward19a.html (accessed on 1 January 2025).
- Li, Q.; Tai, C.; E, W. Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms. Math. Oper. Res. 2019, 44, 142–172. [Google Scholar] [CrossRef]
- Jin, C.; Ge, R.; Netrapalli, P.; Kakade, S.; Jordan, M.I. How to Escape Saddle Points Efficiently. In Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017; pp. 1724–1732. Available online: https://proceedings.mlr.press/v70/jin17a.html (accessed on 1 January 2025).
- Schuster, M.; Paliwal, K.K. Bidirectional Recurrent Neural Networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681. [Google Scholar] [CrossRef]
- Graves, A.; Mohamed, A.R.; Hinton, G. Speech Recognition with Deep Recurrent Neural Networks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada, 26–31 May 2013; pp. 6645–6649. [Google Scholar] [CrossRef]
- Zilly, J.G.; Srivastava, R.K.; Koutník, J.; Schmidhuber, J. Recurrent Highway Networks. In Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017; pp. 4189–4198. Available online: https://proceedings.mlr.press/v70/zilly17a.html (accessed on 1 January 2025).
- Zhang, J.; Xu, Q.; Liu, Y.; Lin, D. Highway Long Short-Term Memory RNNs for Distant Speech Recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 5755–5759. [Google Scholar] [CrossRef]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. Available online: https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf (accessed on 1 January 2025).
- Pascanu, R.; Mikolov, T.; Bengio, Y. On the Difficulty of Training Recurrent Neural Networks. In Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta, GA, USA, 17–19 June 2013; pp. 1310–1318. Available online: https://proceedings.mlr.press/v28/pascanu13.html (accessed on 1 January 2025).
- Sak, H.; Senior, A.; Beaufays, F. Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling. In Proceedings of the Interspeech, Singapore, 14–18 September 2014; pp. 338–342. [Google Scholar]
- Liu, Z.; Chen, Y.; Shen, J.; He, Z.; Wu, C.; Guo, J. Recurrent Neural Networks for Short-Term Traffic Speed Prediction with Missing Data. Transp. Res. Part C Emerg. Technol. 2016, 71, 74–92. [Google Scholar] [CrossRef]
- Fischer, T.; Krauss, C. Deep Learning with Long Short-Term Memory Networks for Financial Market Predictions. Eur. J. Oper. Res. 2018, 270, 654–669. [Google Scholar] [CrossRef]
- Borovykh, A.; Bohte, S.; Oosterlee, C.W. Conditional Time Series Forecasting with Convolutional Neural Networks. arXiv 2017, arXiv:1703.04691. [Google Scholar]
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
- Hochreiter, S. Untersuchungen zu Dynamischen Neuronalen Netzen. Master’s Thesis, Technische Universität München, Munich, Germany, 1991. [Google Scholar]
- Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar] [CrossRef]
- Zhou, G.; Cui, Y.; Zhang, C.; Yang, C.; Liu, Z.; Wang, L.; Li, C. Learning Continuous Time Dynamics with Recurrent Neural Networks. arXiv 2016, arXiv:1609.02247. [Google Scholar]
- Jozefowicz, R.; Zaremba, W.; Sutskever, I. An Empirical Exploration of Recurrent Network Architectures. In Proceedings of the International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; pp. 2342–2350. [Google Scholar]
- Lei, T.; Zhang, R.; Artzi, Y. Rationalizing Neural Predictions. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium, 31 October–4 November 2018; pp. 107–117. [Google Scholar]
- Lu, Z.; Pu, H.; Wang, F.; Hu, Z.; Wang, L. Understanding the Effective Receptive Field in Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Barcelona, Spain, 5–10 December 2016; pp. 4905–4913. [Google Scholar]
- Werbos, P.J. Backpropagation Through Time: What It Does and How to Do It. Proc. IEEE 1990, 78, 1550–1560. [Google Scholar] [CrossRef]
- Mandt, S.; Hoffman, M.D.; Blei, D.M. Stochastic Gradient Descent as Approximate Bayesian Inference. J. Mach. Learn. Res. 2017, 18, 1–35. [Google Scholar]
- Keskar, N.S.; Mudigere, D.; Nocedal, J.; Smelyanskiy, M.; Tang, P.T.P. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. arXiv 2016, arXiv:1609.04836. [Google Scholar]
- Welling, M.; Teh, Y.W. Bayesian Learning via Stochastic Gradient Langevin Dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML), Bellevue, WA, USA, 28 June–2 July 2011; pp. 681–688. Available online: https://www.icml.cc/2011/papers/398_icmlpaper.pdf (accessed on 1 January 2025).
- Raginsky, M.; Rakhlin, A.; Telgarsky, M. Non-Convex Learning via Stochastic Gradient Langevin Dynamics: A Nonasymptotic Analysis. In Proceedings of the 2017 Conference on Learning Theory (COLT), Amsterdam, The Netherlands, 7–10 July 2017; Volume 65, pp. 1674–1703. [Google Scholar]





| Stock | RMSProp RMSE | RMSProp MAE | RMSProp R² | Itô-RMSProp RMSE | Itô-RMSProp MAE | Itô-RMSProp R² |
|---|---|---|---|---|---|---|
| GOOG | 3.5721 | 3.0222 | 0.9488 | 3.4399 | 2.8278 | 0.9525 |
| AAPL | 2.7511 | 2.0960 | 0.9775 | 2.6491 | 2.0421 | 0.9792 |
| TSLA | 9.1834 | 7.1869 | 0.9702 | 7.8101 | 6.0316 | 0.9784 |
| JPM | 2.3738 | 1.8901 | 0.9742 | 3.4769 | 3.0762 | 0.9447 |
| UNH | 10.9737 | 9.6151 | 0.7776 | 10.9114 | 9.5639 | 0.7801 |
| HD | 5.8224 | 4.7235 | 0.8971 | 7.5543 | 6.5879 | 0.8267 |
| MSFT | 7.9417 | 5.8320 | 0.7434 | 5.5875 | 4.5118 | 0.8730 |
| V | 4.9128 | 3.9675 | 0.9135 | 4.5062 | 3.7154 | 0.9278 |
| Stock | Method | DA (%) | 95% CI (DA) | SR [95% CI] |
|---|---|---|---|---|
| AAPL | RMSProp | 49.82 | [49.10, 50.54] | 1.1970 [1.10, 1.29] |
| AAPL | Itô-RMSProp | 52.71 | [51.90, 53.52] | 1.0731 [1.00, 1.15] |
| GOOG | RMSProp | 51.99 | [51.30, 52.68] | 1.2849 [1.20, 1.36] |
| GOOG | Itô-RMSProp | 52.59 | [52.00, 53.18] | 0.8102 [0.73, 0.89] |
| MSFT | RMSProp | 47.89 | [47.10, 48.68] | 0.7550 [0.68, 0.83] |
| MSFT | Itô-RMSProp | 50.06 | [49.30, 50.82] | 0.8577 [0.78, 0.93] |
| TSLA | RMSProp | 48.62 | [47.90, 49.34] | 0.2276 [0.15, 0.31] |
| TSLA | Itô-RMSProp | 50.42 | [49.70, 51.14] | 0.6259 [0.54, 0.71] |
| UNH | RMSProp | 51.14 | [50.50, 51.78] | 0.7834 [0.71, 0.86] |
| UNH | Itô-RMSProp | 51.87 | [51.30, 52.44] | 0.8867 [0.81, 0.96] |
| HD | RMSProp | 48.38 | [47.60, 49.16] | −0.2961 [−0.38, −0.21] |
| HD | Itô-RMSProp | 49.22 | [48.50, 49.94] | 0.2079 [0.12, 0.30] |
| JPM | RMSProp | 50.55 | [49.80, 51.30] | 0.6720 [0.60, 0.74] |
| JPM | Itô-RMSProp | 51.08 | [50.40, 51.76] | 0.7915 [0.71, 0.88] |
Citation
El Harrak, M.I.; El Moutaouakil, K.; Ahmed, N.; Abdellatif, E.; Palade, V. GRU-Based Stock Price Forecasting with the Itô-RMSProp Optimizers. AppliedMath 2025, 5, 149. https://doi.org/10.3390/appliedmath5040149

