# An Overview of Machine Learning, Deep Learning, and Reinforcement Learning-Based Techniques in Quantitative Finance: Recent Progress and Challenges


## Abstract


## 1. Introduction

- In this article, we examine the most recent developments in applying artificial intelligence methods to quantitative finance, particularly those that have emerged in the previous three years.
- We analyze the work on Machine Learning, Deep Learning, and Reinforcement Learning in stock market prediction over the last three years.
- We analyze the general process of stock market prediction based on past research work.
- We suggest further research directions in stock market prediction so that new researchers can identify open problems to pursue.
- Finally, we indicate the data sources from which researchers can collect data for their own studies.

## 2. Related Work

## 3. Overview

## 4. Data Processing and Features

#### Different Types of Data Used in Stock Market Prediction

**(a) Historical Price Market Data (Day Wise)**

**(b) Historical Price Market Tick Data (Minute Wise)**

**(c) Tweets or Comments (Text Data)**

**(d) Image data**

**(e) Fundamental data**

## 5. Prediction Models-Artificial Intelligence in Quantitative Finance

**Algorithmic trading:** Machine learning algorithms are used in algorithmic trading to build strategies that automatically assess massive volumes of financial data and execute trades based on the algorithm's predictions.

**Risk management:** Machine learning can uncover risky patterns in massive financial data sets, which helps financial firms manage risk.

**Portfolio optimization:** Using factors such as volatility, risk, and return, machine learning can be used to find the best mix of investments for a portfolio.
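As a concrete illustration of this idea, the classical minimum-variance weighting can be sketched in a few lines of NumPy. The returns below are invented numbers for three hypothetical assets, not data from any cited study; real applications would use constrained optimizers and far longer return histories.

```python
import numpy as np

# Toy daily returns for three hypothetical assets (rows = days).
returns = np.array([
    [0.010, 0.002, -0.003],
    [-0.004, 0.001, 0.006],
    [0.007, -0.002, 0.004],
    [0.001, 0.003, -0.001],
    [0.005, 0.000, 0.002],
])

mu = returns.mean(axis=0)            # expected return per asset
cov = np.cov(returns, rowvar=False)  # risk estimate (covariance matrix)

# Unconstrained minimum-variance weights: w ∝ Σ⁻¹·1, normalised to sum to 1.
ones = np.ones(len(mu))
w = np.linalg.solve(cov, ones)
w /= w.sum()

port_return = w @ mu                 # expected portfolio return
port_vol = np.sqrt(w @ cov @ w)      # portfolio volatility (risk)
print(w, port_return, port_vol)
```

By construction these weights have lower variance than, say, an equal-weight allocation over the same assets, which is the trade-off the paragraph above describes.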

**Forecasting:** Machine learning can analyze past financial data to forecast future market patterns, exchange rates, and other variables.

**Sentiment analysis:** Machine learning can also analyze social media and news to gauge how people feel about a company, product, or industry.

**Prediction task using AI in quantitative finance:** Stock price prediction estimates the future value of company shares and other financial assets traded on an exchange; the purpose is to generate substantial profits. Predicting the performance of the stock market is a difficult endeavor: physical and psychological factors, as well as rational and irrational behavior, all influence the forecast. These variables make share prices dynamic and volatile, and therefore very difficult to estimate accurately.

**Classification task using AI in quantitative finance:** This recommendation model categorizes a given stock as "STRONG BUY," "BUY," "HOLD," "SELL," or "STRONG SELL." The imbalanced nature of the class labels poses the greatest difficulty in this classification problem: the majority of the time the correct label is HOLD, and only a very small percentage of the time is a STRONG BUY or STRONG SELL signal present.
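One common way to counter this imbalance is to weight the rare classes more heavily in the training loss. The sketch below uses an invented label distribution and simple inverse-frequency weights; it is an illustration of the general technique, not the weighting scheme of any particular surveyed paper.

```python
from collections import Counter

# Hypothetical label distribution: HOLD dominates, strong signals are rare.
labels = (["HOLD"] * 90 + ["BUY"] * 4 + ["SELL"] * 4
          + ["STRONG BUY"] * 1 + ["STRONG SELL"] * 1)

counts = Counter(labels)
n, k = len(labels), len(counts)

# Inverse-frequency class weights: rare classes receive proportionally larger
# weight in the loss, so training is not dominated by the HOLD class.
weights = {c: n / (k * cnt) for c, cnt in counts.items()}
print(weights)
```

With this scheme a single STRONG BUY example counts as much in the loss as dozens of HOLD examples, which is exactly the correction an imbalanced signal distribution needs.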

**Portfolio construction using AI in quantitative finance:** An investor's portfolio consists of diverse assets, including stocks, bonds, and cash, allocated based on many considerations such as investment risk, projected return, and liquidity needs. The goal is to obtain the estimated return with as little risk as possible, selecting the portfolio according to how well the assets are expected to perform, as evaluated by machine learning methods. Table 7 lists the standard machine learning prediction models used in the surveyed work, Table 8 the standard deep learning models, Table 9 the standard reinforcement learning models, and Table 10 the hybrid prediction models.

#### 5.1. Machine Learning

#### 5.1.1. Supervised Learning

- (a) **Regression:** The task type is determined by the output variable. If the output variable is continuous, the task is a regression task; predicting the price of a home or the price of a stock are both instances of regression problems.
- (b) **Classification:** If the output variable is categorical, such as a color or shape, the task is a classification task. Most machine learning applications employ supervised learning; common supervised methods include linear regression, logistic regression, SVMs, and random forests.
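The two task types can be contrasted on toy data. The numbers below are invented, the regression uses the closed-form one-dimensional least-squares fit, and the "classifier" is a hand-picked threshold rule purely for illustration.

```python
# Regression: fit y ≈ a*x + b by ordinary least squares (closed form, 1-D).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]   # continuous target, roughly y = 2x

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

# Classification: the same supervised setup, but with a categorical output.
# Here a trivial threshold rule on a return value stands in for a model.
def classify(ret):
    return "UP" if ret > 0 else "DOWN"

print(a, b, classify(0.01), classify(-0.02))
```

The regression recovers a slope close to 2, while the classifier maps a continuous input to one of two labels, which is the essential distinction between the two supervised tasks.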

#### 5.1.2. Unsupervised Learning

#### 5.2. Deep Learning

#### 5.2.1. Artificial Neural Network

**Advantages of using ANN** [85]

- Outstanding capability in dealing with complicated nonlinear patterns.
- Extremely precise group-data modeling. The model is adaptable to both linear and non-linear dynamics.
- Ability to accommodate missing and noisy data without breaking the model.

**Disadvantages of using ANN** [85]

- Overfitting.
- ANNs only provide point predictions for unknown data, without any variance information with which to evaluate the confidence of the prediction. These models are also sensitive to parameter selection.

#### 5.2.2. Recurrent Neural Network (RNN)

**Advantages of using RNN** [85]

- Useful for capturing the temporal relationships between the inputs and outputs of the neural network.

**Disadvantages of using RNN** [85]

- Difficult to train properly.

#### 5.2.3. Long Short-Term Memory (LSTM)

**Advantages of using LSTM** [85]

- Capable of self-learning data interactions and patterns.
- Analyses data interactions and hidden patterns to make effective predictions.
- Capable of retaining knowledge for an extended period of time.
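The gating mechanism behind this long-term retention can be sketched as a single LSTM step in NumPy. The weights below are random and the input/hidden sizes arbitrary; the sketch only illustrates the computation, not a trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b pack the four gates (i, f, o, g) row-wise."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b      # pre-activations for all four gates, (4H,)
    i = sigmoid(z[0:H])             # input gate: how much new information enters
    f = sigmoid(z[H:2*H])           # forget gate: how much old memory is kept
    o = sigmoid(z[2*H:3*H])         # output gate
    g = np.tanh(z[3*H:4*H])         # candidate cell update
    c = f * c_prev + i * g          # cell state carries information across steps
    h = o * np.tanh(c)              # hidden state / output
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4                         # input and hidden sizes (illustrative)
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):   # run five time steps of a toy sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h)
```

The forget gate `f` is what lets the cell state `c` preserve information over many steps, which is the "extended retention" property listed above.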

**Disadvantages of using LSTM** [85]

#### 5.2.4. Convolutional Neural Networks (CNNs)

**Advantages of using CNN** [64]

- Compared to conventional classification algorithms, they require less pre-processing and can learn their own filters and features.
- Another important benefit offered by CNNs is weight sharing.

**Disadvantages of using CNN** [64]

- A significant amount of training data is required for a CNN to function properly.
- Because of operations such as max pooling, CNNs often have substantially higher latency.

#### 5.3. Reinforcement Learning

- Environment: The external environment in which the agent has interactions.
- State: Current circumstances involving the agent.
- Reward: Feedback signal in the form of numbers from the environment.
- Policy: A method for mapping the state of the agent to its actions. A policy is what determines the course of action to take in any particular condition.
- Value: The future reward that an agent can expect to earn by acting from a given state.
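These components come together in the Q-learning update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$. Below is a minimal tabular sketch on an invented two-state "in/out of the market" environment; the states, actions, and rewards are illustrative only.

```python
import random

random.seed(0)

# Toy environment: state 0 = "out of market", state 1 = "holding".
# Actions: 0 = stay, 1 = switch. Rewards are invented for illustration.
def step(state, action):
    next_state = state ^ action                  # switching flips the state
    reward = 1.0 if next_state == 1 else 0.0     # being "in the market" pays here
    return next_state, reward

alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = [[0.0, 0.0], [0.0, 0.0]]                     # Q[state][action]

state = 0
for _ in range(2000):
    # Policy: epsilon-greedy over the current value estimates.
    if random.random() < epsilon:
        action = random.randrange(2)
    else:
        action = 0 if Q[state][0] >= Q[state][1] else 1
    next_state, reward = step(state, action)
    # Q-learning update: move toward reward + discounted best future value.
    target = reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (target - Q[state][action])
    state = next_state

print(Q)
```

After training, the table reflects the obvious policy for this toy world: switch into the market from state 0, then stay in it.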

#### 5.3.1. Q-Learning

#### 5.3.2. Deep Q-Learning (Q-Learning with Neural Networks)

The Q-network (a neural network) calculates the Q-value for the current state $S_t$, and the target network (a duplicate neural network) calculates the Q-value for the next state $S_{t+1}$; the target network is updated only periodically, by copying the Q-network's weights, which stabilizes training and prevents abrupt jumps in the Q-value estimates [4,9,23,32,35,39,41,42,43,48,55,56,58,60,67,70,71]. Table 12 shows the advantages and disadvantages of using reinforcement learning in quantitative finance. Figure 10 shows the basic structure of deep Q-learning. Deep Q-learning is a technique used to train artificial intelligence agents to act in environments with discrete action spaces.
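The two-network mechanism can be sketched with a Q-table standing in for both neural networks; the point of the sketch is the frozen target copy and its periodic synchronization, not the function approximator. The environment, rewards, and hyperparameters below are invented for illustration.

```python
import random

random.seed(1)

N_STATES, N_ACTIONS = 4, 2
gamma, lr, sync_every = 0.9, 0.05, 50

# Q-tables stand in for the two neural networks, to show the mechanism only.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # online "Q-network"
Q_target = [row[:] for row in Q]                   # frozen "target network"

for step in range(2000):
    s = random.randrange(N_STATES)
    a = random.randrange(N_ACTIONS)                # random exploration
    s_next = random.randrange(N_STATES)
    r = 1.0 if s_next == 0 else 0.0                # illustrative reward signal

    # The target value uses the FROZEN copy for Q(S_{t+1}, ·); this is what
    # stabilizes training and prevents abrupt jumps in the estimates.
    target = r + gamma * max(Q_target[s_next])
    Q[s][a] += lr * (target - Q[s][a])             # gradient-style update

    if step % sync_every == 0:
        Q_target = [row[:] for row in Q]           # periodic weight copy

print(Q)
```

If `Q_target` were replaced by `Q` everywhere, the targets would chase the values being updated, which is exactly the instability the duplicate network avoids.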

## 6. Evaluation Metrics

**True Positive (TP):** The model predicted YES, and the actual value was YES.

**True Negative (TN):** The model predicted NO, and the actual value was NO.

**False Positive (FP):** The model predicted YES, but the actual value was NO.

**False Negative (FN):** The model predicted NO, but the actual value was YES.
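Counting these four quantities from paired actual/predicted labels is straightforward; the labels below are made up for illustration.

```python
# Confusion counts from paired (actual, predicted) YES/NO labels.
actual    = ["YES", "YES", "NO", "NO", "YES", "NO", "YES", "NO"]
predicted = ["YES", "NO",  "NO", "YES", "YES", "NO", "YES", "NO"]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == "YES" and p == "YES")  # correct YES
tn = sum(1 for a, p in pairs if a == "NO" and p == "NO")    # correct NO
fp = sum(1 for a, p in pairs if a == "NO" and p == "YES")   # false alarm
fn = sum(1 for a, p in pairs if a == "YES" and p == "NO")   # missed YES
print(tp, tn, fp, fn)
```

Every classification metric in the next subsection is a ratio built from these four counts.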

#### 6.1. Learning Model Evaluation Metrics

| Learning Model Evaluation Metric | Description | Formula | Article |
|---|---|---|---|
| Mean Absolute Error (MAE) | The average absolute difference between the values fitted by the model and the observed historical data. | $\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\lvert Y_i-\widehat{Y}_i\rvert$ | [10,27,29,57] |
| Mean Squared Error (MSE) | The average of the squared differences between model-fitted values and observed values. | $\mathrm{MSE}=\frac{1}{N}\sum_{i=1}^{N}(Y_i-\widehat{Y}_i)^2$ | [10,27,57,79] |
| Root Mean Squared Error (RMSE) | The square root of the mean squared error. It is on the same scale as the observed data. | $\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}(Y_i-\widehat{Y}_i)^2}$ | [27,28,29,80] |
| Mean Absolute Percent Error (MAPE) | The average absolute percentage difference between the values predicted by the model and the values actually observed. | $\mathrm{MAPE}=\frac{1}{N}\sum_{i=1}^{N}\left\lvert\frac{Y_i-\widehat{Y}_i}{Y_i}\right\rvert$ | [10,27,57] |
| Classification Accuracy | The frequency with which the model correctly predicts the output. | $\mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}$ | [30,37,49,57,66,68] |
| Misclassification Rate | The frequency with which the model incorrectly predicts the output. | $\mathrm{Error\ rate}=\frac{\mathrm{FP}+\mathrm{FN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}$ | |
| Precision | Also known as Positive Predictive Value: the proportion of correct positive predictions to total positive predictions. | $\mathrm{Precision}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}$ | [30,37,49,57,66,68] |
| Recall | The proportion of actual positive classes that the model correctly predicted. | $\mathrm{Recall}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$ | [30,37,49,57,66,68] |
| F1-Score | The harmonic mean of precision and recall, useful for comparing models when one has low precision and the other high recall. The F1-score is maximal when recall equals precision. | $\mathrm{F}1=2\times\frac{\mathrm{Precision}\times\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$ | [30,37,49,57,66,68] |
| AUC | The probability that a random positive example is ranked above a random negative example. AUC ranges from 0 to 1: a model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% right has an AUC of 1.0. At a single decision threshold it reduces to the mean of sensitivity and specificity. | $\mathrm{AUC}=\frac{\mathrm{Sensitivity}+\mathrm{Specificity}}{2}$ | [37] |
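The regression-error metrics from the table can be computed directly from paired observed and fitted values; the numbers below are invented price data for illustration.

```python
import math

y_true = [100.0, 102.0, 101.0, 105.0]   # hypothetical observed prices
y_pred = [99.0, 103.0, 102.0, 103.0]    # hypothetical model-fitted values

n = len(y_true)
errors = [yt - yp for yt, yp in zip(y_true, y_pred)]

mae = sum(abs(e) for e in errors) / n                         # average absolute error
mse = sum(e * e for e in errors) / n                          # average squared error
rmse = math.sqrt(mse)                                         # back on the price scale
mape = sum(abs(e / yt) for e, yt in zip(errors, y_true)) / n  # relative error
print(mae, mse, rmse, mape)
```

Note that MSE penalizes the single two-unit miss more heavily than MAE does, which is the usual reason for choosing one over the other.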

#### 6.2. Portfolio Evaluation Metrics

| Portfolio Evaluation Metric | Description | Article |
|---|---|---|
| Accumulated Return | The percentage increase or decrease in value over the life of an investment. The accumulated return ought to be positive and as high as feasible. | [9,40,58,86] |
| Average Daily Return | The arithmetic mean of a set of daily returns earned over time. | [40,58,86] |
| Maximum Drawdown | A risk indicator for a portfolio chosen under a given strategy: the largest single decline in the portfolio's value from a peak to the subsequent trough. | [9,40,58,86] |
| Skewness | A measure of the symmetry or asymmetry of the return distribution. Skewness is 0 for a perfectly symmetric distribution; nonzero skewness indicates skewed data. | [40,56,58,87] |
| Kurtosis | A measure of how heavily a variable's values fluctuate above or below the mean. A kurtosis above 3 (the value for a normal distribution) implies a wide variance around the mean. | [40,56,58,87] |
| Standard Deviation | Used by investors to gauge the volatility of a stock's performance: the larger the standard deviation, the more volatile the stock. | [40,58,87] |
| Sharpe Ratio | Used in portfolio risk-return analysis; the strategy with the highest Sharpe ratio delivers the best return per unit of risk taken. | [9,40,58,86] |
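Several of these metrics can be computed from a series of portfolio values. The values below are invented, and the Sharpe ratio here assumes a zero risk-free rate and uses unannualized daily returns.

```python
import math

# Hypothetical daily portfolio values over two weeks of trading.
values = [100.0, 101.5, 99.8, 102.3, 104.0, 103.1, 105.6, 104.9, 107.2, 106.0]

# Daily returns, accumulated return, and average daily return.
rets = [v1 / v0 - 1.0 for v0, v1 in zip(values, values[1:])]
accumulated = values[-1] / values[0] - 1.0
avg_daily = sum(rets) / len(rets)

# Maximum drawdown: largest peak-to-trough decline in portfolio value.
peak, max_dd = values[0], 0.0
for v in values:
    peak = max(peak, v)
    max_dd = max(max_dd, (peak - v) / peak)

# Sharpe ratio (risk-free rate assumed 0): mean return over return volatility.
var = sum((r - avg_daily) ** 2 for r in rets) / len(rets)
sharpe = avg_daily / math.sqrt(var)
print(accumulated, avg_daily, max_dd, sharpe)
```

In practice the Sharpe ratio is usually annualized (for daily data, multiplied by $\sqrt{252}$), but the ranking of strategies is unchanged by that scaling.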

## 7. Data Availability and Implementation

## 8. Challenges and Future Research Direction

#### 8.1. Multi-Agent Advanced DRL Techniques on Quantitative Finance

#### 8.2. Fresh Configurations for Quantitative Trading

#### 8.3. Towards a More Accurate Simulation of the Market

#### 8.4. Improvements with Auto-ML Approaches

## 9. Discussion

## 10. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

A/D | The Accumulation/Distribution Indicator |

ACC | Accuracy |

ACF | Auto Correlation Function |

ADA | Ada Boosted Decision Trees |

AdaBoost | Adaptive Boosting |

ADR | Average Daily Return |

ADX | Average Directional Movement Index |

AE | Auto Encoder |

AI | Artificial Intelligence |

ANN | Artificial Neural Network |

ARIMA | Auto-Regressive Integrated Moving Average |

ARMA | Auto Regressive Moving Average |

ATR | Average True Range |

AUC | Area Under the (ROC) Curve |

BBANDS | Bollinger Bands |

BDT | Boosted Decision Tree |

BERT | Bidirectional Encoder Representation from Transformers |

BILSTM | Bidirectional Long Short-Term Memory |

BN | Bayesian Network |

BNN | Bayesian Neural Network |

BP | Back Propagation |

BRNN | Bidirectional Recurrent Neural Network |

CART | Classification And Regression Tree |

CCI | Commodity Channel Index |

CMMs | Conditional Markov Model |

CNN | Convolutional Neural Network |

ConvNet | Convolutional Neural Network |

CRNN | Convolutional Recurrent Neural Network |

DANN | Dynamic Artificial Neural Network |

DBN | Deep Belief Network |

DEMA | Double Exponential Moving Average |

DL | Deep Learning |

DNN | Deep Neural Network |

DQN | Deep Q-Network |

DT | Decision Tree |

ELM | Extreme Learning Machine |

EMA | Exponential Moving Average |

FA | Fundamental Analysis |

FC-CNN | Fully Convolutional Neural Network |

FC-LSTM | Fully Connected Long Short-Term Memory |

FCM | Fuzzy C-Means |

FCN | Fully Convolutional Network |

FN | False Negative |

FNN | Feedforward Neural Network |

FNR | False Negative Rate |

FPR | False Positive Rate |

FP | False Positive |

GA | Genetic Algorithm |

GAN | Generative Adversarial Network |

GD | Gradient Descent |

GRU | Gated Recurrent Unit |

HMM | Hidden Markov Model |

HNN | Hybrid Neural Network |

k-NN | k-Nearest Neighbor |

LDA | Linear Discriminant Analysis |

LR | Logistic Regression |

LSTM | Long Short-Term Memory |

MACD | Moving Average Convergence/Divergence |

MAE | Mean Absolute Error |

MAPE | Mean Absolute Percent Error |

MCMC | Markov Chain Monte Carlo |

MD | Maximum Drawdown |

MDP | Markov Decision Process |

MDRNN | Multidimensional recurrent neural network |

MIS | Management Information System |

ML | Machine Learning |

MLP | Multi-Layer Perceptron |

MOM | Momentum |

MSE | Mean Squared Error |

NB | Naïve Bayes |

NLP | Natural Language Processing |

NLT | Neural Machine Translation |

NN | Neural Network |

NSE | National Stock Exchange |

OBV | On Balance Volume |

PCA | Principal Component Analysis |

QA | Quantitative Analysis |

QF | Quantitative Finance |

QT | Quantitative Trading |

RBF | Radial Basis Function |

ReLU | Rectified Linear Unit |

RF | Random Forest |

RL | Reinforcement Learning |

RMSE | Root Mean Squared Error |

RNN | Recurrent Neural Network |

ROC | Receiver Operating Characteristic |

RSI | Relative Strength Index |

RTRL | Real-Time Recurrent Learning |

SGBoost | Stochastic Gradient Boosting |

SGD | Stochastic Gradient Descent |

SLP | Single-Layer Perceptron |

SMA | Simple Moving Average |

STDDEV | Standard Deviation |

STOCH | Stochastic |

SVM | Support Vector Machine |

SVR | Support Vector Regression |

TA | Technical Analysis |

TEMA | Triple Exponential Moving Average |

TI | Technical Indicators |

TNR | True Negative Rate |

TN | True Negative |

TP | True Positive |

TPR | True Positive Rate |

TSF | Time Series Forecast |

VAR | Variance |

WILLR | William’s % R |

WMA | Weighted Moving Average |

XGBoost | eXtreme Gradient Boosting |

Y_{i} | ith actual value |

${\widehat{\mathrm{Y}}}_{i}$ | ith predicted value |

## References

- Jiang, W. Applications of deep learning in stock market prediction: Recent progress. Expert Syst. Appl.
**2021**, 184, 115537. [Google Scholar] [CrossRef] - Alameer, A.; Saleh, H.; Alshehri, K. Reinforcement Learning in Quantitative Trading: A Survey. TechRxiv
**2022**. [Google Scholar] [CrossRef] - Wang, Y.; Yan, G. Survey on the application of deep learning in algorithmic trading. Data Sci. Financ. Econ.
**2021**, 1, 345–361. [Google Scholar] [CrossRef] - Millea, A. Deep reinforcement learning for trading—A critical survey. Data
**2021**, 6, 119. [Google Scholar] [CrossRef] - Kumar, G.; Jain, S.; Singh, U.P. Stock Market Forecasting Using Computational Intelligence: A Survey; Springer: Dordrecht, The Netherlands, 2021; Volume 28, pp. 1069–1101. [Google Scholar]
- Pricope, T.-V. Deep Reinforcement Learning in Quantitative Algorithmic Trading: A Review. arXiv
**2021**, arXiv:2106.00123. [Google Scholar] [CrossRef] - Rouf, N.; Malik, M.B.; Arif, T.; Sharma, S.; Singh, S.; Aich, S.; Kim, H.-C. Stock market prediction using machine learning techniques: A decade survey on methodologies, recent developments, and future directions. Electronics
**2021**, 10, 2717. [Google Scholar] [CrossRef] - Deng, Y.; Bao, F.; Kong, Y.; Ren, Z.; Dai, Q. Deep Direct Reinforcement Learning for Financial Signal Representation and Trading. IEEE Trans. Neural Networks Learn. Syst.
**2017**, 28, 653–664. [Google Scholar] [CrossRef] - Yu, X.; Wu, W.; Liao, X.; Han, Y. Dynamic stock-decision ensemble strategy based on deep reinforcement learning. Appl. Intell.
**2023**, 53, 2452–2470. [Google Scholar] [CrossRef] - Lv, P.; Wu, Q.; Xu, J.; Shu, Y. Stock Index Prediction Based on Time Series Decomposition and Hybrid Model. Entropy
**2022**, 24, 146. [Google Scholar] [CrossRef] [PubMed] - García-Medina, A.; Huynh, T.L.D. What drives bitcoin? An approach from continuous local transfer entropy and deep learning classification models. Entropy
**2021**, 23, 1582. [Google Scholar] [CrossRef] - Zhao, Y.; Chen, Z. Forecasting stock price movement: New evidence from a novel hybrid deep learning model. J. Asian Bus. Econ. Stud.
**2022**, 29, 91–104. [Google Scholar] [CrossRef] - Abdullah, M. The implication of machine learning for financial solvency prediction: An empirical analysis on public listed companies of Bangladesh. J. Asian Bus. Econ. Stud.
**2021**, 28, 303–320. [Google Scholar] [CrossRef] - Bilgili, F.; Koçak, E.; Kuşkaya, S. Dynamics and Co-movements between the COVID-19 Outbreak and the Stock Market in Latin American Countries: An Evaluation Based on the Wavelet-Partial Wavelet Coherence Model. Eval. Rev.
**2022**. Online ahead of print. [Google Scholar] [CrossRef] [PubMed] - Spooner, T.; Savani, R. Robust market making via adversarial reinforcement learning. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Yokohama, Japan, 7–15 January 2021; pp. 4590–4596. [Google Scholar] [CrossRef]
- Karpe, M.; Fang, J.; Ma, Z.; Wang, C. Multi-agent reinforcement learning in a realistic limit order book market simulation. In Proceedings of the ICAIF 2020-1st ACM International Conference on AI in Finance, New York, NY, USA, 15–16 October 2020. [Google Scholar] [CrossRef]
- Vyetrenko, S.; Xu, S. Risk-Sensitive Compact Decision Trees for Autonomous Execution in Presence of Simulated Market Response. arXiv
**2019**, arXiv:1906.02312. [Google Scholar] [CrossRef] - Yildiz, Z.C.; Yildiz, S.B. A portfolio construction framework using LSTM-based stock markets forecasting. Int. J. Financ. Econ.
**2022**, 27, 2356–2366. [Google Scholar] [CrossRef] - Khan, W.; Ghazanfar, M.A.; Azam, M.A.; Karami, A.; Alyoubi, K.H.; Alfakeeh, A.S. Stock market prediction using machine learning classifiers and social media, news. J. Ambient Intell. Humaniz. Comput.
**2022**, 13, 3433–3456. [Google Scholar] [CrossRef] - Wang, J.; Zhuang, Z.; Feng, L. Intelligent Optimization Based Multi-Factor Deep Learning Stock Selection Model and Quantitative Trading Strategy. Mathematics
**2022**, 10, 566. [Google Scholar] [CrossRef] - Xu, H.; Chai, L.; Luo, Z.; Li, S. Stock movement prediction via gated recurrent unit network based on reinforcement learning with incorporated attention mechanisms. Neurocomputing
**2022**, 467, 214–228. [Google Scholar] [CrossRef] - Brim, A.; Flann, N.S. Deep reinforcement learning stock market trading, utilizing a CNN with candlestick images. PLoS ONE
**2022**, 17, e0263181. [Google Scholar] [CrossRef] - Li, Y.; Liu, P.; Wang, Z. Stock Trading Strategies Based on Deep Reinforcement Learning. Sci. Program.
**2022**, 2022, 4698656. [Google Scholar] [CrossRef] - Yao, J.; Li, Z.; Cui, T.; Xi, H. Quantitative Investment Trading Model Based on Model Recognition Strategy with Deep Learning Method. Wirel. Commun. Mob. Comput.
**2022**, 2022, 8856215. [Google Scholar] [CrossRef] - Saini, A.; Sharma, A. Predicting the Unpredictable: An Application of Machine Learning Algorithms in Indian Stock Market. Ann. Data Sci.
**2022**, 9, 791–799. [Google Scholar] [CrossRef] - Zou, Z.; Qu, Z. Using LSTM in Stock Prediction and Quantitative Trading. CS230 Deep Learning Winter
**2022**, 1–6. Available online: https://cs230.stanford.edu/projects_winter_2020/reports/32066186.pdf (accessed on 15 November 2022). - Wysocki, M.; Ślepaczuk, R. Artificial neural networks performance in WIG20 index options pricing. Entropy
**2021**, 24, 35. [Google Scholar] [CrossRef] [PubMed] - Raubitzek, S.; Neubauer, T. An Exploratory Study on the Complexity and Machine Learning Predictability of Stock Market Data. Entropy
**2022**, 24, 332. [Google Scholar] [CrossRef] [PubMed] - Sako, K.; Mpinda, B.N.; Rodrigues, P.C. Neural Networks for Financial Time Series Forecasting. Entropy
**2022**, 24, 657. [Google Scholar] [CrossRef] - Ghosh, P.; Neufeld, A.; Sahoo, J.K. Forecasting directional movements of stock prices for intraday trading using LSTM and random forests. Financ. Res. Lett.
**2022**, 46, 102280. [Google Scholar] [CrossRef] - Zhang, W.; Yin, T.; Zhao, Y.; Han, B.; Liu, H. Reinforcement Learning for Stock Prediction and High-Frequency Trading with T+1 Rules. IEEE Access
**2022**, 1. [Google Scholar] [CrossRef] - Park, D.-Y.; Lee, K.-H. Practical Algorithmic Trading Using State Representation Learning and Imitative Reinforcement Learning. IEEE Access
**2021**, 9, 152310–152321. [Google Scholar] [CrossRef] - Ayala, J.; García-Torres, M.; Noguera, J.L.V.; Gómez-Vela, F.; Divina, F. Technical analysis strategy optimization using a machine learning approach in stock market indices. Knowledge-Based Syst.
**2021**, 225, 107119. [Google Scholar] [CrossRef] - Ma, C.; Zhang, J.; Liu, J.; Ji, L.; Gao, F. A parallel multi-module deep reinforcement learning algorithm for stock trading. Neurocomputing
**2021**, 449, 290–302. [Google Scholar] [CrossRef] - AbdelKawy, R.; Abdelmoez, W.M.; Shoukry, A. A synchronous deep reinforcement learning model for automated multi-stock trading. Prog. Artif. Intell.
**2021**, 10, 83–97. [Google Scholar] [CrossRef] - Carta, S.; Corriga, A.; Ferreira, A.; Podda, A.S.; Recupero, D.R. A multi-layer and multi-ensemble stock trader using deep learning and deep reinforcement learning. Appl. Intell.
**2021**, 51, 889–905. [Google Scholar] [CrossRef] - Wu, D.; Wang, X.; Wu, S. A hybrid method based on extreme learning machine and wavelet transform denoising for stock prediction. Entropy
**2021**, 23, 440. [Google Scholar] [CrossRef] [PubMed] - Chang, V.; Man, X.; Xu, Q.; Hsu, C. Pairs trading on different portfolios based on machine learning. Expert Syst.
**2021**, 38, e12649. [Google Scholar] [CrossRef] - Théate, T.; Ernst, D. An application of deep reinforcement learning to algorithmic trading. Expert Syst. Appl.
**2021**, 173, 114632. [Google Scholar] [CrossRef] - Chakole, J.B.; Kolhe, M.S.; Mahapurush, G.D.; Yadav, A.; Kurhekar, M.P. A Q-learning agent for automated trading in equity stock markets. Expert Syst. Appl.
**2021**, 163, 113761. [Google Scholar] [CrossRef] - Li, Z.; Liu, X.-Y.; Zheng, J.; Wang, Z.; Walid, A.; Guo, J. FinRL-Podracer: High Performance and Scalable Deep Reinforcement Learning for Quantitative Finance. In Proceedings of the ICAIF 2021-2nd ACM International Conference on AI in Finance, Virtual, 3–5 November 2021. [Google Scholar] [CrossRef]
- Liu, X.Y.; Yang, H.; Gao, J.; Wang, C.D. FinRL: Deep Reinforcement Learning Framework to Automate Trading in Quantitative Finance; Association for Computing Machinery: New York, NY, USA, 2021. [Google Scholar]
- Liu, Y.; Liu, Q.; Zhao, H.; Pan, Z.; Liu, C. Adaptive quantitative trading: An imitative deep reinforcement learning approach. In Proceedings of the AAAI 2020-34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 2128–2135. [Google Scholar] [CrossRef]
- Dang, Q.V. Reinforcement Learning in Stock Trading; 1121 AISC; Springer International Publishing: Berlin/Heidelberg, Germany, 2020. [Google Scholar]
- Nabipour, M.; Nayyeri, P.; Jabani, H.; Shahab, S.; Mosavi, A. Predicting Stock Market Trends Using Machine Learning and Deep Learning Algorithms Via Continuous and Binary Data; A Comparative Analysis. IEEE Access
**2020**, 8, 150199–150212. [Google Scholar] [CrossRef] - Wu, X.; Chen, H.; Wang, J.; Troiano, L.; Loia, V.; Fujita, H. Adaptive stock trading strategies with deep reinforcement learning methods. Inf. Sci.
**2020**, 538, 142–158. [Google Scholar] [CrossRef] - Naik, N.; Mohan, B.R. Intraday Stock Prediction Based on Deep Neural Network. Natl. Acad. Sci. Lett.
**2020**, 43, 241–246. [Google Scholar] [CrossRef] - Conegundes, L.; Pereira, A.C.M. Beating the Stock Market with a Deep Reinforcement Learning Day Trading System. In Proceedings of the 020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020. [Google Scholar] [CrossRef]
- Khan, W.; Malik, U.; Ghazanfar, M.A.; Azam, M.A.; Alyoubi, K.H.; Alfakeeh, A.S. Predicting stock market trends using machine learning algorithms via public sentiment and political situation analysis. Soft Comput.
**2020**, 24, 11019–11043. [Google Scholar] [CrossRef] - Chen, Y.; Hao, Y. A novel framework for stock trading signals forecasting. Soft Comput.
**2020**, 24, 12111–12130. [Google Scholar] [CrossRef] - Parray, I.R.; Khurana, S.S.; Kumar, M.; Altalbe, A.A. Time series data analysis of stock price movement using machine learning techniques. Soft Comput.
**2020**, 24, 16509–16517. [Google Scholar] [CrossRef] - Ananthi, M.; Vijayakumar, K. Stock market analysis using candlestick regression and market trend prediction (CKRM). J. Ambient. Intell. Humaniz. Comput.
**2020**, 12, 4819–4826. [Google Scholar] [CrossRef] - Zhang, Z.; Zohren, S.; Roberts, S.J. Deep Reinforcement Learning for Trading. J. Financ. Data Sci.
**2020**, 2, 25–40. [Google Scholar] [CrossRef] - Ta, V.-D.; Liu, C.-M.; Tadesse, D.A. Portfolio optimization-based stock prediction using long-short term memory network in quantitative trading. Appl. Sci.
**2020**, 10, 437. [Google Scholar] [CrossRef] [Green Version] - Li, Y.; Ni, P.; Chang, V. Application of deep reinforcement learning in stock trading strategies and stock forecasting. Computing
**2020**, 102, 1305–1322. [Google Scholar] [CrossRef] - Yuan, Y.; Wen, W.; Yang, J. Using data augmentation based reinforcement learning for daily stock trading. Electronics
**2020**, 9, 1384. [Google Scholar] [CrossRef] - Nabipour, M.; Nayyeri, P.; Jabani, H.; Mosavi, A.; Salwana, E. Deep learning for stock market prediction. Entropy
**2020**, 22, 840. [Google Scholar] [CrossRef] [PubMed] - Chakole, J.; Kurhekar, M. Trend following deep Q-Learning strategy for stock trading. Expert Syst.
**2020**, 37, e12514. [Google Scholar] [CrossRef] - Park, H.; Sim, M.K.; Choi, D.G. An intelligent financial portfolio trading strategy using deep Q-learning. Expert Syst. Appl.
**2020**, 158, 113573. [Google Scholar] [CrossRef] - Yang, H.; Liu, X.-Y.; Zhong, S.; Walid, A. Deep reinforcement learning for automated stock trading: An ensemble strategy. In Proceedings of the ICAIF 2020-1st ACM International Conference on AI in Finance, New York, NY, USA; 2020. [Google Scholar] [CrossRef]
- Yuan, X.; Yuan, J.; Jiang, T.; Ain, Q.U. Integrated Long-Term Stock Selection Models Based on Feature Selection and Machine Learning Algorithms for China Stock Market. IEEE Access
**2020**, 8, 22672–22685. [Google Scholar] [CrossRef] - Li, Y.; Ni, P.; Chang, V. An Empirical Research on the Investment Strategy of Stock Market based on Deep Reinforcement Learning model. In Proceedings of the COMPLEXIS 2019-4th International Conference on Complexity, Future Information Systems and Risk, Crete, Greece, 2–4 May 2019; pp. 52–58. [Google Scholar] [CrossRef]
- Meng, T.L.; Khushi, M. Reinforcement learning in financial markets. Data
**2019**, 4, 110. [Google Scholar] [CrossRef] - Hoseinzade, E.; Haratizadeh, S. CNNpred: CNN-based stock market prediction using a diverse set of variables. Expert Syst. Appl.
**2019**, 129, 273–285. [Google Scholar] [CrossRef] - Selvamuthu, D.; Kumar, V.; Mishra, A. Indian stock market prediction using artificial neural networks on tick data. Financ. Innov.
**2019**, 5, 16. [Google Scholar] [CrossRef] - Tan, Z.; Yan, Z.; Zhu, G. Stock selection with random forest: An exploitation of excess return in the Chinese stock market. Heliyon
**2019**, 5, e02310. [Google Scholar] [CrossRef] - Li, Y.; Zheng, W.; Zheng, Z. Deep Robust Reinforcement Learning for Practical Algorithmic Trading. IEEE Access
**2019**, 7, 108014–108021. [Google Scholar] [CrossRef] - Lv, D.; Yuan, S.; Li, M.; Xiang, Y. An Empirical Study of Machine Learning Algorithms for Stock Daily Trading Strategy. Math. Probl. Eng.
**2019**, 2019, 7816154. [Google Scholar] [CrossRef] - Wang, Q.; Xu, W.; Huang, X.; Yang, K. Enhancing intraday stock price manipulation detection by leveraging recurrent neural networks with ensemble learning. Neurocomputing
**2019**, 347, 46–58. [Google Scholar] [CrossRef] - YHu, Y.-J.; Lin, S.-J. Deep Reinforcement Learning for Optimizing Finance Portfolio Management. In Proceedings of the 2019 Amity International Conference on Artificial Intelligence (AICAI), Dubai, United Arab Emirates, 4–6 February 2019; pp. 14–20. [Google Scholar] [CrossRef]
- Wu, J.; Wang, C.; Xiong, L.; Sun, H. Quantitative Trading on Stock Market Based on Deep Reinforcement Learning. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14-19 July 2019; pp. 1–8. [Google Scholar] [CrossRef]
- Pendharkar, P.C.; Cusatis, P. Trading financial indices with reinforcement learning agents. Expert Syst. Appl.
**2018**, 103, 1–13. [Google Scholar] [CrossRef] - Chong, E.; Han, C.; Park, F.C. Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies. Expert Syst. Appl.
**2017**, 83, 187–205. [Google Scholar] [CrossRef] - Patel, J.; Shah, S.; Thakkar, P.; Kotecha, K. Predicting stock and stock price index movement using Trend Deterministic Data Preparation and machine learning techniques. Expert Syst. Appl.
**2015**, 42, 259–268. [Google Scholar] [CrossRef] - Guresen, E.; Kayakutlu, G.; Daim, T.U. Using artificial neural network models in stock market index prediction. Expert Syst. Appl.
**2011**, 38, 10389–10397. [Google Scholar] [CrossRef] - Kara, Y.; Boyacioglu, M.A.; Baykan, K. Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul Stock Exchange. Expert Syst. Appl.
**2011**, 38, 5311–5319. [Google Scholar] [CrossRef] - Zhong, X.; Enke, D. Predicting the daily return direction of the stock market using hybrid machine learning algorithms. Financ. Innov.
**2019**, 5, 4. [Google Scholar] [CrossRef] - Shah, D.; Isah, H.; Zulkernine, F. Stock market analysis: A review and taxonomy of prediction techniques. Int. J. Financ. Stud.
**2019**, 7, 26. [Google Scholar] [CrossRef] - Bajpai, S. Application of deep reinforcement learning for Indian stock trading automation. arXiv
**2021**, arXiv:2106.16088. [Google Scholar] [CrossRef] - Xu, C.; Ke, J.; Peng, Z.; Fang, W.; Duan, Y. Asymmetric Fractal Characteristics and Market Efficiency Analysis of Style Stock Indices. Entropy
**2022**, 24, 969. [Google Scholar] [CrossRef] - Mendoza-Urdiales, R.A.; Núñez-Mora, J.A.; Santillán-Salgado, R.J.; Valencia-Herrera, H. Twitter Sentiment Analysis and Influence on Stock Performance Using Transfer Entropy and EGARCH Methods. Entropy
**2022**, 24, 874. [Google Scholar] [CrossRef] - Li, Q.; Chen, Y.; Wang, J.; Chen, Y.; Chen, H. Web Media and Stock Markets: A Survey and Future Directions from a Big Data Perspective. IEEE Trans. Knowl. Data Eng.
**2017**, 30, 381–399. [Google Scholar] [CrossRef] - Vanstone, B.; Finnie, G. An empirical methodology for developing stockmarket trading systems using artificial neural networks. Expert Syst. Appl.
**2009**, 36, 6668–6680. [Google Scholar] [CrossRef] - Soni, P.; Tewari, Y.; Krishnan, D. Machine Learning Approaches in Stock Price Prediction: A Systematic Review. J. Phys. Conf. Ser.
**2022**, 2161, 012065. [Google Scholar] [CrossRef] - Obthong, M.; Tantisantiwong, N.; Jeamwatthanachai, W.; Wills, G. A survey on machine learning for stock price prediction: Algorithms and techniques. In Proceedings of the 2nd International Conference on Finance, Economics, Management and IT Business, Vienna House Diplomat Prague, Prague, Czech Republic, 5–6 May 2020; pp. 63–71. [Google Scholar] [CrossRef]
- Liu, X.-Y.; Rui, J.; Gao, J.; Yang, L.; Yang, H.; Wang, Z.; Wang, C.; Guo, J. FinRL-Meta: A Universe of Near-Real Market Environments for Data-Driven Deep Reinforcement Learning in Quantitative Finance. arXiv
**2021**, arXiv:2112.06753. [Google Scholar] [CrossRef] - Nie, C.-X.; Xiao, J. Dynamics of Information Flow between the Chinese A-Share Market and the U.S. Stock Market: From the 2008 Crisis to the COVID-19 Pandemic Period. Entropy
**2022**, 24, 1102. [Google Scholar] [CrossRef] - Rundo, F.; Trenta, F.; Di Stallo, A.L.; Battiato, S. Machine learning for quantitative finance applications: A survey. Appl. Sci.
**2019**, 9, 5574. [Google Scholar] [CrossRef]

**Figure 3.** Different types of stock market data, machine learning tasks, prediction models, model performance evaluation metrics, and portfolio performance evaluation metrics.

**Figure 4.** A candlestick pattern is a price chart pattern used in the technical analysis of financial markets.

**Figure 5.** Relationship of AI, ML, DL, RL, and DRL. Machine learning, deep learning, and reinforcement learning are subsets of artificial intelligence.

**Figure 6.** Classification, regression, and clustering tasks performed by machine learning algorithms.

**Figure 7.** The basic structure of an artificial neural network consists of an input layer, a hidden layer, and an output layer.

**Figure 9.** Q-Learning: a simple reinforcement learning method that employs Q-values (action values) to improve the learning agent's behavior.

Year | Count | Article
---|---|---
2022 | 17 | [2,9,10,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
2021 | 14 | [1,4,7,32,33,34,35,36,37,38,39,40,41,42]
2020 | 19 | [43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61]
2019 | 10 | [62,63,64,65,66,67,68,69,70,71]
2018 | 1 | [72]
2017 | 2 | [8,73]
Others | 5+ | [74,75,76]

Index | Country
---|---
S&P 500 | US
Dow Jones Industrial Average | US
CSI Composite | China
HSI | Hong Kong
NIFTY50 | India
BSE30 | India
FTSE 100 | UK
DAX | Germany
NASDAQ Composite | US
NYSE Composite | US
Euronext 100 (N100) | Europe
Deutsche Börse DAX index (GDAXI) | Europe
SSE Composite index (000001.SS) | Asia
Nikkei 225 (N225) | Asia, Japan
Global X MSCI Nigeria ETF (NGE) | Nigeria
FTSE/JSE Africa index (J580.JO) | Africa

Journal/Publisher | Count | Article
---|---|---
MDPI | 12 | [4,7,10,20,27,28,29,37,54,56,57,63]
Expert Systems with Applications | 10 | [1,39,40,59,64,72,73,74,75,76]
Expert Systems | 2 | [38,58]
IEEE Access | 5 | [31,32,45,61,67]
Neurocomputing | 3 | [21,34,69]
Applied Intelligence | 2 | [9,36]
IEEE Transactions | 1 | [8]
Information Sciences | 1 | [46]
Soft Computing | 3 | [49,50,51]
PLoS ONE | 1 | [22]
Conference Papers | 3 | [48,70,71]
Others | 19+ | [3,23,24,25,26,30,33,35,44,47,53,65,66,68,77,78]

Date | Prev. Close | Open | High | Low | Close | Volume
---|---|---|---|---|---|---
1 December 2014 | 533.5 | 539.85 | 539.85 | 530.1 | 536 | 5093458
2 December 2014 | 536 | 535.7 | 535.8 | 528.1 | 528.95 | 3555543
3 December 2014 | 528.95 | 535.2 | 538.8 | 526.3 | 529 | 3997174
4 December 2014 | 529 | 534 | 537.25 | 522 | 527.75 | 3600835
…… | …… | …… | …… | …… | …… | ……

Stock Symbol | Time | Open | High | Low | Close | Volume
---|---|---|---|---|---|---
SBIN | 09:08 | 492.65 | 492.65 | 492.65 | 492.65 | 53244
SBIN | 09:16 | 492.8 | 494.6 | 491.85 | 494 | 159589
SBIN | 09:17 | 493.9 | 494.15 | 493.75 | 494.15 | 162984
SBIN | 09:18 | 494.3 | 495 | 493.5 | 494 | 123425
…… | …… | …… | …… | …… | …… | ……
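The two granularities above differ only in aggregation: a day-wise bar (Table 4) can be built from minute-wise ticks (Table 5). The following is a minimal pure-Python sketch of that aggregation, using the SBIN rows above; in practice a library such as pandas would do this with its resampling facilities.

```python
# Minimal sketch: aggregating minute-wise tick bars (Table 5) into a
# day-wise OHLCV bar (Table 4).

def aggregate_daily(ticks):
    """ticks: list of (time, open, high, low, close, volume) tuples,
    assumed sorted by time and all from the same trading day."""
    return {
        "open": ticks[0][1],                    # first traded price of the day
        "high": max(t[2] for t in ticks),       # day high across all minutes
        "low": min(t[3] for t in ticks),        # day low across all minutes
        "close": ticks[-1][4],                  # last traded price of the day
        "volume": sum(t[5] for t in ticks),     # total traded volume
    }

# Minute bars for SBIN taken from Table 5
ticks = [
    ("09:08", 492.65, 492.65, 492.65, 492.65, 53244),
    ("09:16", 492.80, 494.60, 491.85, 494.00, 159589),
    ("09:17", 493.90, 494.15, 493.75, 494.15, 162984),
    ("09:18", 494.30, 495.00, 493.50, 494.00, 123425),
]
bar = aggregate_daily(ticks)
print(bar)  # open 492.65, high 495.0, low 491.85, close 494.0
```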

**Table 6.** Input features used by the various machine learning models employed in the surveyed articles.

Features | Article
---|---
Historical Price Data | [9,10,26,27,28,36,43,54,79,80]
Technical Indicators and Historical Price Data | [6,29,40,55,56,57,58]
Text and Historical Price Data | [6,29,40,55,56,57,58,81,82]
Historical Price Data, Text, and Technical Indicators | [40,58,81,82]
Historical Price Data and Fundamental Data | [1,2,4,28,32,68,83,84]
Image Data (e.g., Candlestick chart) | [22,40]
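Several of the feature sets above combine raw historical prices with technical indicators derived from them. As an illustrative sketch (not code from any surveyed paper), two common indicators, a simple moving average (SMA) and an n-day momentum, computed from a close-price series:

```python
# Illustrative feature construction: technical indicators derived from
# a historical close-price series. The sample prices are invented.

def sma(prices, n):
    """n-period simple moving average; None until n prices are seen."""
    return [None if i + 1 < n else sum(prices[i + 1 - n:i + 1]) / n
            for i in range(len(prices))]

def momentum(prices, n):
    """Difference between today's close and the close n days ago."""
    return [None if i < n else prices[i] - prices[i - n]
            for i in range(len(prices))]

closes = [536.0, 528.95, 529.0, 527.75, 530.5, 533.1]
print(sma(closes, 3))       # third value is (536.0 + 528.95 + 529.0) / 3
print(momentum(closes, 1))  # one-day price change
```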

Article | Prediction Model Used
---|---
[28] | Stochastic gradient descent linear regression, Lasso regression, and XGBoost tree regression
[51] | SVM, LR, Perceptron
[49] | NB, SMO, IBK, LWL, PART, J48, RF, and DT
[50] | SVM
[66] | RF

Article | Prediction Model Used
---|---
[25,27] | ANN
[24,29] | LSTM, GRU
[3,18,26,54] | LSTM
[20] | GRU
[47,73] | Deep Neural Network
[64] | CNN
[69] | RNN
[75] | MLP, dynamic artificial neural network

Article | Prediction Model Used
---|---
[2,31,44,63,72] | RL
[4,9,23,32,35,39,41,42,43,46,48,55,56,60,67,70,71] | Deep RL
[40] | RL-Q-Learning
[58] | RL-Deep Q-Learning
[59] | RL-Deep Q-Learning
[46] | Gated Deep Q-Learning
[8] | Deep Direct RL

Article | Prediction Model
---|---
[10] | ARMA, LSTM
[30] | LSTM and Random Forest
[19] | NLP and ML
[21] | GRU and RL
[22] | CNN and Deep RL
[34] | LSTM, DRL
[57] | Decision Tree, Bagging, Random Forest, AdaBoost, Gradient Boosting, XGBoost, ANN, RNN, and LSTM
[37] | LSTM, RNN, KNN, LR, RF, DT, GBT, ABT, NB, LDA, QDA, SVC
[1] | ML, DL, RL, Deep RL
[33] | LM, RF, SVM, ANN
[7] | ML, DL, RL, Deep RL
[36] | CNN, LSTM, Deep RL
[61] | SVM, RF, ANN
[65,74,76] | SVM, ANN
[68] | MLP, DBN, SAE, RNN, LSTM, GRU, CART, NB, RF, LR, SVM

Name of Method | Advantages | Disadvantages | Article
---|---|---|---
ARIMA | Effective for linear time series; gives more reliable and efficient short-term forecasts than similar models with more complicated structural assumptions. | Not very effective with nonlinear time series; requires careful parameter setting and relies on user assumptions that may not hold, leading to erroneous forecasts. | [85]
LR | Excellent capacity for dealing with complicated nonlinear patterns. | Strong assumptions and high sensitivity to outliers. | [85]
SVM | High prediction accuracy and the capacity to deliver the best global solution; performs well on many classification problems, even high-dimensional ones. | Depends heavily on the choice of parameters; sensitive to outliers. | [85]
DT | Comparatively fast and efficient; feature scaling is not required; shows the relevance of attributes. | Prone to overfitting; does not scale well to large datasets. | [37]
RF | Robust for forecasting and classification tasks, owing to its random sampling of the feature space and its architecture of many decision trees; works with either continuous or discrete data. | Builds many trees, so more processing time and storage space are needed; training takes longer than a single decision tree. | [85]
kNN | Resistant to noisy training data; very effective with huge training datasets. | The number of nearest neighbors (k) must be chosen in advance; can be computationally expensive. | [85]
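To make the comparison concrete, here is a minimal pure-Python sketch of one of the methods above, kNN, applied to a toy up/down direction-prediction task. The feature vectors (two lagged returns), labels, and value of k are invented for illustration; surveyed studies would typically use a library such as scikit-learn.

```python
# Toy k-nearest-neighbours classifier for next-day direction:
# each sample is a feature vector labelled +1 (up) or -1 (down).

def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs; majority vote over the
    k samples nearest to `query` in squared Euclidean distance."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda s: dist(s[0], query))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes >= 0 else -1

# Invented training set: two positive lagged returns -> up,
# two negative lagged returns -> down.
train = [
    ((0.01, 0.02), 1), ((0.02, 0.01), 1), ((0.015, 0.03), 1),
    ((-0.01, -0.02), -1), ((-0.02, -0.01), -1), ((-0.015, -0.03), -1),
]
print(knn_predict(train, (0.012, 0.018)))   # -> 1 (up)
print(knn_predict(train, (-0.012, -0.02)))  # -> -1 (down)
```

The choice of k trades off noise resistance against locality, which is exactly the "number of nearest neighbors must be chosen in advance" drawback listed in the table.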

Name of Method | Advantages | Disadvantages | Article
---|---|---|---
RL/DRL | Models learn from their actions and obtain feedback (rewards) that helps them do better in the future; combined with artificial neural networks, they can handle high-dimensional and complicated problems; no model or policy is needed to discover the value of actions (self-directed learning). | Large datasets are required to obtain accurate benchmarks and judgments; approaches are limited by the agent's exploration of the environment; designing a good reward structure for the model is difficult. | [3]
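The tabular Q-learning method shown in Figure 9 can be sketched in a few lines. The chain environment, learning rate, and episode count below are illustrative assumptions, not taken from any surveyed paper; the update rule is the standard Q-learning step that moves Q(s, a) toward r + γ·max Q(s', ·).

```python
import random

# Toy chain environment: states 0..4, actions 0 (left) / 1 (right),
# reward 1.0 for reaching the goal state 4.
N_STATES, GOAL = 5, 4
alpha, gamma, eps = 0.5, 0.9, 0.1   # learning rate, discount, exploration

def step(s, a):
    """Deterministic transition; the episode ends at the goal state."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-values (action values)
for _ in range(500):                         # training episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = random.randrange(2) if random.random() < eps else (
            0 if Q[s][0] > Q[s][1] else 1)
        s2, r, done = step(s, a)
        # Q-learning update toward the bootstrapped target
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [0 if q[0] > q[1] else 1 for q in Q]
print(policy[:GOAL])  # learned greedy policy: move right in states 0..3
```

Note that the agent needs no model of the environment's transitions, which is the "self-directed" advantage in the table, while the reward function and the exploration rate eps had to be hand-designed, which are the corresponding disadvantages.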


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Sahu, S.K.; Mokhade, A.; Bokde, N.D.
An Overview of Machine Learning, Deep Learning, and Reinforcement Learning-Based Techniques in Quantitative Finance: Recent Progress and Challenges. *Appl. Sci.* **2023**, *13*, 1956.
https://doi.org/10.3390/app13031956
