# Deep Reinforcement Learning Agent for S&P 500 Stock Selection


## Abstract


## 1. Introduction

#### A Short Literature Review of Reinforcement Learning-Based Methods

## 2. Theoretical Background

## 3. Data

## 4. The Trading Environment and Agent Model Used


## 5. Training, Validation, and Results

#### 5.1. Feature and Hyper-Parameter Selection

#### Hyper-Parameter Selection

#### 5.2. Results and Performance

#### Statistical Significance of the Results

#### 5.3. Analysis of the Model’s Behavior

## 6. Summary and Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Appendix A


**Figure 1.** Agent model structure. The filter counts and shapes of the final model are presented below the layer boxes as: "# of filters used, (filter shape)".

**Figure 3.** Frequency of the Sharpe ratios and total returns of 5000 random portfolios compared with the model's performance.
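Figure 3's benchmark distribution can be reproduced in outline: draw random long-only weight vectors over the stock universe, apply them to the daily returns, and record each portfolio's Sharpe ratio and total return. The sketch below is a minimal numpy version of that Monte Carlo; the synthetic return matrix, the Dirichlet weighting scheme, and the annualization convention are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def random_portfolio_stats(daily_returns, n_portfolios=5000,
                           periods_per_year=252, seed=0):
    """Sharpe ratio and total return for random long-only portfolios.

    daily_returns: (T, N) array of simple daily returns for N stocks.
    Returns (sharpes, total_returns), each of length n_portfolios.
    """
    rng = np.random.default_rng(seed)
    _, n_stocks = daily_returns.shape
    # Random long-only weights summing to 1 (Dirichlet keeps them positive).
    weights = rng.dirichlet(np.ones(n_stocks), size=n_portfolios)  # (P, N)
    port_rets = daily_returns @ weights.T                          # (T, P)
    sharpes = (np.sqrt(periods_per_year)
               * port_rets.mean(axis=0) / port_rets.std(axis=0, ddof=1))
    total_returns = np.prod(1.0 + port_rets, axis=0) - 1.0
    return sharpes, total_returns

# Synthetic stand-in data: two years of daily returns for 30 stocks.
rng = np.random.default_rng(42)
rets = rng.normal(0.0005, 0.02, size=(504, 30))
sharpes, totals = random_portfolio_stats(rets, n_portfolios=5000)
```

Histogramming `sharpes` and `totals` gives the two frequency distributions of Figure 3, against which the trained agent's point values are compared.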

**Table 1.** Feature selection results on the validation data set: Sharpe ratio and total return for the three models under four feature combinations. Legend: TR = Total Return; EP = Earnings Per Share; DY = Dividend Yield; Vol = Volume.

| Metric | Model | Layer 1 | Layer 2 | Layer 3 | TR | TR, EP | TR, DY | TR, Vol |
|---|---|---|---|---|---|---|---|---|
| Sharpe | A | 2, (2,1) | 20, (14,1) | 1, (1,1) | 1.99 | 2.03 | 2.06 | 1.95 |
| Sharpe | B | 3, (2,1) | 50, (14,1) | 1, (1,1) | 2.18 | 2.18 | 2.27 | 1.94 |
| Sharpe | C | 5, (2,1) | 100, (14,1) | 1, (1,1) | 2.20 | 2.25 | 2.22 | 2.07 |
| Total Return | A | 2, (2,1) | 20, (14,1) | 1, (1,1) | 86.38% | 88.76% | 72.08% | 66.73% |
| Total Return | B | 3, (2,1) | 50, (14,1) | 1, (1,1) | 82.42% | 82.11% | 80.14% | 68.87% |
| Total Return | C | 5, (2,1) | 100, (14,1) | 1, (1,1) | 87.23% | 114.14% | 88.02% | 74.46% |

**Table 2.** Performance of the ten initial and five second-round hyper-parameter selection models. Model parameters are given as "number of filters, (filter shape)". Model 14 (highlighted) had the best performance and was selected as the final model version.

| Model | Layer 1 | Layer 2 | Layer 3 | Layer 4 | Sharpe | Total Return |
|---|---|---|---|---|---|---|
| Model 1 | 3, (2,1) | 50, (14,1) | 1, (1,1) | - | 2.15 | 80.85% |
| Model 2 | 4, (2,1) | 100, (14,1) | 1, (1,1) | - | 2.34 | 99.07% |
| Model 3 | 3, (3,1) | 50, (13,1) | 1, (1,1) | - | 2.25 | 105.44% |
| Model 4 | 5, (2,1) | 100, (14,1) | 1, (1,1) | - | 2.21 | 79.83% |
| Model 5 | 5, (3,1) | 70, (13,1) | 1, (1,1) | - | 2.11 | 115.84% |
| Model 6 | 10, (2,1) | 100, (14,1) | 1, (1,1) | - | 2.40 | 104.73% |
| Model 7 | 20, (2,1) | 200, (14,1) | 1, (1,1) | - | 2.26 | 88.84% |
| Model 8 | 3, (1,1) | 3, (2,1) | 50, (14,1) | 1, (1,1) | 2.15 | 84.02% |
| Model 9 | 3, (1,1) | 4, (2,1) | 100, (14,1) | 1, (1,1) | 2.14 | 69.54% |
| Model 10 | 3, (2,1) | 5, (2,1) | 100, (13,1) | 1, (1,1) | 2.19 | 88.62% |
| Model 11 | 8, (2,1) | 80, (14,1) | 1, (1,1) | - | 2.24 | 101.04% |
| Model 12 | 8, (3,1) | 80, (13,1) | 1, (1,1) | - | 2.25 | 83.89% |
| Model 13 | 5, (3,1) | 100, (13,1) | 1, (1,1) | - | 2.32 | 99.23% |
| **Model 14** | **5, (3,1)** | **50, (13,1)** | **1, (1,1)** | **-** | **2.43** | **118.06%** |
| Model 15 | 10, (3,1) | 100, (13,1) | 1, (1,1) | - | 2.26 | 79.95% |
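The "number of filters, (filter shape)" entries in Tables 1 and 2 fix how the per-stock input window collapses through the network: with stride-1 "valid" convolution, each layer reduces the window height by (filter height − 1). The filter shapes are consistent with a 15-day input window (an inference from the shapes, not stated in the tables): Model 14's (3,1), (13,1), (1,1) stack takes 15 → 13 → 1 → 1, ending in a single score per stock. A small sketch of this arithmetic:

```python
def conv_output_heights(input_height, filter_heights):
    """Trace the window height through stacked stride-1 'valid'
    convolutions, where output = input - filter + 1 at each layer."""
    heights = [input_height]
    for f in filter_heights:
        heights.append(heights[-1] - f + 1)
    return heights

# Model 14: filters shaped (3,1), (13,1), (1,1) on an assumed 15-day window.
print(conv_output_heights(15, [3, 13, 1]))   # [15, 13, 1, 1]
# Models using (2,1), (14,1), (1,1) shapes collapse the same window:
print(conv_output_heights(15, [2, 14, 1]))   # [15, 14, 1, 1]
```

Every shape combination in Table 2 ends at height 1, i.e., one output value per stock, which is what the final (1,1) layer then passes on.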

| | Total Return | Sharpe | Daily SD |
|---|---|---|---|
| Final model | 328.9% | 0.91 | 2.9% |
| S&P 500 Index | 54.9% | 0.73 | 0.8% |
| Optimal portfolio | 83.3% | 0.78 | 1.1% |
| Best stock (Monster) | 114.2% | 0.64 | 2.0% |
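The three columns above (total return, Sharpe ratio, daily standard deviation) can all be computed from a series of daily simple returns. A minimal sketch, assuming a zero risk-free rate and √252 annualization for the Sharpe ratio (the paper's exact conventions are not shown in this excerpt):

```python
import numpy as np

def performance_summary(daily_returns, periods_per_year=252, risk_free=0.0):
    """Total return, annualized Sharpe ratio, and daily standard
    deviation for a series of simple daily returns."""
    daily_returns = np.asarray(daily_returns, dtype=float)
    # Compounded growth over the whole period.
    total_return = np.prod(1.0 + daily_returns) - 1.0
    # Annualized Sharpe ratio on excess returns.
    excess = daily_returns - risk_free / periods_per_year
    sharpe = np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)
    daily_sd = daily_returns.std(ddof=1)
    return total_return, sharpe, daily_sd

# Example with synthetic returns (three years of trading days):
rng = np.random.default_rng(0)
tr, sr, sd = performance_summary(rng.normal(0.001, 0.01, size=756))
```

The table's pattern is visible in these formulas: the final model's much higher daily SD (2.9% vs. the index's 0.8%) inflates the Sharpe denominator, so its Sharpe advantage (0.91 vs. 0.73) is far smaller than its total-return advantage.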

**Daily Returns**

| | Coefficient | Standard Error | t-Test | p-Value |
|---|---|---|---|---|
| Alpha | 0.00117 | 0.00076 | 1.52936 | 0.12643 |
| Beta | 1.28613 | 0.09129 | 14.08837 | 0.00000 |

**Monthly Returns**

| | Coefficient | Standard Error | t-Test | p-Value |
|---|---|---|---|---|
| Alpha | 0.01747 | 0.02823 | 0.61898 | 0.53827 |
| Beta | 3.03349 | 0.80152 | 3.78466 | 0.00036 |
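Alpha/beta figures of this kind come from an OLS regression of the portfolio's returns on the benchmark's: r_p = α + β·r_m + ε. The sketch below is a hedged numpy version of that regression with standard errors, t-statistics, and two-sided p-values; it uses a normal approximation for the p-values rather than the exact Student's t distribution, and synthetic data rather than the paper's return series.

```python
import numpy as np
from math import erfc, sqrt

def alpha_beta_regression(portfolio_rets, market_rets):
    """OLS of portfolio returns on market returns: r_p = alpha + beta*r_m.
    Returns {name: {coef, se, t, p}}; p-values use a normal approximation."""
    y = np.asarray(portfolio_rets, dtype=float)
    x = np.asarray(market_rets, dtype=float)
    X = np.column_stack([np.ones_like(x), x])        # [intercept, slope]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - 2
    sigma2 = resid @ resid / dof                     # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # coefficient covariance
    se = np.sqrt(np.diag(cov))
    t = coef / se
    p = np.array([erfc(abs(ti) / sqrt(2)) for ti in t])  # two-sided
    return {name: dict(coef=c, se=s, t=ti, p=pi)
            for name, c, s, ti, pi in zip(["alpha", "beta"], coef, se, t, p)}

# Synthetic check: returns built with known alpha = 0.001, beta = 1.3.
rng = np.random.default_rng(1)
mkt = rng.normal(0.0004, 0.008, size=500)
port = 0.001 + 1.3 * mkt + rng.normal(0.0, 0.005, size=500)
stats = alpha_beta_regression(port, mkt)
```

The reading of the table follows the same logic: the daily beta's tiny p-value says the market exposure is highly significant, while the alpha p-values (0.126 daily, 0.538 monthly) mean the excess return over the benchmark is not statistically distinguishable from zero at conventional levels.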

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Huotari, T.; Savolainen, J.; Collan, M. Deep Reinforcement Learning Agent for S&P 500 Stock Selection. *Axioms* **2020**, *9*, 130.
https://doi.org/10.3390/axioms9040130
