Online Investor Sentiment via Machine Learning
Abstract
1. Introduction
2. Econometric Methodology
2.1. Model Setup
2.2. Machine Learning Methods
2.3. Choice of Tuning Parameters
3. Empirical Results
3.1. Data
3.2. Out-of-Sample Forecasting
3.3. Asset Allocation Implications
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
 | Contribution | IRA Contribution | IRA | 401k | 401k Contribution | Roth Contribution |
---|---|---|---|---|---|---|
Contribution | ||||||
IRA contribution | ||||||
IRA | ||||||
401k | ||||||
401k contribution | ||||||
Roth contribution | ||||||

 | J = 26 | | | J = 52 | | |
---|---|---|---|---|---|---|
Type | | RMSE | RMSLE | | RMSE | RMSLE |
XGBoost | 2.77 | 1.55 | 1.54 | 3.95 | 1.62 | 1.61 |
RF | 0.45 | 1.61 | 1.60 | 1.55 | 1.68 | 1.67 |
FNN | −0.77 | 1.86 | 1.84 | 0.03 | 2.12 | 2.11 |
RNN | 0.18 | 1.63 | 1.63 | 0.75 | 1.72 | 1.74 |
OLS | −19.58 | 1.69 | 1.69 | −30.07 | 1.86 | 1.85 |
LASSO | −0.25 | 1.55 | 1.55 | −0.16 | 1.64 | 1.63 |
Ridge | −0.49 | 1.55 | 1.55 | −0.30 | 1.63 | 1.63 |
Elastic net | −0.03 | 1.56 | 1.55 | −0.03 | 1.63 | 1.62 |
Aggregated Investor Sentiment Index | ||||||
 | 2.08 | 1.53 | 1.53 | 1.85 | 1.62 | 1.61 |
 | 0.24 | 1.57 | 1.56 | 0.73 | 1.63 | 1.62 |

Index | = 1 | | = 3 | | = 5 | |
---|---|---|---|---|---|---|
 | CER Gain | Sharpe Ratio | CER Gain | Sharpe Ratio | CER Gain | Sharpe Ratio |
XGBoost | 0.95 | 0.30 | 0.32 | 0.30 | 0.19 | 0.30 |
RF | 0.62 | 0.24 | 0.21 | 0.24 | 0.12 | 0.24 |
FNN | 0.05 | 0.31 | 0.02 | 0.31 | 0.01 | 0.31 |
RNN | 0.17 | 0.33 | 0.06 | 0.33 | 0.03 | 0.33 |
OLS | 1.03 | 0.14 | 0.34 | 0.14 | 0.21 | 0.14 |
LASSO | 0.17 | 0.40 | 0.06 | 0.40 | 0.03 | 0.40 |
Ridge | −0.03 | 0.28 | −0.01 | 0.28 | −0.01 | 0.28 |
Elastic net | −0.01 | 0.29 | −0.002 | 0.29 | −0.001 | 0.29 |
Aggregated Investor Sentiment Index | ||||||
 | 0.56 | 0.30 | 0.19 | 0.30 | 0.11 | 0.30 |
 | 0.33 | 0.27 | 0.11 | 0.27 | 0.07 | 0.27 |
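
For readers unfamiliar with the two columns reported above, the following is a minimal sketch of how a CER gain and a Sharpe ratio are commonly computed in the return-forecasting literature. It assumes the textbook mean-variance definitions (CER = mean return minus half the risk-aversion coefficient times the return variance; Sharpe ratio = mean excess return divided by its standard deviation) and assumes the column parameter with values 1, 3, and 5 is a risk-aversion coefficient. The function names and the toy return series are hypothetical; the paper's exact formulas, data, and annualization are not shown here and may differ.

```python
from statistics import mean, stdev, variance


def sharpe_ratio(excess_returns):
    """Mean excess return divided by its sample standard deviation."""
    return mean(excess_returns) / stdev(excess_returns)


def certainty_equivalent_return(portfolio_returns, risk_aversion):
    """Textbook mean-variance CER: mean - (risk_aversion / 2) * variance."""
    return mean(portfolio_returns) - 0.5 * risk_aversion * variance(portfolio_returns)


def cer_gain(model_returns, benchmark_returns, risk_aversion):
    """CER of the forecast-based portfolio minus CER of the benchmark portfolio."""
    return (certainty_equivalent_return(model_returns, risk_aversion)
            - certainty_equivalent_return(benchmark_returns, risk_aversion))


if __name__ == "__main__":
    # Toy portfolio returns (hypothetical numbers, not taken from the paper).
    model = [0.02, 0.00, 0.04, 0.02]
    benchmark = [0.01, 0.01, 0.01, 0.01]
    for risk_aversion in (1, 3, 5):
        print(risk_aversion, cer_gain(model, benchmark, risk_aversion))
    print("Sharpe ratio:", sharpe_ratio(model))
```

Under these definitions the Sharpe ratio does not involve the risk-aversion coefficient, whereas the CER applies a variance penalty that scales with it.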
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cai, Z.; Chen, P. Online Investor Sentiment via Machine Learning. Mathematics 2024, 12, 3192. https://doi.org/10.3390/math12203192