FinSoSent: Advancing Financial Market Sentiment Analysis through Pretrained Large Language Models
Abstract
1. Introduction
- Develop LLMs that apply domain-specific sentiment analysis to the prediction of financial instruments, drawing on multiple data sources from the financial domain.
- Enhance model performance by pretraining and fine-tuning on financial corpora during model development.
- Compare the model's performance against a set of sentiment analyzers comprising commercial sentiment analyzers, commercial generative AI models, academic sentiment analysis models, and open-source sentiment analyzers.
2. Related Work
3. Materials and Methods
3.1. Datasets and Data Preparation
3.1.1. Pretraining Dataset
3.1.2. Fine-Tuning Datasets
3.1.3. Testing Datasets
3.1.4. Data Preprocessing
3.2. Model Development
3.2.1. Pretraining
3.2.2. Fine-Tuning
3.2.3. Learning Rate
3.2.4. Epoch
3.2.5. Batch Size
3.2.6. Handling Imbalanced Classification Datasets
3.3. Model Development Results
4. Results
4.1. Base Model Performance
4.2. Ensemble Models’ Performance
4.3. Study Limitations
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
NLP | natural language processing |
LLM | large language model |
BERT | bidirectional encoder representations from transformers |
FinBERT | BERT model for financial sentiment analysis |
RNN | recurrent neural network |
CNN | convolutional neural network |
LSTM | long short-term memory |
ELMo | embeddings from language models |
XLNet | generalized autoregressive pretraining for language understanding |
ULMFit | universal language model fine-tuning |
GPT | generative pretrained transformer |
FSA | financial sentiment analysis |
FSM | Fin-SoMe |
SET5 | SemEval-2017 Task 5 |
EMH | efficient market hypothesis |
SSIX | social sentiment indices powered by X-scores |
TRC2 | Thomson Reuters Text Research Collection |
NTUSD-Fin | National Taiwan University social media dataset financial |
EPS | earnings per share |
P/E | price-to-earnings ratio |
FCF | free cash flow |
ROE | return on equity |
RSI | relative strength index |
MACD | moving average convergence divergence |
OBV | on-balance volume |
D/E | debt-to-equity ratio |
P/B | price-to-book ratio |
FTS | full-text search |
VADER | Valence Aware Dictionary and sEntiment Reasoner |
ADASYN | adaptive synthetic sampling |
SMOTE | synthetic minority over-sampling technique |
References
- Financial Terms Dictionary. Investopedia. Available online: https://www.investopedia.com/financial-term-dictionary-4769738 (accessed on 30 November 2021).
- Fama, E.F. Random Walks in Stock Market Prices. Financ. Anal. J. 1965, 21, 55–59.
- Fama, E.F. Efficient Capital Markets: A Review of Theory and Empirical Work. J. Financ. 1970, 25, 383–417.
- Twitter, Inc. Available online: https://twitter.com/ (accessed on 30 November 2021).
- StockTwits, Inc. Available online: https://stocktwits.com/ (accessed on 30 November 2021).
- Wang, G.; Wang, T.; Wang, B.; Sambasivan, D.; Zhang, Z.; Zheng, H.; Zhao, B.Y. Crowds on Wall Street: Extracting value from collaborative investing platforms. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, Vancouver, BC, Canada, 14–18 March 2015; pp. 17–30.
- Sohangir, S.; Petty, N.; Wang, D. Financial sentiment lexicon analysis. In Proceedings of the 12th IEEE International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA, 12 April 2018; pp. 286–289.
- Sohangir, S.; Wang, D.; Pomeranets, A.; Khoshgoftaar, T.M. Big data: Deep learning for financial sentiment analysis. J. Big Data 2018, 5, 3.
- Zhang, L.; Wang, S.; Liu, B. Deep learning for sentiment analysis: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1253.
- Zhao, L.; Li, L.; Zheng, X. A BERT Based Sentiment Analysis and Key Entity Detection Approach for Online Financial Texts. arXiv 2020, arXiv:2001.05326.
- Cui, X.; Lam, D.; Verma, A. Embedded Value in Bloomberg News and Social Sentiment Data; Bloomberg Technical Report, 2016. Available online: https://developer.twitter.com/content/dam/developer-twitter/pdfs-and-files/Bloomberg-Twitter-Data-Research-Report.pdf (accessed on 30 November 2021).
- Tetlock, P.C. Giving Content to Investor Sentiment: The Role of Media in the Stock Market. J. Financ. 2007, 62, 1139–1168.
- Tetlock, P.C.; Saar-Tsechansky, M.; Macskassy, S. More Than Words: Quantifying Language to Measure Firms' Fundamentals. J. Financ. 2008, 63, 1437–1467.
- Delgadillo, J.; Kinyua, J.D.; Mutigwe, C. A BERT-based Model for Financial Social Media Sentiment Analysis. In Proceedings of the International Conference on Applications of Sentiment Analysis (ICASA 2022), Cairo, Egypt, 15–16 December 2022.
- Zimbra, D.; Abbasi, A.; Zeng, D.; Chen, H. The State-of-the-Art in Twitter Sentiment Analysis: A Review and Benchmark Evaluation. ACM Trans. Manag. Inf. Syst. 2018, 9, 1–29.
- Sun, C.; Shrivastava, A.; Singh, S.; Gupta, A. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 843–852.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1 (Long and Short Papers), pp. 4171–4186.
- Howard, J.; Ruder, S. Universal Language Model Fine-tuning for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018; Volume 1 (Long Papers), pp. 328–339.
- Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; Volume 1 (Long Papers), pp. 2227–2237.
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 8.
- Beltagy, I.; Lo, K.; Cohan, A. SciBERT: A Pretrained Language Model for Scientific Text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3615–3620.
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pretrained biomedical language representation model for biomedical text mining. Bioinformatics 2019, 36, 1234–1240.
- Huang, K.; Altosaar, J.; Ranganath, R. ClinicalBERT: Modeling clinical notes and predicting hospital readmission. arXiv 2019, arXiv:1904.05342.
- Agaian, S.; Kolm, P. Financial sentiment analysis using machine learning techniques. Int. J. Invest. Manag. Financ. Innov. 2017, 3, 1–9.
- Man, X.; Luo, T.; Lin, J. Financial Sentiment Analysis (FSA): A Survey. In Proceedings of the IEEE International Conference on Industrial Cyber Physical Systems (ICPS), Taipei, Taiwan, 6–9 May 2019; pp. 617–622.
- Yang, S.; Rosenfeld, J.; Makutonin, J. Financial aspect-based sentiment analysis using deep representations. arXiv 2018, arXiv:1808.07931. Available online: http://arxiv.org/abs/1808.07931 (accessed on 30 November 2021).
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. Available online: http://arxiv.org/abs/1907.11692 (accessed on 30 November 2021).
- Araci, D. FinBERT: Financial Sentiment Analysis with Pretrained Language Models. arXiv 2019, arXiv:1908.10063. Available online: http://arxiv.org/abs/1908.10063 (accessed on 30 November 2021).
- Araci, D.T.; Genc, Z. FinBERT: Financial Sentiment Analysis with BERT. Prosus AI Tech Blog, 2020. Available online: https://medium.com/prosus-ai-tech-blog/finbert-financial-sentiment-analysis-with-bert-b277a3607101 (accessed on 1 July 2022).
- Reuters Corpora (RCV1, RCV2, TRC2). National Institute of Standards and Technology, 2004. Available online: https://trec.nist.gov/data/reuters/reuters.html (accessed on 6 April 2023).
- Malo, P.; Sinha, A.; Korhonen, P.; Wallenius, J.; Takala, P. Good debt or bad debt: Detecting semantic orientations in economic texts. J. Assoc. Inf. Sci. Technol. 2014, 65, 782–796.
- Desola, V.; Hanna, K.; Nonis, P. FinBERT: Pretrained Model on SEC Filings for Financial Natural Language Tasks; Technical Report; University of California: Los Angeles, CA, USA, 2019.
- Liu, Z.; Huang, D.; Huang, K.; Li, Z.; Zhao, J. FinBERT: A Pretrained Financial Language Representation Model for Financial Text Mining. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), Virtual, 7–15 January 2021; pp. 4513–4519.
- Common Crawl. Available online: https://commoncrawl.org/ (accessed on 30 November 2021).
- FinancialWeb. Available online: https://www.finweb.com/ (accessed on 30 November 2021).
- Yahoo! Finance. Available online: https://finance.yahoo.com/ (accessed on 30 November 2021).
- Reddit. Available online: https://www.reddit.com/ (accessed on 30 November 2021).
- Financial Opinion Mining and Question Answering. 2017. Available online: https://sites.google.com/view/fiqa/ (accessed on 30 November 2021).
- The First Workshop on Financial Technology and Natural Language Processing (FinNLP) with a Shared Task for Sentence Boundary Detection in PDF Noisy Text in the Financial Domain (FinSBD). Available online: https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp/ (accessed on 30 November 2021).
- Yang, Y.; Uy, M.C.S.; Huang, A. FinBERT: A Pretrained Language Model for Financial Communications. arXiv 2020, arXiv:2006.08097. Available online: https://arxiv.org/abs/2006.08097 (accessed on 30 November 2021).
- Huang, A.H.; Zang, A.Y.; Zheng, R. Evidence on the Information Content of Text in Analyst Reports. Account. Rev. 2014, 89, 2151–2180.
- Wilksch, M.; Abramova, O. PyFin-sentiment: Towards a machine-learning-based model for deriving sentiment from financial tweets. Int. J. Inf. Manag. Data Insights 2023, 3, 100171.
- Hutto, C.; Gilbert, E. VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. In Proceedings of the International AAAI Conference on Web and Social Media, Ann Arbor, MI, USA, 1–4 June 2014; pp. 216–225.
- Chen, C.-C.; Huang, H.-H.; Chen, H.-H. NTUSD-Fin: A market sentiment dictionary for financial social media data applications. In Proceedings of the 1st Financial Narrative Processing Workshop (FNP 2018), Miyazaki, Japan, 7–12 May 2018; pp. 37–43.
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; pp. 5753–5763.
- Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations. arXiv 2019, arXiv:1909.11942. Available online: http://arxiv.org/abs/1909.11942 (accessed on 30 November 2021).
- Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter. arXiv 2019, arXiv:1910.01108. Available online: http://arxiv.org/abs/1910.01108 (accessed on 30 November 2021).
- Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pretraining for Natural Language Generation, Translation, and Comprehension. arXiv 2019, arXiv:1910.13461. Available online: http://arxiv.org/abs/1910.13461 (accessed on 30 November 2021).
- Mishev, K.; Gjorgjevikj, A.; Vodenska, I.; Chitkushev, L.T.; Trajanov, D. Evaluation of Sentiment Analysis in Finance: From Lexicons to Transformers. IEEE Access 2020, 8, 131662–131682.
- Bartunov, O.; Sigaev, T. Full-Text Search in PostgreSQL—Gentle Introduction; Technical Report; Moscow University: Moscow, Russia, 2007.
- Gaillat, T.; Zarrouk, M.; Freitas, A.; Davis, B. The SSIX Corpora: Three Gold Standard Corpora for Sentiment Analysis in English, Spanish and German Financial Microblogs. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 7–12 May 2018; pp. 2671–2675.
- Chen, C.-C.; Huang, H.-H.; Chen, H.-H. Issues and Perspectives from 10,000 Annotated Financial Social Media Data. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; pp. 6106–6110.
- SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News. Available online: https://alt.qcri.org/semeval2017/task5/ (accessed on 30 November 2021).
- Daudert, T. A Multi-Source Entity-Level Sentiment Corpus for the Financial Domain: The Fin-Lin Corpus. arXiv 2020, arXiv:2003.04073. Available online: http://arxiv.org/abs/2003.04073 (accessed on 30 November 2021).
- Saif, H.; Fernández, M.; He, Y.; Alani, H. Evaluation datasets for Twitter sentiment analysis: A survey and a new dataset, the STS-Gold. In Proceedings of the 1st International Workshop on Emotion and Sentiment in Social and Expressive Media: Approaches and Perspectives from AI (ESSEM 2013), Turin, Italy, 3 December 2013.
- Taborda, B.; de Almeida, A.; Dias, J.C.; Batista, F.; Ribeiro, R. Stock Market Tweets Data. IEEE Dataport 2021.
- Balaji, P.; Nagaraju, O.; Haritha, D. Levels of Sentiment Analysis and its Challenges: A Literature Review. In Proceedings of the International Conference on Big Data Analytics and Computational Intelligence (ICBDAC), Chirala, India, 23–25 March 2017; pp. 400–403.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IJCNN 2008), Hong Kong, China, 1–8 June 2008; pp. 1322–1328.
- Li, X.; Wang, X.; Liu, H. Research on fine-tuning strategy of sentiment analysis model based on BERT. In Proceedings of the 2021 IEEE 3rd International Conference on Communications, Information System and Computer Engineering (CISCE), Beijing, China, 14–16 May 2021; pp. 798–802.
- Popel, M.; Bojar, O. Training Tips for the Transformer Model. arXiv 2018, arXiv:1804.00247. Available online: https://arxiv.org/pdf/1804.00247.pdf (accessed on 25 June 2022).
- Amazon Web Services. Amazon Comprehend: Features. Available online: https://aws.amazon.com/comprehend/features (accessed on 25 June 2022).
- Amazon Web Services. Amazon Comprehend Developer Guide. Available online: https://docs.aws.amazon.com/comprehend/latest/dg/comprehend-dg.pdf.how-sentiment (accessed on 25 June 2022).
- OpenAI. GPT-3.5 Turbo. Available online: https://platform.openai.com/docs/models/gpt-3-5-turbo (accessed on 15 March 2024).
- IBM Cloud API Docs: Natural Language Understanding. Available online: https://cloud.ibm.com/apidocs/natural-language-understanding?code=python (accessed on 25 June 2022).
- IBM. Watson Natural Language Understanding: Features. Available online: https://www.ibm.com/cloud/watson-natural-language-understanding/details (accessed on 25 June 2022).
- SentiStrength. Available online: http://sentistrength.wlv.ac.uk/ (accessed on 25 June 2022).
- Hoang, M.; Bihorac, O.A.; Rouces, J. Aspect-Based Sentiment Analysis using BERT. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, Turku, Finland, 30 September–2 October 2019; pp. 187–196. Available online: https://aclanthology.org/W19-6120/ (accessed on 30 November 2021).
- Goertzel, B. Generative AI vs. AGI: The Cognitive Strengths and Weaknesses of Modern LLMs. arXiv 2023, arXiv:2309.10371. Available online: https://arxiv.org/pdf/2309.10371.pdf.
- Rahutomo, F.; Kitasuka, T.; Aritsugi, M. Semantic Cosine Similarity. In Proceedings of the 7th International Student Conference on Advanced Science and Technology, Seoul, Republic of Korea, 29–30 October 2012. Available online: https://www.researchgate.net/publication/262525676_Semantic_Cosine_Similarity (accessed on 30 November 2021).
- Nora Raju, T.; Rahana, P.A.; Moncy, R.; Ajay, S.; Nambiar, S.K. Sentence Similarity—A State of Art Approaches. In Proceedings of the International Conference on Computing, Communication, Security and Intelligent Systems (IC3SIS), Kochi, India, 23–25 June 2022; pp. 1–6.
Analysis | Avg. FT | FSM | SET5 | SSIX | Avg. Test | Fin-Lin | Taborda | Sanders |
---|---|---|---|---|---|---|---|---|
Unique documents | 4203 | 9885 | 1133 | 1591 | 2452 | 2794 | 1284 | 3277 |
Avg doc length | 96 | 118 | 80 | 89 | 119 | 107 | 151 | 100 |
Positive (#) | 3005 | 7377 | 655 | 984 | 706 | 1101 | 523 | 494 |
Neutral (#) | 903 | 1805 | 375 | 528 | 1172 | 874 | 420 | 2223 |
Negative (#) | 295 | 703 | 103 | 79 | 573 | 819 | 341 | 560 |
Positive (%) | 65 | 75 | 58 | 62 | 32 | 39 | 41 | 15 |
Neutral (%) | 28 | 18 | 33 | 33 | 44 | 31 | 33 | 68 |
Negative (%) | 7 | 7 | 9 | 4 | 24 | 29 | 27 | 17 |
Token count mean | 21 | 27 | 17 | 20 | 25 | 23 | 32 | 21 |
Token count median | 20 | 27 | 14 | 19 | 24 | 22 | 29 | 21 |
Word count mean | 16 | 20 | 13 | 15 | 17 | 15 | 22 | 15 |
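The descriptive statistics above (document counts, label distributions, and token/word counts) are straightforward to reproduce. The following minimal sketch assumes a pandas DataFrame with illustrative `text` and `label` columns and the standard `bert-base-uncased` tokenizer rather than the paper's exact tooling:

```python
# Minimal sketch: per-dataset descriptive statistics. Column names and
# the tokenizer choice are illustrative assumptions.
import pandas as pd
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def describe(df: pd.DataFrame) -> dict:
    tokens = df["text"].apply(lambda t: len(tokenizer.tokenize(t)))
    words = df["text"].apply(lambda t: len(t.split()))
    label_pct = (df["label"].value_counts() / len(df) * 100).round()
    return {
        "unique_documents": df["text"].nunique(),
        "avg_doc_length": df["text"].str.len().mean(),  # in characters
        "token_count_mean": tokens.mean(),
        "token_count_median": tokens.median(),
        "word_count_mean": words.mean(),
        "label_pct": label_pct.to_dict(),
    }

# Example: describe(pd.read_csv("fin_some.csv"))  # hypothetical file
```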
Pretraining Dataset | Fin-Lin | Sanders | Taborda | Average |
---|---|---|---|---|
NONE | 0.526 | 0.610 | 0.572 | 0.570 |
TRC5K | 0.506 | 0.620 | 0.520 | 0.548 |
TRC100K | 0.536 | 0.634 | 0.547 | 0.572 |
TRC150K | 0.512 | 0.593 | 0.495 | 0.533 |
Pretraining Dataset | Fin-Lin | Sanders | Taborda | Average |
---|---|---|---|---|
NONE | 0.505 | 0.584 | 0.555 | 0.548 |
TRC5K | 0.457 | 0.593 | 0.475 | 0.509 |
TRC100K | 0.512 | 0.633 | 0.529 | 0.558 |
TRC150K | 0.471 | 0.568 | 0.450 | 0.496 |
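The TRC5K, TRC100K, and TRC150K rows above correspond to increasingly large pretraining subsets drawn from TRC2. As a hedged sketch of what such domain-adaptive pretraining looks like with the Hugging Face stack (the corpus file name and all hyperparameters here are illustrative assumptions, not the paper's settings):

```python
# Hedged sketch: continued masked-LM pretraining of BERT on a financial
# corpus subset, in the spirit of the TRC2 experiments above.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# One document per line; "trc2_subset.txt" is a hypothetical file name.
corpus = load_dataset("text", data_files={"train": "trc2_subset.txt"})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-financial", num_train_epochs=1,
                           per_device_train_batch_size=32),
    train_dataset=tokenized["train"],
    # Randomly mask 15% of tokens, the standard BERT pretraining objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer,
                                                  mlm_probability=0.15),
)
trainer.train()
model.save_pretrained("bert-financial")
```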
Fine-Tuning Dataset | Fin-Lin | Sanders | Taborda | Average |
---|---|---|---|---|
FSM | 0.536 | 0.634 | 0.547 | 0.572 |
FSM_ADASYN | 0.447 | 0.616 | 0.439 | 0.501 |
FSM_SMOTE | 0.393 | 0.651 | 0.375 | 0.473 |
SET5 | 0.501 | 0.246 | 0.510 | 0.419 |
SSIX | 0.494 | 0.272 | 0.501 | 0.422 |
Fine-Tuning Dataset | Fin-Lin | Sanders | Taborda | Average |
---|---|---|---|---|
FSM | 0.512 | 0.633 | 0.529 | 0.558 |
FSM_ADASYN | 0.441 | 0.598 | 0.421 | 0.487 |
FSM_SMOTE | 0.354 | 0.583 | 0.316 | 0.417 |
SET5 | 0.424 | 0.160 | 0.426 | 0.336 |
SSIX | 0.428 | 0.213 | 0.423 | 0.355 |
Learning Rate | Fin-Lin | Sanders | Taborda | Average |
---|---|---|---|---|
2 | 0.394 | 0.151 | 0.407 | 0.317 |
2 | 0.536 | 0.634 | 0.547 | 0.572 |
2 | 0.459 | 0.658 | 0.495 | 0.537 |
2 | 0.481 | 0.654 | 0.477 | 0.537 |
Learning Rate | Fin-Lin | Sanders | Taborda | Average |
---|---|---|---|---|
2 | 0.223 | 0.040 | 0.236 | 0.166 |
2 | 0.512 | 0.633 | 0.529 | 0.558 |
2 | 0.407 | 0.591 | 0.444 | 0.481 |
2 | 0.403 | 0.591 | 0.410 | 0.468 |
Epoch | Fin-Lin | Sanders | Taborda | Average |
---|---|---|---|---|
15 | 0.490 | 0.621 | 0.484 | 0.528 |
50 | 0.536 | 0.634 | 0.547 | 0.572 |
75 | 0.515 | 0.647 | 0.521 | 0.561 |
Epoch | Fin-Lin | Sanders | Taborda | Average |
---|---|---|---|---|
15 | 0.440 | 0.595 | 0.430 | 0.488 |
50 | 0.512 | 0.633 | 0.529 | 0.558 |
75 | 0.487 | 0.588 | 0.493 | 0.523 |
Batch Size | Fin-Lin | Sanders | Taborda | Average |
---|---|---|---|---|
32 | 0.394 | 0.151 | 0.407 | 0.317 |
64 | 0.492 | 0.545 | 0.537 | 0.525 |
96 | 0.542 | 0.575 | 0.551 | 0.556 |
128 | 0.536 | 0.634 | 0.547 | 0.572 |
Batch Size | Fin-Lin | Sanders | Taborda | Average |
---|---|---|---|---|
32 | 0.223 | 0.040 | 0.236 | 0.166 |
64 | 0.451 | 0.551 | 0.494 | 0.499 |
96 | 0.517 | 0.586 | 0.524 | 0.542 |
128 | 0.512 | 0.633 | 0.529 | 0.558 |
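The learning-rate, epoch, and batch-size tables above vary one hyperparameter at a time around a common configuration. A minimal fine-tuning sketch exposing these three knobs follows; the learning rate of 2e-5 is an assumption (a common BERT fine-tuning value), while the 50 epochs and batch size of 128 match the best-performing rows above:

```python
# Minimal sketch: sentiment fine-tuning with the three hyperparameters
# varied in the tables above. "bert-financial" refers to the hypothetical
# pretrained checkpoint from the earlier sketch.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-financial")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-financial", num_labels=3)  # negative / neutral / positive

args = TrainingArguments(
    output_dir="finsosent",
    learning_rate=2e-5,               # assumed value, not from the paper
    num_train_epochs=50,
    per_device_train_batch_size=128,
)
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```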
Fine-Tuning Dataset | Fin-Lin | Sanders | Taborda | Average |
---|---|---|---|---|
FSM | 0.536 | 0.634 | 0.547 | 0.572 |
FSM_ADASYN | 0.447 | 0.616 | 0.439 | 0.501 |
FSM_SMOTE | 0.393 | 0.651 | 0.375 | 0.473 |
Fine-Tuning Dataset | Fin-Lin | Sanders | Taborda | Average |
---|---|---|---|---|
FSM | 0.512 | 0.633 | 0.529 | 0.558 |
FSM_ADASYN | 0.441 | 0.598 | 0.421 | 0.487 |
FSM_SMOTE | 0.354 | 0.583 | 0.316 | 0.417 |
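FSM_ADASYN and FSM_SMOTE denote the FSM fine-tuning set rebalanced with ADASYN and SMOTE, respectively. A hedged sketch of how such oversampling is typically applied follows; both methods operate on numeric feature vectors, so the documents are assumed to be embedded first (the embedding step is not shown):

```python
# Hedged sketch: rebalancing an imbalanced sentiment dataset with SMOTE
# or ADASYN. X is assumed to hold document embeddings (e.g., 768-d
# BERT vectors); the stand-in data below is random.
import numpy as np
from imblearn.over_sampling import ADASYN, SMOTE

def oversample(X: np.ndarray, y: np.ndarray, method: str = "smote"):
    sampler = (SMOTE(random_state=42) if method == "smote"
               else ADASYN(random_state=42))
    return sampler.fit_resample(X, y)

# Stand-in data: 150 positive, 30 neutral, 20 negative examples.
X = np.random.rand(200, 768)
y = np.array([2] * 150 + [1] * 30 + [0] * 20)
X_bal, y_bal = oversample(X, y, method="adasyn")
```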
Fin-Lin | Negative | Neutral | Positive |
---|---|---|---|
Accurate: Token Count | 22 | 25 | 22 |
Inaccurate: Token Count | 22 | 23 | 23 |
Severely Inaccurate: Token Count | 23 | | 22 |
Accurate: Noun Count | 7 | 11 | 7 |
Inaccurate: Noun Count | 8 | 9 | 9 |
Severely Inaccurate: Noun Count | 8 | | 7 |
Taborda | Negative | Neutral | Positive |
---|---|---|---|
Accurate: Token Count | 32 | 30 | 32 |
Inaccurate: Token Count | 33 | 29 | 35 |
Severely Inaccurate: Token Count | 31 | | 34 |
Accurate: Noun Count | 12 | 12 | 13 |
Inaccurate: Noun Count | 12 | 13 | 12 |
Severely Inaccurate: Noun Count | 12 | | 12 |
Sanders | Negative | Neutral | Positive |
---|---|---|---|
Accurate: Token Count | 22 | 20 | 22 |
Inaccurate: Token Count | 22 | 21 | 19 |
Severely Inaccurate: Token Count | 22 | | 17 |
Accurate: Noun Count | 7 | 7 | 7 |
Inaccurate: Noun Count | 7 | 8 | 7 |
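The error-analysis tables above compare mean token and noun counts across accurately, inaccurately, and severely inaccurately classified documents. A minimal sketch of such counting follows, using spaCy part-of-speech tags as an illustrative choice rather than the paper's exact tooling:

```python
# Minimal sketch: token and noun counts per document via spaCy POS tags.
import spacy

nlp = spacy.load("en_core_web_sm")

def token_and_noun_counts(text: str) -> tuple[int, int]:
    doc = nlp(text)
    nouns = sum(1 for tok in doc if tok.pos_ in ("NOUN", "PROPN"))
    return len(doc), nouns

# token_and_noun_counts("$TSLA Go Tesla! Make the shorts feel the burn")
```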
Type | Sample Text | Source Dataset |
---|---|---|
Complex social media verbiage | “$INTC $LVS $GM $BX $T $CTL $ABBV Jerome Powell nuked my portfolio today. GG Jerome $F Yeah down we go weeeeeee” | Fin-Lin ID: 2018-09-26T20:47:46Z |
Multiple sentiments | “Although the technical rating is bad, $NAV does present a nice setup opportunity. <URL>” | Fin-Lin ID: 2018-09-18T06:42:00Z |
Complex language | “$TSLA Go Tesla! Make the shorts feel the burn - a well the oil companies, the Koch Brothers etc., $GM $F you need to up your game!” | Fin-Lin ID: 2018-07-02T14:29:46Z |
Processing emotions | “@Tesla stock going up.. I so regret selling my shares at $900 #investing #stocks” | Taborda ID: 879440 |
Processing emotions | “$TSLA LMAO RIP to those that followed ryan brinkman and bought $GM $F” | Fin-Lin ID: 2018-07-25T18:45:53Z |
Multiple perspectives | “Remember the turn? Now start preparing for the greatest market crash in history. I say it like I see it. $TSLA $SPY $GILD $ABBV $PFE $TEVA $TDOC $VIX $VXX $UVXY $SVXY $SPX $GOOG $AMZN $FB https://t.co/OhukyvRnIm” | Taborda ID: 472181 |
Model | Fin-Lin | Sanders | Taborda | Average |
---|---|---|---|---|
Amazon-Comprehend | 0.408 | 0.727 | 0.446 | 0.527 |
FinBERT | 0.442 | 0.696 | 0.482 | 0.540 |
FinSoSent | 0.536 | 0.634 | 0.547 | 0.572 |
GPT-3.5-Turbo | 0.524 | 0.651 | 0.474 | 0.550 |
IBM WATSON | 0.464 | 0.634 | 0.515 | 0.538 |
SentiStrength | 0.418 | 0.581 | 0.495 | 0.498 |
VADER | 0.479 | 0.537 | 0.664 | 0.560 |
Model | Fin-Lin | Sanders | Taborda | Average |
---|---|---|---|---|
Amazon-Comprehend | 0.349 | 0.733 | 0.382 | 0.488 |
FinBERT | 0.403 | 0.634 | 0.436 | 0.491 |
FinSoSent | 0.512 | 0.633 | 0.529 | 0.558 |
GPT-3.5-Turbo | 0.518 | 0.670 | 0.443 | 0.543 |
IBM WATSON | 0.454 | 0.652 | 0.511 | 0.539 |
SentiStrength | 0.396 | 0.605 | 0.473 | 0.492 |
VADER | 0.477 | 0.567 | 0.661 | 0.568 |
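A short sketch of the two standard metrics behind per-dataset comparisons of this kind, accuracy and macro-averaged F1:

```python
# Minimal sketch: accuracy and macro-averaged F1 over gold and predicted
# three-class sentiment labels.
from sklearn.metrics import accuracy_score, f1_score

def score(y_true: list[str], y_pred: list[str]) -> dict:
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
    }

# Example: score(["pos", "neg", "neu"], ["pos", "neu", "neu"])
```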
Model | Fin-Lin | Sanders | Taborda | Average |
---|---|---|---|---|
Ensemble-MajorityVoting | 0.478 | 0.708 | 0.582 | 0.589 |
Ensemble-SoftVoting | 0.481 | 0.689 | 0.568 | 0.579 |
Model | Fin-Lin | Sanders | Taborda | Average |
---|---|---|---|---|
Ensemble-MajorityVoting | 0.457 | 0.711 | 0.567 | 0.578 |
Ensemble-SoftVoting | 0.468 | 0.691 | 0.563 | 0.574 |
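The two ensembles above combine the base analyzers by majority (hard) voting and soft voting. A minimal sketch of both schemes, assuming each base model emits either a hard label or a probability vector over the three sentiment classes:

```python
# Hedged sketch: majority (hard) voting and soft voting over base models.
from collections import Counter
import numpy as np

LABELS = ["negative", "neutral", "positive"]

def majority_vote(predictions: list[str]) -> str:
    """One hard label per base model; ties resolve by first occurrence."""
    return Counter(predictions).most_common(1)[0][0]

def soft_vote(probabilities: list[np.ndarray]) -> str:
    """One [negative, neutral, positive] probability vector per model."""
    return LABELS[int(np.mean(probabilities, axis=0).argmax())]

# majority_vote(["positive", "neutral", "positive"])   # -> "positive"
# soft_vote([np.array([0.1, 0.3, 0.6]),
#            np.array([0.2, 0.5, 0.3])])               # -> "positive"
```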
Metric | Fin-Lin | Sanders | Taborda |
---|---|---|---|
Cosine Similarity Score Mean | 0.747 | 0.901 | 0.847 |
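A minimal sketch of the semantic cosine similarity measure reported above, using an off-the-shelf sentence-embedding model as an illustrative assumption rather than the paper's exact setup:

```python
# Minimal sketch: cosine similarity between two texts via sentence
# embeddings; the embedding model choice is an assumption.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def cosine_similarity(a: str, b: str) -> float:
    va, vb = encoder.encode([a, b])
    return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))

# cosine_similarity("shares are climbing", "the stock is going up")
```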
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).