Lightweight Scheme to Capture Stock Market Sentiment on Social Media Using Sparse Attention Mechanism: A Case Study on Twitter
Abstract
1. Introduction
- Challenge 1: Mismatch between conventional and stock sentiment. The first challenge results from the fact that stock sentiment analysis differs significantly from conventional sentiment analysis. Stock sentiment, though correlated with it, markedly diverges from the traditional sentiment assessed in contexts such as consumer feedback studies, literature reviews, and broader public opinion analyses. Traditional sentiment is anchored in the emotional spectrum, capturing the nuances between positive and negative affective states Liu (2012). Stock sentiment, by contrast, is intrinsically tied to market dynamics, reflecting anticipated stock price movements and whether they indicate bullish or bearish trends. While the two sentiments sometimes align, they can also diverge sharply. For instance, public discourse may show skepticism toward a particular economic event while harboring underlying optimism about the potential appreciation of a stock such as $TSLA, indicating a bullish stock sentiment. Further examples of such divergence are presented in Table 1.
- Challenge 2: High computational complexity of deep learning models. In recent years, deep learning models, particularly transformers, have achieved state-of-the-art performance across a myriad of tasks in natural language processing, computer vision, and beyond. However, the high computational complexity of their architecture remains a significant impediment to broader application and scalability Lin et al. (2022). Such complexity not only demands substantial computational resources but also poses challenges for real-time processing and deployment in resource-constrained environments. Figure 1 shows that computing the softmax attention consistently dominates the multi-head attention (MHA) runtime (52–58%) in the transformer architecture, and this burden weighs heaviest on less powerful, resource-constrained devices. Recognizing these challenges, this paper proposes the adoption of sparse transformers, a variant optimized to reduce computational overhead without compromising the model's efficacy. By exploiting the sparsity inherent in the transformer's attention mechanism, we aim to balance computational efficiency against model performance, paving the way for more sustainable and scalable deep learning applications; a back-of-the-envelope complexity comparison is given below.
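To make the quadratic cost concrete, here is the standard asymptotic accounting for dense versus sparse attention (a textbook analysis, not a figure taken from this paper): for sequence length $n$ and head dimension $d_k$,

```latex
\underbrace{O(n^{2} d_k)}_{QK^{\top}\ \text{and}\ AV}
\;+\;
\underbrace{O(n^{2})}_{\text{softmax}}
\quad\xrightarrow{\ \text{sparse mask},\ w \ll n\ }\quad
O(n\, w\, d_k)
```

so a sparse pattern that lets each query attend to only $w$ keys replaces the quadratic term with a linear one.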
2. Related Works
2.1. Sentiment Analysis and Related Financial Applications
2.2. Existing Deep Learning Models for Sentiment Analysis
2.2.1. Seq2Seq Model
2.2.2. LSTM Model
2.2.3. Transformer Model
2.2.4. BERT
3. Proposed Methods
3.1. Overview of Sentiment Analysis Pipeline
3.2. Transformer Architecture
- Positional Encoding. Token order is significant in many tasks, but the transformer's self-attention mechanism is permutation-invariant and cannot capture this order on its own. The model therefore uses positional encoding (1) to supplement the input embeddings with information that encodes the position of each token in the input sequence, as written out below.
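For reference, the standard sinusoidal encoding of Vaswani et al. (2017), which we assume equation (1) denotes, is

```latex
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right),
\qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right)
```

where $pos$ is the token position and $i$ indexes the embedding dimension.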
- Self-attention mechanism. Each input token is mapped to a query (Q), key (K), and value (V) vector of dimension $d_k$. These are obtained by projecting the input embeddings through three learnable weight matrices $W^Q$, $W^K$, and $W^V$, as written out below.
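Concretely, following the formulation of Vaswani et al. (2017), the projections and the resulting scaled dot-product attention are

```latex
Q = XW^{Q},\quad K = XW^{K},\quad V = XW^{V},
\qquad
\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```

where $X$ is the matrix of input embeddings and the $\sqrt{d_k}$ scaling keeps the dot products in a range where the softmax has usable gradients.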
- Multi-head attention mechanism. The input embeddings are split across multiple “heads”, and self-attention is applied to each head separately. Because each head learns to weight the input embeddings according to their relevance to the other embeddings in that head, the model can capture different kinds of dependencies among the input tokens. The output of multi-head attention is given in (4), and Figure 4 illustrates how scaled dot-product attention composes into multi-head attention; a minimal runnable sketch follows below.
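The following NumPy sketch reproduces only the standard mechanism of Vaswani et al. (2017); the model in this paper is BERT-based, and the shapes here (12 heads, hidden size 768, sequence length 128) merely mirror the hyperparameter table, not the authors' actual code.

```python
# Minimal multi-head attention sketch (illustration only, not the paper's implementation).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    """X: (seq_len, d_model); W_*: (d_model, d_model) learnable projections."""
    seq_len, d_model = X.shape
    d_k = d_model // num_heads

    # Project the inputs and split them into heads: (num_heads, seq_len, d_k)
    def project(W):
        return (X @ W).reshape(seq_len, num_heads, d_k).transpose(1, 0, 2)

    Q, K, V = project(W_q), project(W_k), project(W_v)

    # Scaled dot-product attention, computed per head
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (heads, seq, seq)
    out = softmax(scores) @ V                          # (heads, seq, d_k)

    # Concatenate the heads and apply the output projection
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ W_o

# Example: 12 heads over hidden size 768, matching the configuration table
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 768))
W = [rng.normal(scale=0.02, size=(768, 768)) for _ in range(4)]
print(multi_head_attention(X, *W, num_heads=12).shape)  # (128, 768)
```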
3.3. Pre-Trained Model BERT
3.4. Sparse Attention Mechanism
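As a concrete illustration of the general idea, the sketch below restricts each query to a fixed local window of keys, in the spirit of the sparse transformer patterns of Child et al. (2019); the exact sparsity pattern used in this work may differ. The window radius and the dense-then-mask implementation are illustrative choices only.

```python
# Sparse attention sketch: each query attends only to keys within a local
# window of radius w, cutting the softmax cost from O(n^2) to O(n * w).
# Hypothetical illustration of the idea, not the paper's exact mask.
# For clarity the mask is applied to dense scores; an efficient implementation
# would compute only the in-window entries.
import numpy as np

def local_attention(Q, K, V, window=8):
    """Q, K, V: (seq_len, d_k). Locally masked scaled dot-product attention."""
    n, d_k = Q.shape
    scores = Q @ K.T / np.sqrt(d_k)                       # (n, n) dense scores
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) > window   # True = out of window
    scores[mask] = -np.inf                                # drop masked pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V

rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(128, 64)) for _ in range(3))
print(local_attention(Q, K, V).shape)  # (128, 64)
```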
4. Experiments
4.1. Experimental Setup
Dataset Introduction and Acquisition
4.2. Model Configuration
4.3. Evaluation Metrics
4.4. Results and Analysis
4.4.1. Sentiment Accuracy
4.4.2. Case Study
4.4.3. Computational Complexity and Efficiency
5. Conclusions
5.1. Summary and Contribution of This Work
5.2. Limitations and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Abraham, Jethin, Daniel Higdon, John Nelson, and Juan Ibarra. 2018. Cryptocurrency price prediction using tweet volumes and sentiment analysis. SMU Data Science Review 1: 1. [Google Scholar]
- Almatrafi, Omaima, Suhem Parack, and Bravim Chavan. 2015. Application of location-based sentiment analysis using twitter for identifying trends towards indian general elections 2014. Paper presented at the 9th International Conference on Ubiquitous Information Management and Communication, Bali, Indonesia, January 8–10; pp. 1–5. [Google Scholar]
- Arora, Upasana, Shikhar Verma, Ishu Gupta, and Ashutosh Kumar Singh. 2017. Implementing privacy using modified tree and map technique. Paper presented at the 2017 3rd International Conference on Advances in Computing, Communication & Automation (ICACCA)(Fall), Dehradun, India, September 15–16; pp. 1–5. [Google Scholar]
- Aziz, Rabia Musheer, Mohammed Farhan Baluch, Sarthak Patel, and Abdul Hamid Ganie. 2022. Lgbm: A machine learning approach for ethereum fraud detection. International Journal of Information Technology 14: 3321–31. [Google Scholar] [CrossRef]
- Child, Rewon, Scott Gray, Alec Radford, and Ilya Sutskever. 2019. Generating long sequences with sparse transformers. arXiv arXiv:1904.10509. [Google Scholar]
- De Mattei, Lorenzo, Andrea Cimino, and Felice Dell’Orletta. 2018. Multi-task learning in deep neural network for sentiment polarity and irony classification. Paper presented at NL4AI@AI*IA, Trento, Italy, November 20–23; pp. 76–82. [Google Scholar]
- Deriu, Jan Milan, and Mark Cieliebak. 2016. Sentiment analysis using convolutional neural networks with multi-task training and distant supervision on italian tweets. Paper presented at the Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, Napoli, Italy, December 5–7. [Google Scholar]
- Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv arXiv:1810.04805. [Google Scholar]
- Dey, Rahul, and Fathi M Salem. 2017. Gate-variants of gated recurrent unit (gru) neural networks. Paper presented at the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, August 6–9; pp. 1597–600. [Google Scholar]
- Dong, Linhao, Shuang Xu, and Bo Xu. 2018. Speech-transformer: A no-recurrence sequence-to-sequence model for speech recognition. Paper presented at the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, April 15–20; pp. 5884–88. [Google Scholar]
- Dosovitskiy, Alexey, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, and et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv arXiv:2010.11929. [Google Scholar]
- Gandhmal, Dattatray, and Kannan Kumar. 2019. Systematic analysis and review of stock market prediction techniques. Computer Science Review 34: 100190. [Google Scholar] [CrossRef]
- Gupta, Ishu, and Ashutosh Kumar Singh. 2017. A probability based model for data leakage detection using bigraph. Paper presented at the 2017 the 7th International Conference on Communication and Network Security, Tokyo, Japan, November 24–26; pp. 1–5. [Google Scholar]
- Gupta, Ishu, and Ashutosh Kumar Singh. 2020. Seli: Statistical evaluation based leaker identification stochastic scheme for secure data sharing. IET Communications 14: 3607–18. [Google Scholar] [CrossRef]
- Gupta, Ishu, Tarun Kumar Madan, Sukhman Singh, and Ashutosh Kumar Singh. 2022. Hisa-smfm: Historical and sentiment analysis based stock market forecasting model. arXiv arXiv:2203.08143. [Google Scholar]
- Hasselgren, Ben, Christos Chrysoulas, Nikolaos Pitropakis, and William J Buchanan. 2022. Using social media & sentiment analysis to make investment decisions. Future Internet 15: 5. [Google Scholar]
- Hendrycks, Dan, and Kevin Gimpel. 2016. Bridging nonlinearities and stochastic regularizers with gaussian error linear units. arXiv arXiv:1606.08415. [Google Scholar]
- Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9: 1735–80. [Google Scholar] [CrossRef] [PubMed]
- Jiang, Weiwei. 2021. Applications of deep learning in stock market prediction: Recent progress. Expert Systems with Applications 184: 115537. [Google Scholar] [CrossRef]
- Khan, Salman, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, and Mubarak Shah. 2022. Transformers in vision: A survey. ACM Computing Surveys (CSUR) 54: 1–41. [Google Scholar] [CrossRef]
- Lin, Tianyang, Yuxin Wang, Xiangyang Liu, and Xipeng Qiu. 2022. A survey of transformers. arXiv arXiv:2106.04554. [Google Scholar] [CrossRef]
- Lin, Zhouhan, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. A structured self-attentive sentence embedding. arXiv arXiv:1703.03130. [Google Scholar]
- Liu, Bing. 2012. Sentiment Analysis and Opinion Mining. Cham: Springer Nature Switzerland AG. [Google Scholar]
- Liu, Ze, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. Paper presented at the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, October 11–17; pp. 10012–22. [Google Scholar]
- Magnini, Bernardo, Alberto Lavelli, and Simone Magnolini. 2020. Comparing machine learning and deep learning approaches on nlp tasks for the italian language. Paper presented at the 12th Language Resources and Evaluation Conference, Marseille, France, May 11–16; pp. 2110–19. [Google Scholar]
- Man, Xiliu, Tong Luo, and Jianwu Lin. 2019. Financial sentiment analysis (fsa): A survey. Paper presented at the 2019 IEEE International Conference on Industrial Cyber Physical Systems (ICPS), Taipei, Taiwan, May 6–9; pp. 617–22. [Google Scholar]
- Medsker, Larry, and Lakhmi Jain, eds. 2001. Recurrent Neural Networks: Design and Applications. Boca Raton: CRC Press. [Google Scholar]
- Mishev, Kostadin, Ana Gjorgjevikj, Irena Vodenska, Lubomir T. Chitkushev, and Dimitar Trajanov. 2020. Evaluation of sentiment analysis in finance: From lexicons to transformers. IEEE Access 8: 131662–82. [Google Scholar] [CrossRef]
- Nabipour, Mojtaba, Pooyan Nayyeri, Hamed Jabani, Amir Mosavi, and Ely Salwana. 2020. Deep learning for stock market prediction. Entropy 22: 840. [Google Scholar] [CrossRef]
- Neuenschwander, Bruna, Adriano C. M. Pereira, Wagner Meira Jr., and Denilson Barbosa. 2014. Sentiment analysis for streams of web data: A case study of brazilian financial markets. Paper presented at the 20th Brazilian Symposium on Multimedia and the Web, João Pessoa, Brazil, November 18–21; pp. 167–70. [Google Scholar]
- Pang, Xiongwen, Yanqiang Zhou, Pan Wang, Weiwei Lin, and Victor Chang. 2020. An innovative neural network approach for stock market prediction. The Journal of Supercomputing 76: 2098–118. [Google Scholar] [CrossRef]
- Pathak, Ajeet Ram, Manjusha Pandey, and Siddharth Rautaray. 2021. Topic-level sentiment analysis of social media data using deep learning. Applied Soft Computing 108: 107440. [Google Scholar] [CrossRef]
- Pei, Yulong, Amarachi Mbakwe, Akshat Gupta, Salwa Alamir, Hanxuan Lin, Xiaomo Liu, and Sameena Shah. 2022. Tweetfinsent: A dataset of stock sentiments on twitter. Paper presented at the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP), Vienna, Austria, July 23–25; pp. 37–47. [Google Scholar]
- Pota, Marco, Mirko Ventura, Rosario Catelli, and Massimo Esposito. 2020. An effective bert-based pipeline for twitter sentiment analysis: A case study in italian. Sensors 21: 133. [Google Scholar] [CrossRef]
- Qin, Yao, Dongjin Song, Haifeng Chen, Wei Cheng, Guofei Jiang, and Garrison Cottrell. 2017. A dual-stage attention-based recurrent neural network for time series prediction. arXiv arXiv:1704.02971. [Google Scholar]
- Ruan, Yefeng, Arjan Durresi, and Lina Alfantoukh. 2018. Using twitter trust network for stock market analysis. Knowledge-Based Systems 145: 207–18. [Google Scholar] [CrossRef]
- Sanboon, Thaloengpattarakoon, Kamol Keatruangkamala, and Saichon Jaiyen. 2019. A deep learning model for predicting buy and sell recommendations in stock exchange of thailand using long short-term memory. Paper presented at the 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS), Singapore, February 23–25; pp. 757–60. [Google Scholar]
- Saxena, Deepika, Ishu Gupta, Jitendra Kumar, Ashutosh Kumar Singh, and Xiaoqing Wen. 2021. A secure and multiobjective virtual machine placement framework for cloud data center. IEEE Systems Journal 16: 3163–74. [Google Scholar] [CrossRef]
- Singh, Ashutosh Kumar, and Ishu Gupta. 2020. Online information leaker identification scheme for secure data sharing. Multimedia Tools and Applications 79: 31165–82. [Google Scholar] [CrossRef]
- Sohangir, Sahar, Nicholas Petty, and Dingding Wang. 2018. Financial sentiment lexicon analysis. Paper presented at the 2018 IEEE 12th International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA, January 31–February 2; pp. 286–89. [Google Scholar]
- Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Paper presented at the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, December 4–9. [Google Scholar]
- Wang, Jinjiang, Jianxing Yan, Chen Li, Robert X Gao, and Rui Zhao. 2019. Deep heterogeneous gru model for predictive analytics in smart manufacturing: Application to tool wear prediction. Computers in Industry 111: 1–14. [Google Scholar] [CrossRef]
- Wang, Jin, Liang-Chih Yu, K. Robert Lai, and Xuejie Zhang. 2016. Dimensional sentiment analysis using a regional cnn-lstm model. Paper presented at the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Berlin, Germany, August 7–12; pp. 225–30. [Google Scholar]
- Yang, Linyi, Tin Lok James Ng, Barry Smyth, and Ruihai Dong. 2020. Html: Hierarchical transformer-based multi-task learning for volatility prediction. Paper presented at the Web Conference 2020, Taipei, Taiwan, April 20–24; pp. 441–51. [Google Scholar]
- Zhao, Bo, Yongji He, Chunfeng Yuan, and Yihua Huang. 2016. Stock market prediction exploiting microblog sentiment analysis. Paper presented at the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, July 24–29; pp. 4482–88. [Google Scholar]
- Zhao, Rui, Ruqiang Yan, Jinjiang Wang, and Kezhi Mao. 2017. Learning to monitor machine health with convolutional bi-directional lstm networks. Sensors 17: 273. [Google Scholar] [CrossRef] [PubMed]
- Zhou, Yuqing, and Wei Xue. 2018. Review of tool condition monitoring methods in milling processes. The International Journal of Advanced Manufacturing Technology 96: 2509–23. [Google Scholar] [CrossRef]
Table 1. Examples where conventional sentiment and stock sentiment diverge.

| Social Media Content | Conventional Sentiment | Stock Sentiment |
|---|---|---|
| $TSLA long. | Negative | Positive |
| Be Prepared For A DOGE Crash Elon on SNL Dogecoin New Price Predictions. | Negative | Neutral |
| Buy the f*cking dip! Hold the line! $AMC $GME $NOK | Negative | Positive |
| Dataset Property | Value |
|---|---|
| Language | English |
| Training samples | 1113 |
| Testing samples | 1000 |
| Total samples | 2113 |
| Positive samples | 816 |
| Neutral samples | 1030 |
| Negative samples | 267 |
| Ticker | AMC, BABA, BB, BBBY, CLOV, GME, NOK, PFE, PLTR, SHOP, SOFI, SPCE, SQ, TLRY, TSLA, VIAC, ZM |
| Hyperparameter | Value |
|---|---|
| Attention heads | 12 |
| Batch size | 8 |
| Epochs | 5 |
| Gradient accumulation steps | 16 |
| Hidden size | 768 |
| Hidden layers | 6, 12, 18 |
| Learning rate | 0.00003 |
| Maximum sequence length | 128 |
| System | F1 Pos | F1 Neg | F1 |
|---|---|---|---|
| CNN Deriu and Cieliebak (2016) | 0.634 | 0.706 | 0.670 |
| LSTM De Mattei et al. (2018) | 0.669 | 0.729 | 0.699 |
| Multilingual BERT Magnini et al. (2020) | 0.723 | 0.744 | 0.733 |
| Proposed Design | 0.740 | 0.765 | 0.752 |
| Tweet Content | Proposed Prediction | Others’ Prediction |
|---|---|---|
| Here is my entry $Baba $123.63, lol. | Neutral | Positive |
| update for today: — what a day! what a week! — 25% down on #btc — traded $bp, $sq, $tecs — added $aapl, $amzn, $baba | Positive | Negative |
| Model Name | # Params. | Complexity (FLOPs) | F1 Score |
|---|---|---|---|
| BERT-Tiny | 50 M | 15 G | 0.6731 |
| BERT-Base | 110 M | 55 G | 0.7098 |
| BERT-Large | 340 M | 120 G | 0.7525 |
| Method | # Params. | Avg. Latency (ms) | Avg. Complexity (GFLOPs) |
|---|---|---|---|
| LSTM | - | 2.4 | 0.3 |
| CNN | 28 M | 8.2 | 3.6 |
| BERT-Large | 340 M | 15.8 | 4.8 |
| This Work | 197 M | 10.3 | 3.2 |