Social Media Sentiment Analysis
Definition
1. Introduction
2. Genesis
3. Applications
4. Types
- (1) Multi-modal analysis. Many social media platforms support multi-modal communication, incorporating text, images, videos and sometimes audio. Each media mode offers distinct sentiment cues, and multi-modal analysis integrates these diverse data types to derive a richer, more accurate sentiment assessment. For example, textual analysis might interpret a social media post as neutral, but the inclusion of a humorous sound or a sarcastic meme could shift that interpretation to positive or negative, respectively. However, most social media sentiment analyses still focus on written texts.
- (2) Temporal dynamics. The dynamic nature of social media means that sentiments can fluctuate rapidly in response to world events, trends or the viral spread of particular content. Temporal analysis tracks these changes over time, enabling researchers to observe the stability and evolution of sentiments within a community and to identify or interpret causal relationships between synchronously unfolding events (see the first sketch after this list).
- (3) Interaction analysis. Texts that respond to each other in the flow of conversation on social media enable the interpretation of causal relationships between the sentiments expressed in posts, comments and replies. Interaction analysis helps to clarify how sentiments are amplified through these exchanges.
- (4) Network relationships. On social media platforms that support networked communication, analyzing the interaction of sentiments around an issue helps clarify how sentiments cluster within networks and identify influencers (see the second sketch after this list).
- (5) Sentiment propagation. Combining the analysis of temporal dynamics and network relationships enables mapping the propagation of sentiments across multiple social networks over time, which is crucial for studying phenomena like viral content or the spread of misinformation.
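The temporal dynamics described in item (2) are often operationalized as a sentiment time series. The following is a minimal sketch, assuming post-level sentiment scores in [-1, 1] have already been produced by some upstream classifier; the column names, the example values and the daily/rolling windows are illustrative choices, not a prescribed method.

```python
import pandas as pd

# Hypothetical per-post sentiment scores (computed upstream), with timestamps;
# in practice these would come from a platform export or API crawl.
posts = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-03-01 09:00", "2024-03-01 18:30", "2024-03-02 11:15",
        "2024-03-03 08:45", "2024-03-03 21:10", "2024-03-05 14:00",
    ]),
    "sentiment": [0.6, 0.2, -0.4, -0.7, -0.5, 0.1],  # scores in [-1, 1]
})

# Aggregate to a daily mean sentiment series.
daily = (
    posts.set_index("timestamp")["sentiment"]
    .resample("D")
    .mean()
)

# Smooth with a 3-day rolling mean to expose trends rather than single spikes.
trend = daily.rolling(window=3, min_periods=1).mean()

print(daily)
print(trend)
```

Plotting `daily` against `trend` is then enough to see whether a sentiment shift coincides with a known external event, which is the kind of interpretation temporal analysis supports.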
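For the network relationships in item (4) (and, once timestamps are attached to edges, the propagation questions in item (5)), a common starting point is a graph whose nodes are accounts and whose edges are replies or mentions annotated with a sentiment score. The sketch below is a toy illustration using networkx: the edge list and scores are invented, and degree centrality plus community detection stand in for the more elaborate influencer and clustering analyses described in the literature.

```python
import networkx as nx
from networkx.algorithms import community

# Hypothetical reply/mention edges: (source, target, sentiment of the interaction).
edges = [
    ("alice", "dana", 0.8), ("bob", "dana", 0.5), ("carol", "dana", 0.6),
    ("erin", "frank", -0.7), ("gus", "frank", -0.4), ("frank", "dana", -0.2),
]

G = nx.DiGraph()
for src, dst, score in edges:
    G.add_edge(src, dst, sentiment=score)

# Accounts that attract many interactions are candidate influencers.
influencers = sorted(G.in_degree(), key=lambda kv: kv[1], reverse=True)
print("Most replied-to accounts:", influencers[:3])

# Detect communities on the undirected projection and compare their mean
# internal sentiment, i.e. how sentiment clusters within the network.
groups = community.greedy_modularity_communities(G.to_undirected())
for i, members in enumerate(groups):
    scores = [d["sentiment"] for u, v, d in G.edges(data=True)
              if u in members and v in members]
    mean = sum(scores) / len(scores) if scores else float("nan")
    print(f"community {i}: {sorted(members)} mean sentiment {mean:+.2f}")
```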
5. Features
- (1) Volume and speed of data generation. Sentiment analysis of social media deals with vast and continuously growing streams of data generated by millions of users. The volume and speed of data generation require high-capacity computational tools.
- (2) Multiplicity of texts. Involves texts that vary in medium, language, style, structure and length. A post can be written in more than one language or dialect. The style can range from highly formal to highly informal, involving, for example, abbreviations, emojis, slang and profane language. Rhetorical devices such as sarcasm and irony communicate a sentiment opposite to the literal meaning of the words. The text can be as short as one emoji or one word, or it can be a lengthy, well-structured argument.
- (3) Fragmented contexts. Social media posts often reference external events, previous conversations or shared cultural knowledge that may not be explicitly stated within the text, making it difficult for sentiment analysis systems to accurately interpret the full meaning and emotional tone. Furthermore, different platforms have different features (threads, reactions, hashtags, retweets, etc.) that complicate the attribution and identification of sentiment.
- (4) Data noise. Some social media platforms allow certain types of accounts to include automated promotional material in their posts, creating noise that needs to be identified and separated from relevant information. Fake content generated by bots is another source of data noise that can skew the analysis.
6. Approaches
- (1) Deep learning-based approaches. By the 2010s, state-of-the-art results in natural language processing were increasingly achieved by deep learning models, and sentiment analysis was no exception [36]. The use of convolutional (CNN) and recurrent (RNN) neural networks for text analysis, together with mathematical representations of words (embeddings), enables the capture of the semantic and syntactic features of words. This improved capability allows for better handling of synonymy, spelling variations, tone shifts and implicit sentiment [37]. Early sentiment models considered words in isolation or in groups of two or three and lacked context awareness. For instance, they would handle words with slightly different spellings such as “favorite” and “favourite” as entirely distinct, while embeddings encode them as being almost equivalent. Furthermore, earlier models that relied solely on words irrespective of their relative position in the sentence (bag-of-words approach) struggle to handle negated turns of phrase such as “I do not think that it was so bad at all”. By contrast, deep learning models are able to consider the entire sequence and understand that the sentiment may be more nuanced than what the sole presence of the word “bad” could initially suggest (see the first sketch after this list). Word embeddings also facilitated the handling of multiple languages. Their popularization in the mid-2010s was followed by a sharp increase in multilingual sentiment analysis studies of social media [38]. These approaches also paved the way for the large language model (LLM)-based approaches described further below.
- (2) Aspect-based sentiment analysis (ABSA). Sentiments expressed in a social media post can be directed at specific and distinct aspects of an idea, event, product or service. Early sentiment analysis techniques indiscriminately aggregated those opinions into a single sentiment score, applied at various levels of granularity (sentence, paragraph, document). In contrast, ABSA tries to identify the different aspect terms mentioned in a text before gauging the sentiment polarity of the judgments expressed with regard to each aspect [39]. Identifying aspect terms and matching them to the expression of sentiment are both complex tasks with their own challenges, independent of the prediction of sentiment polarity. ABSA was originally applied primarily to customer reviews to identify what specific features of a product customers liked or disliked. For instance, in a product review for a smartphone, a user might write: “The camera quality is excellent, but the battery life is disappointing”. ABSA would identify two aspects (camera quality and battery life) and assign positive sentiment to the former and negative sentiment to the latter (the second sketch after this list illustrates this on a toy scale). ABSA has also been successfully applied to the study of political topics and social issues. For instance, [40] used ABSA to analyze public sentiment about COVID-19 based on 170 different aspects, ranging from “side effects” to “vaccine campaign”.
- (3) Multimodal sentiment analysis. Sentiment analysis for social media has long been dominated by textual approaches, as they are simpler to implement and less computationally expensive. Yet social media data often comes in a mix of text, image, sound and video, and increasingly favors the visual over the textual. Traditionally, multimodal sentiment analysis combined several media-specific models and aggregated their results into a final sentiment score [41]. In analyzing a video post, separate models would process the spoken words (through speech-to-text conversion), facial expressions and gestures (with a visual model) and any accompanying text, such as comments and captions (with a textual model). Each model would provide its own sentiment score, which would then be combined using various fusion techniques. Early approaches often used simple averaging or weighted sum methods, while more sophisticated techniques employed machine learning algorithms to learn optimal fusion strategies [42] (the third sketch after this list illustrates such late fusion). The rise of multimodal sentiment analysis has been particularly important for platforms like Instagram, TikTok and YouTube, where visual content plays a crucial role. For example, ref. [43] developed a multimodal approach to analyzing sentiment in YouTube videos combining visual and textual features. Their approach significantly improved sentiment classification accuracy compared to methods that only used either textual or visual models. However, as the latest generation of LLMs shows, AI models are increasingly becoming multimodal by default, making this form of sentiment analysis more accessible [44]. These multimodal LLMs are trained on vast amounts of paired text-image data, allowing them to develop a unified understanding of concepts across modalities. This integrated approach potentially offers more nuanced and context-aware sentiment analysis, as the model can inherently consider the interplay between different modes of expression.
- (4) Knowledge graphs and domain-specific analysis. Sentiment interpretation is highly context-dependent. At the same time, social media discussions often involve domain-specific terminology, cultural references or current events that require contextual knowledge. Knowledge graphs provide a structured representation of entities, concepts and their relationships, allowing for the incorporation of domain-specific knowledge into the sentiment analysis process. This approach can help resolve ambiguities and identify implicit sentiments in complex or specialized contexts. In specialized fields like healthcare, knowledge graphs can provide context for technical terms that might be neutral in general language but carry sentiment in a specific domain (the fourth sketch after this list gives a toy illustration). In particular, the integration of knowledge graphs with language models such as BERT, or, more recently, GPT, has proven very effective [45,46].
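As a concrete counterpart to the deep learning discussion in item (1), the following PyTorch sketch shows the typical shape of an embedding-plus-RNN sentiment classifier: tokens are mapped to dense vectors and an LSTM reads the whole sequence, which is what lets such models treat “I do not think that it was so bad at all” differently from a bag of isolated words. The vocabulary, dimensions and single forward pass are illustrative assumptions; a real model would be trained on labeled data.

```python
import torch
import torch.nn as nn

class LSTMSentimentClassifier(nn.Module):
    """Toy embedding + LSTM sentiment model (untrained sketch)."""

    def __init__(self, vocab_size: int, embed_dim: int = 32, hidden_dim: int = 64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 2)  # negative / positive

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)      # hidden: (1, batch, hidden_dim)
        return self.classifier(hidden[-1])        # logits over {neg, pos}

# Toy vocabulary and a negated sentence; real systems use learned tokenizers.
vocab = {w: i for i, w in enumerate(
    "<pad> i do not think that it was so bad at all".split())}
tokens = torch.tensor([[vocab[w] for w in
                        "i do not think that it was so bad at all".split()]])

model = LSTMSentimentClassifier(vocab_size=len(vocab))
logits = model(tokens)             # untrained, so the output is arbitrary
print(logits.softmax(dim=-1))      # probabilities for (negative, positive)
```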
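The smartphone example in item (2) can be made concrete with a deliberately simple, rule-based sketch: aspect terms come from a predefined list and polarity is read off a tiny lexicon applied to each aspect's clause. This is far cruder than the supervised ABSA models surveyed in [39]; the aspect list, lexicon and clause splitting are all illustrative assumptions.

```python
import re

ASPECTS = ["camera quality", "battery life", "screen", "price"]
LEXICON = {"excellent": 1, "great": 1, "good": 1,
           "disappointing": -1, "bad": -1, "poor": -1}

def aspect_sentiments(review: str) -> dict[str, int]:
    """Assign a crude polarity to each known aspect mentioned in the review."""
    results = {}
    # Split on punctuation and the contrastive "but" so each clause
    # carries roughly one opinion.
    for clause in re.split(r"[,;.]| but ", review.lower()):
        polarity = sum(LEXICON.get(tok, 0) for tok in clause.split())
        for aspect in ASPECTS:
            if aspect in clause:
                results[aspect] = polarity
    return results

review = "The camera quality is excellent, but the battery life is disappointing."
print(aspect_sentiments(review))
# {'camera quality': 1, 'battery life': -1}
```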
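The late-fusion strategy described in item (3) reduces, in its simplest form, to combining per-modality scores. The sketch below shows a weighted-sum fusion of hypothetical text, visual and audio scores; the weights and thresholds are illustrative, and, as noted above, more sophisticated systems learn the fusion from data [42].

```python
# Hypothetical per-modality sentiment scores in [-1, 1] for one video post,
# as produced by separate text, visual and audio models.
modality_scores = {"text": 0.1, "visual": 0.7, "audio": 0.4}

# Illustrative fusion weights; learned fusion models would estimate these
# (or a nonlinear combination) from labeled multimodal data.
weights = {"text": 0.5, "visual": 0.3, "audio": 0.2}

fused = sum(weights[m] * score for m, score in modality_scores.items())

label = "positive" if fused > 0.05 else "negative" if fused < -0.05 else "neutral"
print(f"fused score = {fused:+.2f} -> {label}")
```

Here a roughly neutral caption is overridden by strongly positive visual cues, which is exactly the kind of correction multimodal analysis is meant to provide.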
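Item (4) can be illustrated with a toy domain lookup: a miniature stand-in for a knowledge graph (here just a dictionary of healthcare terms and the polarity they carry in that domain) overrides the neutral reading a general-purpose lexicon would give. Real systems inject such structured knowledge into transformer models, as in K-BERT [45]; the terms, relations and scores below are invented for illustration.

```python
import re

# Toy domain knowledge: terms that are neutral in everyday language
# but sentiment-bearing in a healthcare context (illustrative values).
HEALTH_KG = {
    "remission": {"related_to": "recovery", "domain_polarity": +1},
    "relapse": {"related_to": "deterioration", "domain_polarity": -1},
    "metastasis": {"related_to": "disease progression", "domain_polarity": -1},
}

GENERAL_LEXICON = {"happy": 1, "sad": -1}  # knows nothing about medical terms

def domain_aware_score(text: str) -> int:
    """Score a post with a general lexicon, then add domain knowledge."""
    score = 0
    for tok in re.findall(r"[a-z]+", text.lower()):
        score += GENERAL_LEXICON.get(tok, 0)
        if tok in HEALTH_KG:
            score += HEALTH_KG[tok]["domain_polarity"]
    return score

post = "Scan results came back: remission."
print(domain_aware_score(post))   # 1: positive only thanks to the domain entry
```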
7. Large Language Models and Future Directions
- (1) LLMs are stochastic. This means that the same prompt will produce similar but not identical outputs across runs. This variability poses significant challenges for scientific reproducibility and consistency in sentiment analysis applications. To mitigate this, researchers often resort to averaging results over multiple runs or to techniques like setting the “temperature” (the parameter that controls how much a model is allowed to deviate from the most probable outputs) to the minimum level (a minimal aggregation sketch follows this list). However, these approaches add complexity and computational overhead to the analysis process.
- (2) LLMs are black boxes. The inner workings of these models are not transparent, largely because the best-performing models are proprietary and closed-source. Even with open-source versions, the complexity of the model architectures—often consisting of billions of parameters—obscures the rationale behind specific outputs. This opacity makes it challenging for users to debug or improve the models based on specific application needs and hinders the ability to ensure that the models are making decisions for the right reasons. For example, if an LLM misclassifies the emotional tone of a product review, developers would struggle to pinpoint the reason without a clear understanding of the model’s decision-making process. By contrast, earlier models that would give individual words a single sentiment score were very easy to interpret.
- (3) LLM outputs are hard to evaluate. As a direct corollary of the point above, the lack of knowledge about the data on which the models were trained makes it difficult to evaluate their performance. It is challenging to determine whether they outperform previous models because they are genuinely more effective or because the evaluation data were already included in their extensive training corpus.
- (4) LLMs are biased. Commercial LLMs undergo an alignment process using reinforcement learning from human feedback (RLHF) to mitigate the biases of their training data, avoid hate speech and refuse to comply with unethical prompts. By trying to sanitize outputs to avoid generating harmful content, these models may become overly conservative, missing nuanced or contextually specific sentiments. It has indeed been noted that this process may be responsible for degrading LLMs’ performance on a variety of tasks, from hate speech detection to sentiment analysis [53]. But biases in LLMs are subtle and context-dependent, making them difficult to detect and correct comprehensively.
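A common mitigation for the stochasticity discussed in item (1) is to query the model several times and aggregate the labels. The sketch below assumes a hypothetical `classify_sentiment(text, temperature)` wrapper around whatever LLM API is in use (no specific provider's interface is implied; the function here merely simulates run-to-run variability) and takes a majority vote over runs.

```python
import random
from collections import Counter

def classify_sentiment(text: str, temperature: float = 0.0) -> str:
    """Hypothetical stand-in for an LLM call returning a sentiment label.

    A real implementation would send `text` to an LLM with the given
    temperature; here the run-to-run variability is only simulated.
    """
    labels = ["positive", "negative", "neutral"]
    weights = [0.6, 0.25, 0.15] if temperature > 0 else [1.0, 0.0, 0.0]
    return random.choices(labels, weights=weights, k=1)[0]

def majority_vote(text: str, runs: int = 5, temperature: float = 0.7) -> str:
    """Aggregate several stochastic runs into one label."""
    votes = Counter(classify_sentiment(text, temperature) for _ in range(runs))
    return votes.most_common(1)[0][0]

post = "Honestly did not expect to enjoy the update this much."
print(majority_vote(post))                         # label stabilized by voting
print(classify_sentiment(post, temperature=0.0))   # near-deterministic single run
```

Voting trades extra API calls for stability; setting the temperature to its minimum is cheaper but, as noted above, neither option removes the underlying reproducibility concern entirely.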
8. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
1. Liu, B. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions; Cambridge University Press: Cambridge, UK, 2020.
2. Shouse, E. Feeling, emotion, affect. M/C J. 2005, 8.
3. Pang, B.; Lee, L.; Vaithyanathan, S. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, PA, USA, 6–7 July 2002; pp. 79–86.
4. Turney, P.D. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. arXiv 2002, arXiv:cs/0212032.
5. Issenberg, S. How Obama’s Team Used Big Data to Rally Voters. MIT Technology Review. 9 December 2012. Available online: https://www.technologyreview.com/2012/12/19/114510/how-obamas-team-used-big-data-to-rally-voters/ (accessed on 15 October 2024).
6. Hu, Y. From Yulun (Public Opinion) to Yuqing (Public Intelligence): Their History and Practice in China’s Information Management. In The Routledge Companion to Global Internet Histories; Goggin, G., McLelland, M., Eds.; Routledge: London, UK, 2017; pp. 538–550.
7. Hoffman, S. Engineering Global Consent: The Chinese Communist Party’s Data-Driven Power Expansion. Australian Strategic Policy Institute. 2019. Available online: https://ad-aspi.s3.ap-southeast-2.amazonaws.com/2019-10/Engineering%20global%20consent%20V2.pdf (accessed on 17 October 2024).
8. Thorne, D. Evaluating the Utility of Global Data Collection by Chinese Firms for Targeted Propaganda. Jamestown Foundation. 2020. Available online: https://jamestown.org/program/evaluating-the-utility-of-global-data-collection-by-chinese-firms-for-targeted-propaganda (accessed on 17 October 2024).
9. Patel, F.; Levinson-Waldman, R.; DenUyl, S.; Koreh, R. Social Media Monitoring: How the Department of Homeland Security Uses Digital Data in the Name of National Security. Brennan Center for Justice. 22 May 2019. Available online: https://www.brennancenter.org/sites/default/files/2019-08/Report_Social_Media_Monitoring.pdf (accessed on 17 October 2024).
10. Wieshmann, H.; Davies, M.; Sugg, O.; Davis, S.; Ruda, S. Violence in London: What We Know and How to Respond. A Report Commissioned by the Mayor of London’s Violence Reduction Unit. Greater London Authority. 2020. Available online: https://images.london.gov.uk/m/2f62d5c4172448aa/original/Violence-in-London-what-we-know-and-how-to-respond.pdf (accessed on 17 October 2024).
11. Gohdes, A.R. Repression Technology: Internet Accessibility and State Violence. Am. J. Polit. Sci. 2020, 64, 488–503.
12. AI4PublicPolicy. Project Information. AI4PublicPolicy. 2020. Available online: https://ai4publicpolicy.eu/project-info/ (accessed on 17 October 2024).
13. Souri, N. Cutting-Edge WeGov Software Solution Supporting Policy-Makers in the Analysis of SNS. European Commission. Available online: https://joinup.ec.europa.eu/collection/eparticipation-and-evoting/news/cutting-edge-wegov-software-s (accessed on 17 October 2024).
14. Feng, E. Why the Chinese Government Wants More Feel-Good Stories Posted Online. NPR. 10 January 2022. Available online: https://www.npr.org/2022/01/10/1071766938/why-the-chinese-government-wants-more-feel-good-stories-posted-online (accessed on 17 October 2024).
15. Wang, J. Platform Responsibility with Chinese Characteristics. Digital Planet, Tufts University. 2022. Available online: https://digitalplanet.tufts.edu/wp-content/uploads/2023/02/DD-Report_1-Jufang-Wang-11.30.22.pdf (accessed on 17 October 2024).
16. Shen, R.-P.; Liu, D.; Wei, X.; Zhang, M. Your posts betray you: Detecting influencer-generated sponsored posts by finding the right clues. Inf. Manag. 2022, 59, 103719.
17. Wu, Y.; Ngai, E.W.; Wu, P.; Wu, C. Fake online reviews: Literature review, synthesis, and directions for future research. Decis. Support Syst. 2020, 132, 113280.
18. Engle, R.F.; Ng, V.K. Measuring and Testing the Impact of News on Volatility. J. Financ. 1993, 48, 1749–1778.
19. Bloomberg. Finding Novel Ways to Trade on Sentiment Data. Bloomberg. 14 June 2017. Available online: https://www.bloomberg.com/company/stories/finding-novel-ways-trade-sentiment-data (accessed on 17 October 2024).
20. Du, K.; Xing, F.; Mao, R.; Cambria, E. Financial Sentiment Analysis: Techniques and Applications. ACM Comput. Surv. 2024, 56, 1–42.
21. Bucher, T.; Helmond, A. The affordances of social media platforms. SAGE Handb. Soc. Media 2018, 1, 233–254.
22. Gaver, W.W. Technology affordances. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New Orleans, LA, USA, 27 April–2 May 1991; pp. 79–84.
23. Gibson, J.J. The Ecological Approach to Visual Perception; Taylor & Francis: Abingdon, UK, 1979; pp. 119–135.
24. Graves, L. The affordances of blogging: A case study in culture and technological effects. J. Commun. Inq. 2007, 31, 331–346.
25. Norman, D.A. Affordance, conventions, and design. Interactions 1999, 6, 38–43.
26. Ronzhyn, A.; Cardenal, A.S.; Rubio, A.B. Defining affordances in social media research: A literature review. New Media Soc. 2023, 25, 3165–3188.
27. Kaplan, A.M.; Haenlein, M. Users of the world, unite! The challenges and opportunities of Social Media. Bus. Horiz. 2010, 53, 59–68.
28. Zarrella, D. The Social Media Marketing Book; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2009.
29. Bechmann, A.; Lomborg, S. Mapping actor roles in social media: Different perspectives on value creation in theories of user participation. New Media Soc. 2013, 15, 765–781.
30. Carr, C.T.; Hayes, R.A. Social media: Defining, developing, and divining. Atl. J. Commun. 2015, 23, 46–65.
31. Hansen, D.; Shneiderman, B.; Smith, M.A. Analyzing Social Media Networks with NodeXL: Insights from a Connected World, 2nd ed.; Morgan Kaufmann: San Francisco, CA, USA, 2019.
32. Hogan, B.; Quan-Haase, A. Persistence and Change in Social Media. Bull. Sci. Technol. Soc. 2010, 30, 309–315.
33. Howard, P.N.; Parks, M.R. Social Media and Political Change: Capacity, Constraint, and Consequence. J. Commun. 2012, 62, 359–362.
34. Kietzmann, J.H.; Hermkens, K.; McCarthy, I.P.; Silvestre, B.S. Social media? Get serious! Understanding the functional building blocks of social media. Bus. Horiz. 2011, 54, 241–251.
35. Lewis, B.K. Social media and strategic communication: Attitudes and perceptions among college students. Public Relat. J. 2010, 4, 1–23.
36. Zhang, L.; Wang, S.; Liu, B. Deep Learning for Sentiment Analysis: A Survey. arXiv 2018.
37. Dashtipour, K.; Poria, S.; Hussain, A.; Cambria, E.; Hawalah, A.Y.A.; Gelbukh, A.; Zhou, Q. Multilingual Sentiment Analysis: State of the Art and Independent Comparison of Techniques. Cogn. Comput. 2016, 8, 757–771.
38. Agüero-Torales, M.M.; Salas, J.I.A.; López-Herrera, A.G. Deep learning and multilingual sentiment analysis on social media data: An overview. Appl. Soft Comput. 2021, 107, 107373.
39. Zhang, W.; Li, X.; Deng, Y.; Bing, L.; Lam, W. A Survey on Aspect-Based Sentiment Analysis: Tasks, Methods, and Challenges. IEEE Trans. Knowl. Data Eng. 2022, 35, 11019–11038.
40. Jang, H.; Rempel, E.; Roth, D.; Carenini, G.; Janjua, N.Z. Tracking COVID-19 Discourse on Twitter in North America: Infodemiology Study Using Topic Modeling and Aspect-Based Sentiment Analysis. J. Med. Internet Res. 2021, 23, e25431.
41. Poria, S.; Cambria, E.; Bajpai, R.; Hussain, A. A review of affective computing: From unimodal analysis to multimodal fusion. Inf. Fusion 2017, 37, 98–125.
42. Soleymani, M.; Garcia, D.; Jou, B.; Schuller, B.; Chang, S.-F.; Pantic, M. A survey of multimodal sentiment analysis. Image Vis. Comput. 2017, 65, 3–14.
43. Rosas, V.P.; Mihalcea, R.; Morency, L.-P. Multimodal Sentiment Analysis of Spanish Online Videos. IEEE Intell. Syst. 2013, 28, 38–45.
44. Yin, S.; Fu, C.; Zhao, S.; Li, K.; Sun, X.; Xu, T.; Chen, E. A Survey on Multimodal Large Language Models. arXiv 2024, arXiv:2306.13549.
45. Liu, W.; Zhou, P.; Zhao, Z.; Wang, Z.; Ju, Q.; Deng, H.; Wang, P. K-BERT: Enabling Language Representation with Knowledge Graph. Proc. AAAI Conf. Artif. Intell. 2019, 34, 2901–2908.
46. Yang, L.; Chen, H.; Li, Z.; Ding, X.; Wu, X. Give Us the Facts: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling. IEEE Trans. Knowl. Data Eng. 2024, 99, 3091–3110.
47. Li, X.; Chan, S.; Zhu, X.; Pei, Y.; Ma, Z.; Liu, X.; Shah, S. Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks. arXiv 2023, arXiv:2305.05862.
48. Deng, X.; Bashlovkina, V.; Han, F.; Baumgartner, S.; Bendersky, M. LLMs to the Moon? Reddit Market Sentiment Analysis with Large Language Models. In Companion Proceedings of the ACM Web Conference, Austin, TX, USA, 30 April–4 May 2023; pp. 1014–1019.
49. Wang, Z.; Xie, Q.; Feng, Y.; Ding, Z.; Yang, Z.; Xia, R. Is ChatGPT a good sentiment analyzer? A preliminary study. arXiv 2024, arXiv:2304.04339.
50. Zhong, Q.; Ding, L.; Liu, J.; Du, B.; Tao, D. Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT. arXiv 2023, arXiv:2302.10198.
51. Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Ichter, B.; Xia, F.; Chi, E.; Le, Q.; Zhou, D. Chain of Thought Prompting Elicits Reasoning in Large Language Models. arXiv 2022, arXiv:2201.11903.
52. Xing, F. Designing Heterogeneous LLM Agents for Financial Sentiment Analysis. arXiv 2024, arXiv:2401.05799.
53. Zhang, W.; Deng, Y.; Liu, B.; Pan, S.; Bing, L. Sentiment Analysis in the Era of Large Language Models: A Reality Check. arXiv 2023, arXiv:2305.15005.