MDPI - Publisher of Open Access Journals

20 pages, 7065 KiB

Open Access Article

Application of Machine Learning Techniques to Classify Twitter Sentiments Using Vectorization Techniques

by Manjog Padhy, Umar Muhammad Modibbo, Rasmita Rautray, Subhranshu Sekhar Tripathy and Sujit Bebortta

Algorithms 2024, 17(11), 486; https://doi.org/10.3390/a17110486 - 29 Oct 2024

Cited by 1 | Viewed by 1973

Abstract

Advances in social networking have enabled open expression on micro-blogging platforms such as Twitter. Traditional Twitter Sentiment Analysis (TSA) faces challenges stemming from rule-based or dictionary algorithms, including feature selection, ambiguity, sparse data, and language variations. This study proposes a classification framework for Twitter sentiment data that uses word count vectorization and machine learning techniques to reduce the difficulties faced with annotated sentiment-labelled tweets. Various classifiers (Naïve Bayes (NB), Decision Tree (DT), K-Nearest Neighbors (KNN), Logistic Regression (LR), and Random Forest (RF)) were evaluated based on accuracy, precision, recall, F1-score, and specificity. Random Forest outperformed the others in sentiment classification, with an Area under Curve (AUC) value of 0.96 and an average precision (AP) score of 0.96, and was especially effective with minimal Twitter-specific features.
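As a rough illustration of two ideas named in this abstract, the sketch below builds word-count vectors and computes the five listed metrics from a set of predictions. It is not the authors' pipeline: the function names `count_vectorize` and `binary_metrics`, the whitespace tokenizer, and the binary label convention are all assumptions made for the example.

```python
def count_vectorize(texts):
    """Return (vocabulary, count vectors) for whitespace-tokenized texts."""
    tokenized = [text.lower().split() for text in texts]
    vocabulary = sorted({word for doc in tokenized for word in doc})
    index = {word: i for i, word in enumerate(vocabulary)}
    vectors = []
    for doc in tokenized:
        row = [0] * len(vocabulary)
        for word in doc:
            row[index[word]] += 1  # one count per occurrence
        vectors.append(row)
    return vocabulary, vectors


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, F1 and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Guard against empty denominators (assumed convention: report 0.0).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }
```

The count vectors would normally be passed to a classifier such as Random Forest; that step is left out here to keep the sketch dependency-free.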

(This article belongs to the Section Evolutionary Algorithms and Machine Learning)

21 pages, 438 KiB

Open Access Article

FinSoSent: Advancing Financial Market Sentiment Analysis through Pretrained Large Language Models

by Josiel Delgadillo, Johnson Kinyua and Charles Mutigwe

Big Data Cogn. Comput. 2024, 8(8), 87; https://doi.org/10.3390/bdcc8080087 - 2 Aug 2024

Cited by 8 | Viewed by 4919

Abstract

Predicting the directions of financial markets has been performed using a variety of approaches, and the large volume of unstructured data generated by traders and other stakeholders on social media microblog platforms provides unique opportunities for analyzing financial markets using additional perspectives. Pretrained large language models (LLMs) have demonstrated very good performance on a variety of sentiment analysis tasks in different domains. However, it is known that sentiment analysis is a very domain-dependent NLP task that requires knowledge of the domain ontology, and this is particularly the case with the financial domain, which uses its own unique vocabulary. Recent developments in NLP and deep learning including LLMs have made it possible to generate actionable financial sentiments using multiple sources including financial news, company fundamentals, technical indicators, as well as social media microblogs posted on platforms such as StockTwits and X (formerly Twitter). We developed a financial social media sentiment analyzer (FinSoSent), which is a domain-specific large language model for the financial domain that was pretrained on financial news articles and fine-tuned and tested using several financial social media corpora. We conducted a large number of experiments using different learning rates, epochs, and batch sizes to yield the best performing model. Our model outperforms current state-of-the-art FSA models based on over 860 experiments, demonstrating the efficacy and effectiveness of FinSoSent. We also conducted experiments using ensemble models comprising FinSoSent and the other current state-of-the-art FSA models used in this research, and a slight performance improvement was obtained based on majority voting. Based on the results obtained across all models in these experiments, the significance of this study is that it highlights the fact that, despite the recent advances of LLMs, sentiment analysis even in domain-specific contexts remains a difficult research problem.
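The majority-voting ensemble mentioned in this abstract can be sketched in a few lines. This is only an illustration of the general technique, not the FinSoSent code: the function name `majority_vote` and the label strings are hypothetical, and how the authors break ties is not stated in the abstract.

```python
from collections import Counter


def majority_vote(model_predictions):
    """Combine per-model label lists into one label per sample.

    model_predictions: list of lists, one inner list per model,
    all of the same length (one label per sample).
    """
    if not model_predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(model_predictions[0])
    if any(len(preds) != n_samples for preds in model_predictions):
        raise ValueError("all models must label the same samples")

    combined = []
    # zip(*...) regroups the labels so each tuple holds one sample's votes.
    for votes in zip(*model_predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined
```

With an odd number of models and two classes there are no ties; with three or more classes a tie rule would have to be chosen explicitly.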

32 pages, 1418 KiB

Open Access Article

From Research to Retweets—Exploring the Role of Educational Twitter (X) Communities in Promoting Science Communication and Evidence-Based Teaching

by Monica Déchène, Kaley Lesperance, Lisa Ziernwald and Doris Holzberger

Educ. Sci. 2024, 14(2), 196; https://doi.org/10.3390/educsci14020196 - 15 Feb 2024

Cited by 5 | Viewed by 5545

Abstract

Twitter has evolved from its initial purpose as a microblogging social network to a pivotal platform for science communication. Equally, it has gained significant popularity among teachers who utilize communities like the German #twitterlehrerzimmer (TWLZ; Twitter teachers’ lounge) as a digital professional learning network. (1) Background: To date, no studies examine how science communication is conducted on Twitter specifically tailored to teachers’ needs and whether this facilitates evidence-based teaching. (2) Methods: Answering the three research questions involved a comprehensive mixed methods approach comprising an online teacher survey, utility analysis using Analytical Hierarchy Process (AHP) models, and machine learning-assisted tweet analyses. (3) Results: Teachers implement research findings from the TWLZ in their teaching about twice a month. They prefer interactive tweets with specific content-related, communicative, and interactive tweet features. Science communication in the TWLZ differs from everyday communication but notably emphasizes the relevance of transfer events for educational practice. (4) Conclusions: Findings highlight that dialogue is essential for successful science communication. Practical implications arise from new guidelines on how research findings should be communicated and encourage teachers to reflect on their Twitter usage and attitude toward evidence-based teaching. Recommendations for further research in this emerging field are also discussed.

(This article belongs to the Special Issue Science Communication in Education: Mapping the Field to Foster the Impact and Sustainability of Education Sciences)

18 pages, 876 KiB

Open Access Article

Unraveling Microblog Sentiment Dynamics: A Twitter Public Attitudes Analysis towards COVID-19 Cases and Deaths

by Paraskevas Koukaras, Dimitrios Rousidis and Christos Tjortjis

Informatics 2023, 10(4), 88; https://doi.org/10.3390/informatics10040088 - 7 Dec 2023

Cited by 1 | Viewed by 2479

Abstract

The identification and analysis of sentiment polarity in microblog data has drawn increased attention. Researchers and practitioners attempt to extract knowledge by evaluating public sentiment in response to global events. This study aimed to evaluate public attitudes towards the spread of COVID-19 by performing sentiment analysis on over 2.1 million tweets in English. The implications included the generation of insights for timely disease outbreak prediction and assertions regarding worldwide events, which can help policymakers take suitable actions. We investigated whether there was a correlation between public sentiment and the number of cases and deaths attributed to COVID-19. The research design integrated text preprocessing (regular expression operations, (de)tokenization, stopwords), sentiment polarization analysis via TextBlob, hypothesis formulation (null hypothesis testing), and statistical analysis (Pearson coefficient and p-value) to produce the results. The key findings highlight a correlation between sentiment polarity and deaths, starting at 41 days before and expanding up to 3 days after counting. Twitter users reacted to increased numbers of COVID-19-related deaths after four days by posting tweets with fading sentiment polarization. We also detected a strong correlation between COVID-19 Twitter conversation polarity and reported cases and a weak correlation between polarity and reported deaths.
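The statistical step described here, correlating a daily polarity series with a daily death count at different day offsets, can be sketched as a lagged Pearson coefficient. This is a minimal sketch under stated assumptions: the names `pearson` and `lagged_pearson` and the sign convention for `lag` are made up for illustration, the p-value is not computed, and the study's exact procedure may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def lagged_pearson(polarity, deaths, lag):
    """Correlate polarity[t] with deaths[t + lag].

    Assumed convention: a positive lag pairs sentiment with deaths
    reported `lag` days later; a negative lag pairs it with earlier deaths.
    """
    if lag >= 0:
        xs, ys = polarity[: len(polarity) - lag], deaths[lag:]
    else:
        xs, ys = polarity[-lag:], deaths[: len(deaths) + lag]
    return pearson(xs, ys)
```

Scanning `lag` over a window such as -3 to 41 days would show at which offsets the two series move together.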

24 pages, 10136 KiB

Open Access Article

An Analysis of the Evolution of Public Sentiment and Spatio-Temporal Dynamics Regarding Building Collapse Accidents Based on Sina Weibo Data

by Dongling Ma, Chunhong Zhang, Liang Zhao, Qingji Huang and Baoze Liu

ISPRS Int. J. Geo-Inf. 2023, 12(10), 388; https://doi.org/10.3390/ijgi12100388 - 26 Sep 2023

Cited by 10 | Viewed by 2643

Abstract

Monitoring, analyzing, and managing public sentiment surrounding urban emergencies is important for city governments in executing effective response strategies and maintaining social stability. This study examines the self-built house collapse incident in Changsha, China, that occurred on 29 April 2022, with a focus on leveraging Sina Weibo (a Twitter-like microblogging system in China) comment data. By employing the Latent Dirichlet Allocation (LDA) topic model, we identified key discussion themes within the comments and explored the emotional and spatio-temporal characteristics of the discourse. Furthermore, utilizing geographic detectors, we investigated the factors influencing the spatial variations in comment data. Our research findings indicate that the comments can be categorized into three main themes: “Rest in Peace for the Deceased”, “Wishing for Safety”, and “Thorough Investigation of Self-Built Houses”. Regarding emotional features, the overall sentiment expressed in the public discourse was positive, albeit with significant fluctuations during different stages of the incident, including the initial occurrence, rescue efforts, and the establishment of accountability and investigative committees. These fluctuations were closely associated with the emotional polarity of the specific topics. In terms of temporal distribution, the peak in the number of comments occurred approximately one hour after the topic was published. Concerning spatial distribution, a positive sentiment prevailed across various provinces. The comment distribution exhibited a stair-like pattern, which correlated with interregional population migration and per capita GDP. Our study provides valuable insights for city governments and relevant departments in conducting sentiment analysis and guiding public opinion trends.

(This article belongs to the Special Issue Human-Induced Disaster and Conflict Analysis, Prediction, and Prevention by Geospatial Analytics and Information Systems)

9 pages, 2690 KiB

Open Access Article

Social Media Network Analysis of Academic Urologists’ Interaction within Twitter Microblogging Environment

by Spencer H. Bell, Clara Sun, Emma Helstrom, Justin M. Dubin, Ilaha Isali, Kirtishri Mishra, Andrew Gianakopoulos, Seyed Behzad Jazayeri, Mohit Sindhani, Lee Ponsky, Alexander Kutikov, Casey Seideman, Andres Correa, Diana Magee and Laura Bukavina

Soc. Int. Urol. J. 2023, 4(2), 96-104; https://doi.org/10.48083/TKEK6928 - 16 Mar 2023

Cited by 2 | Viewed by 471

Abstract

Objective: To characterize academic urology Twitter presence and interaction by subspecialty designation. Methods: Using the Twitter application programming interface, 94 000 specific tweets were extracted from available data for analysis through the Twitter Developer Program. Academic urologists were defined based on American Urological Association (AUA) residency program registration of 143 residency programs, with a total of 2377 faculty. Faculty Twitter accounts were verified by matching 2 of 3 factors (name, location, specialty). Additional faculty information including sex, program location, and subspecialty was manually recorded. All elements of microblogging were captured through Anaconda Navigator. Analyzed tweets were further evaluated using natural language processing for sentiment association, mentions, quote tweets, and retweets. Interactions among academic urologists within each specialty for a given topic were examined through network analysis using D3 in JavaScript. Analysis was performed in Python and R. Results: We identified 143 residency programs with a total of 2377 faculty (1975 men and 402 women). Among all faculty, 945 (39.7%) had registered Twitter accounts, with the majority being men (759 [80.40%] versus 185 [19.60%]). Although there were more male academic urologists across programs, women within academic urology were more likely than men to have a registered Twitter account overall (46% versus 38.5%). When assessing registered accounts by sex, there was a peak for male faculty in 2014 (10.05% of all accounts registered) and a peak for female faculty in 2015 (2.65%). There was no notable change in faculty account registration during COVID-19 (2019–2020). In 2022, oncology represented the highest total number of registered Twitter users (225), with the highest number of total tweets (24 622), followers (138 541), and tweets per user per day (0.32). However, andrology (50%) and reconstruction (51.3%) were 2 of the highest proportionally represented subspecialties within academic urology. Within the context of conversation surrounding a specified topic (#aua21), female pelvic medicine and reconstructive surgery (FPMRS) and endourology demonstrated the total highest number of intersubspecialty conversations. Conclusions: There is a steady increase in Twitter representation among academic urologists, largely unaffected by COVID-19. While urologic oncology represents the largest group, andrology and reconstructive urology represent the highest proportion of their respective subspecialties. Interaction analysis highlights the variant interaction among subspecialties based on topic, with strong direct ties between endourology, FPMRS, and oncology.

14 pages, 1434 KiB

Open Access Article

Predicting Location of Tweets Using Machine Learning Approaches

by Mohammed Alsaqer, Salem Alelyani, Mohamed Mohana, Khalid Alreemy and Ali Alqahtani

Appl. Sci. 2023, 13(5), 3025; https://doi.org/10.3390/app13053025 - 26 Feb 2023

Cited by 10 | Viewed by 4008

Abstract

Twitter, one of the most popular microblogging platforms, has tens of millions of active users worldwide, generating hundreds of millions of posts every day. Twitter posts, referred to as “tweets”, are short and noisy texts that bring many challenges, for example in the case of an emergency or disaster. Predicting the location of these tweets is important for social, security, human rights, and business reasons and has attracted considerable attention lately. However, most Twitter users disable the geo-tagging feature, and their home locations are neither standardized nor accurate. In this study, we applied four machine learning techniques, namely Logistic Regression, Random Forest, Multinomial Naïve Bayes, and Support Vector Machine, with and without the utilization of the geo-distance matrix for location prediction of a tweet using its textual content. Our extensive experiments on our vast collection of Arabic tweets from Saudi Arabia with different feature sets yielded promising results with 67% accuracy.
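The abstract does not say how the geo-distance matrix is built. One common way to obtain pairwise distances between candidate locations is the haversine great-circle formula, sketched below. The function names, the mean Earth radius constant, and the choice of haversine itself are assumptions for illustration, not details taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distance_matrix(points):
    """Symmetric matrix of pairwise distances between all points."""
    return [[haversine_km(p, q) for q in points] for p in points]
```

Such a matrix could, for instance, let a model treat a prediction of a neighbouring city as a smaller error than a distant one; whether the study uses it that way is not stated here.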

(This article belongs to the Special Issue Text Mining, Machine Learning, and Natural Language Processing)

16 pages, 2457 KiB

Open Access Article

Crowd Control, Planning, and Prediction Using Sentiment Analysis: An Alert System for City Authorities

by Tariq Malik, Najma Hanif, Ahsen Tahir, Safeer Abbas, Muhammad Shoaib Hanif, Faiza Tariq, Shuja Ansari, Qammer Hussain Abbasi and Muhammad Ali Imran

Appl. Sci. 2023, 13(3), 1592; https://doi.org/10.3390/app13031592 - 26 Jan 2023

Cited by 2 | Viewed by 4253

Abstract

Modern means of communication, economic crises, and political decisions play imperative roles in reshaping political and administrative systems throughout the world. Twitter, a micro-blogging website, has gained paramount importance in terms of public opinion-sharing. Manual intelligence of law enforcement agencies (i.e., in changing situations) cannot cope in real time. Thus, to address this problem, we built an alert system for government authorities in the province of Punjab, Pakistan. The alert system gathers real-time data from Twitter in English and Roman Urdu about forthcoming gatherings (protests, demonstrations, assemblies, rallies, sit-ins, marches, etc.). To determine public sentiment regarding upcoming anti-government gatherings (protests, demonstrations, assemblies, rallies, sit-ins, marches, etc.), the alert system determines the polarity of tweets. Using keywords, the system provides information for future gatherings by extracting the entities like date, time, and location from Twitter data obtained in real time. Our system was trained and tested with different machine learning (ML) algorithms, such as random forest (RF), decision tree (DT), support vector machine (SVM), multinomial naïve Bayes (MNB), and Gaussian naïve Bayes (GNB), along with two vectorization techniques, i.e., term frequency–inverse document frequency (TFIDF) and count vectorization. Moreover, this paper compares the accuracy results of sentiment analysis (SA) of Twitter data by applying supervised machine learning (ML) algorithms. In our research experiment, we used two data sets, i.e., a small data set of 1000 tweets and a large data set of 4000 tweets. Results showed that RF along with count vectorization performed best for the small data set with an accuracy of 82%; with the large data set, MNB along with count vectorization outperformed all other classifiers with an accuracy of 75%. 
Additionally, language models, e.g., bigram and trigram, were used to generate the word clouds of positive and negative words to visualize the most frequently used words.
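Two of the text representations named in this abstract, n-grams and TF-IDF, can be illustrated with a short sketch. The function names are hypothetical, and the weighting used is the plain textbook form (raw term count times the natural log of N over document frequency); libraries commonly apply smoothed variants, and the abstract does not say which form the authors used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return consecutive n-token tuples, e.g. n=2 for bigrams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs):
    """Weight each term in each tokenized document by count * ln(N / df).

    docs: list of token lists. Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]
```

A term that appears in every document gets weight 0 under this form, which is why count vectorization and TF-IDF can rank the same words very differently. Counting the output of `ngrams` with `Counter` gives the frequencies a word cloud would display.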

(This article belongs to the Special Issue Application of Machine Learning in Text Mining)

14 pages, 2290 KiB

Open Access Article

A BERT Framework to Sentiment Analysis of Tweets

by Abayomi Bello, Sin-Chun Ng and Man-Fai Leung

Sensors 2023, 23(1), 506; https://doi.org/10.3390/s23010506 - 2 Jan 2023

Cited by 135 | Viewed by 16775

Abstract

Sentiment analysis has been widely used in microblogging sites such as Twitter in recent decades, where millions of users express their opinions and thoughts because of its short and simple manner of expression. Several studies reveal the state of sentiment which does not express sentiment based on the user context because of different lengths and ambiguous emotional information. Hence, this study proposes text classification with the use of bidirectional encoder representations from transformers (BERT) for natural language processing with other variants. The experimental findings demonstrate that the combination of BERT with CNN, BERT with RNN, and BERT with BiLSTM performs well in terms of accuracy rate, precision rate, recall rate, and F1-score compared to when it was used with Word2vec and when it was used with no variant.

(This article belongs to the Special Issue Current Trends and Practices in Smart Health Monitoring)

18 pages, 343 KiB

Open Access Article

Comparison of Different Modeling Techniques for Flemish Twitter Sentiment Analysis

by Manon Reusens, Michael Reusens, Marc Callens, Seppe vanden Broucke and Bart Baesens

Analytics 2022, 1(2), 117-134; https://doi.org/10.3390/analytics1020009 - 18 Oct 2022

Cited by 6 | Viewed by 3620

Abstract

Microblogging websites such as Twitter have caused sentiment analysis research to increase in popularity over the last several decades. However, most studies focus on the English language, which leaves other languages underrepresented. Therefore, in this paper, we compare several modeling techniques for sentiment analysis using a new dataset containing Flemish tweets. The key contribution of our paper lies in its innovative experimental design: we compared different preprocessing techniques and vector representations to find the best-performing combination for a Flemish dataset. We compared models belonging to four different categories: lexicon-based methods, traditional machine-learning models, neural networks, and attention-based models. We found that more preprocessing leads to better results, but the best-performing vector representation approach depends on the model applied. Moreover, an immense gap was observed between the performances of the lexicon-based approaches and those of the other models. The traditional machine learning approaches and the neural networks produced similar results, but the attention-based model was the best-performing technique. Nevertheless, a tradeoff should be made between computational expenses and performance gains.

15 pages, 2576 KiB

Open Access Article

Exploring the Factors Associated with Mental Health Attitude in China: A Structural Topic Modeling Approach

by Ruheng Yin, Rui Tian, Jing Wu and Feng Gan

Int. J. Environ. Res. Public Health 2022, 19(19), 12579; https://doi.org/10.3390/ijerph191912579 - 1 Oct 2022

Cited by 7 | Viewed by 2983

Abstract

Mental health attitude has huge impacts on the improvement of mental health. In response to the ongoing damage the COVID-19 pandemic caused to the mental health of the Chinese people, this study aims to explore the factors associated with mental health attitude in China. To this end, we extract the key topics in mental health-related microblogs on Weibo, the Chinese equivalent of Twitter, using the structural topic modeling (STM) approach. An interaction term of sentiment polarity and time is put into the STM model to track the evolution of public sentiment towards the key topics over time. Through an in-depth analysis of 146,625 Weibo posts, this study captures 12 topics that are, in turn, classified into four factors: stigma (n = 54,559, 37.21%), mental health literacy (n = 32,199, 21.96%), public promotion (n = 30,747, 20.97%), and social support (n = 29,120, 19.86%). The results show that stigma is the primary factor inducing negative mental health attitudes in China, as none of the topics related to this factor are considered positive. Mental health literacy, public promotion, and social support are the factors that could enhance positive attitudes towards mental health, since most of the topics related to these factors are identified as positive ones. The provision of tailored strategies for each of these factors could potentially improve the mental health attitudes of the Chinese people.

(This article belongs to the Special Issue Digital Technologies for Public Health Promotion)

21 pages, 716 KiB

Open Access Article

Towards Sentiment Analysis for Romanian Twitter Content

by Dan Claudiu Neagu, Andrei Bogdan Rus, Mihai Grec, Mihai Augustin Boroianu, Nicolae Bogdan and Attila Gal

Algorithms 2022, 15(10), 357; https://doi.org/10.3390/a15100357 - 28 Sep 2022

Cited by 6 | Viewed by 3573

Abstract

With the increased popularity of social media platforms such as Twitter or Facebook, sentiment analysis (SA) over the microblogging content becomes of crucial importance. The literature reports good results for well-resourced languages such as English, Spanish or German, but open research space still exists for underrepresented languages such as Romanian, where there is a lack of public training datasets or pretrained word embeddings. The majority of research on Romanian SA tackles the issue in a binary classification manner (positive vs. negative), using a single public dataset which consists of product reviews. In this paper, we respond to the need for a media surveillance project to possess a custom multinomial SA classifier for usage in a restrictive and specific production setup. We describe in detail how such a classifier was built, with the help of an English dataset (containing around 15,000 tweets) translated to Romanian with a public translation service. We test the most popular classification methods that could be applied to SA, including standard machine learning, deep learning and BERT. As we could not find any results for multinomial sentiment classification (positive, negative and neutral) in Romanian, we set two benchmark accuracies of ≈78% using standard machine learning and ≈81% using BERT. Furthermore, we demonstrate that the automatic translation service does not downgrade the learning performance by comparing the accuracies achieved by the models trained on the original dataset with the models trained on the translated data.

(This article belongs to the Special Issue Machine Learning in Pattern Recognition)

27 pages, 21431 KiB

Open Access Article

Robust Sentimental Class Prediction Based on Cryptocurrency-Related Tweets Using Tetrad of Feature Selection Techniques in Combination with Filtered Classifier

by Saad Awadh Alanazi

Appl. Sci. 2022, 12(12), 6070; https://doi.org/10.3390/app12126070 - 15 Jun 2022

Cited by 2 | Viewed by 2559

Abstract

Individual mental feelings and reactions are getting more significant as they help researchers, domain experts, businesses, companies, and other individuals understand the overall response of every individual in specific situations or circumstances. Every pure and compound sentiment can be classified using a dataset, which can be in the form of Twitter text by various Twitter users. Twitter is one of the vital platforms for individuals to participate and share their ideas about different topics; it is also considered to be one of the most famous and the biggest website for micro-blogging on the Internet. One of the key purposes of this study is to classify pure and compound sentiments based on text related to cryptocurrencies, an innovative way of trading and flourishing daily. The cryptocurrency market incurs many fluctuations in the coins’ value. A small positive or negative piece of news can sensate the whole scenario about the specific cryptocurrencies. In this paper, individuals’ pure and compound sentiments based on cryptocurrency-related Twitter text are classified. The dataset is collected through the Twitter API. In WEKA, the two deployment schemes are compared; firstly, straight with single feature selection technique (Tweet to lexicon feature vector), and secondly, a tetrad of feature selection techniques (Tweet to lexicon feature vector, Tweet to input lexicon feature vector, Tweet to SentiStrength feature vector, and Tweet to embedding feature vector) are used to purify the data LibLINEAR (LL) classifier, which contains fast algorithms for linear classification using L2-regularization L2-loss support vector machines (Dual SVM). The LL classifier differs in that it can potentially alleviate the sum of the absolute values of errors rather than the sum of the squared errors and is typically much speedier. 
Based on the overall performance parameters, the deployment scheme containing the tetrad of feature selection techniques with the LL classifier is considered the best choice for the purpose of classification. Among machine learning techniques, LL produces effective results and gives an efficient performance compared to other prevailing techniques. The findings of this research would be beneficial for Twitter users as well as cryptocurrency traders.

(This article belongs to the Special Issue Natural Language Processing: Recent Development and Applications)

21 pages, 2034 KiB

Open Access Article

Stock Market Prediction Using Microblogging Sentiment Analysis and Machine Learning

by Paraskevas Koukaras, Christina Nousi and Christos Tjortjis

Telecom 2022, 3(2), 358-378; https://doi.org/10.3390/telecom3020019 - 27 May 2022

Cited by 62 | Viewed by 16498

Abstract

The use of Machine Learning (ML) and Sentiment Analysis (SA) on data from microblogging sites has become a popular method for stock market prediction. In this work, we developed a model for predicting stock movement utilizing SA on Twitter and StockTwits data. Stock movement and sentiment data were used to evaluate this approach and validate it on Microsoft stock. We gathered tweets from Twitter and StockTwits, as well as financial data from Finance Yahoo. SA was applied to tweets, and seven ML classification models were implemented: K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression (LR), Naïve Bayes (NB), Decision Tree (DT), Random Forest (RF) and Multilayer Perceptron (MLP). The main novelty of this work is that it integrates multiple SA and ML methods, emphasizing the retrieval of extra features from social media (i.e., public sentiment), for improving stock prediction accuracy. The best results were obtained when tweets were analyzed using Valence Aware Dictionary and sEntiment Reasoner (VADER) and SVM. The top F-score was 76.3%, while the top Area Under Curve (AUC) value was 67%.

(This article belongs to the Special Issue Selected Papers from SEEDA-CECNSM 2021)

14 pages, 577 KiB

Open Access Article

The Predictive Power of a Twitter User’s Profile on Cryptocurrency Popularity

by Maria Trigka, Andreas Kanavos, Elias Dritsas, Gerasimos Vonitsanos and Phivos Mylonas

Big Data Cogn. Comput. 2022, 6(2), 59; https://doi.org/10.3390/bdcc6020059 - 20 May 2022

Cited by 10 | Viewed by 5079

Abstract

Microblogging has become an extremely popular communication tool among Internet users worldwide. Millions of users daily share a huge amount of information related to various aspects of their lives, which makes the respective sites a very important source of data for analysis. Bitcoin (BTC) is a decentralized cryptographic currency and is equivalent to most recurrently known currencies in the way that it is influenced by socially developed conclusions, regardless of whether those conclusions are considered valid. This work aims to assess the importance of Twitter users’ profiles in predicting a cryptocurrency’s popularity. More specifically, our analysis focused on the user influence, captured by different Twitter features (such as the number of followers, retweets, lists) and tweet sentiment scores as the main components of measuring popularity. Moreover, the Spearman, Pearson, and Kendall Correlation Coefficients are applied as post-hoc procedures to support hypotheses about the correlation between a user influence and the aforementioned features. Tweets sentiment scoring (as positive or negative) was performed with the aid of Valence Aware Dictionary and Sentiment Reasoner (VADER) for a number of tweets fetched within a concrete time period. Finally, the Granger causality test was employed to evaluate the statistical significance of various features time series in popularity prediction to identify the most influential variable for predicting future values of the cryptocurrency popularity.
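Of the three coefficients named in this abstract, Spearman's is the rank-based counterpart of Pearson's: both series are replaced by their ranks and the ordinary correlation is taken on those ranks. The sketch below shows that construction with tied values given their average rank. The function names and the example feature pairing are assumptions; the paper's own computation is not described in the abstract.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

Because it depends only on ordering, this coefficient is less sensitive than Pearson's to heavy-tailed features such as follower counts, which may be one reason to report both.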

(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems)

Search Results (41)
