MDPI - Publisher of Open Access Journals

41 pages, 1205 KB

Open Access Article

A Novel Framework for Evaluating Polarization in Online Social Networks

by Christopher Buratti, Michele Marchetti, Federica Parlapiano, Domenico Ursino and Luca Virgili

Big Data Cogn. Comput. 2025, 9(9), 227; https://doi.org/10.3390/bdcc9090227 - 1 Sep 2025

Viewed by 260

In online communities, polarization refers to the phenomenon in which individuals become more divided and extreme in their opinions due to their exposure to specific content. In this paper, we present a network-based framework for evaluating polarization levels in Online Social Networks (OSNs). Starting from a dataset of comments, our framework builds a network of user interactions and leverages the Louvain algorithm, Rao’s Quadratic Entropy, and ego networks to assess the polarization level of communities and of the most influential users. To test our framework, we used a dataset of tweets about climate change. After performing Extraction, Transformation and Loading (ETL) activities on the dataset, we evaluated its labels, identified communities, and analyzed the polarization level of these communities and of the most influential users. We also analyzed the ego networks of believers and deniers and the aggressiveness of the corresponding tweets. Our analysis revealed the existence of polarized communities and homophily among the most influential users. It also showed that the type of communication used to disseminate information influences the polarization level of both communities and individual users. These results demonstrate our framework’s ability to support polarization analysis in OSNs.

(This article belongs to the Special Issue Advances in Complex Networks)
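
Rao’s Quadratic Entropy is defined as Q = Σ_i Σ_j d(i, j) p_i p_j. The sketch below applies it to the stance labels of a single community. The label set ("believer", "neutral", "denier") and the distance values are hypothetical assumptions, because the abstract does not say how the framework defines distances between labels. A low Q means the community is homogeneous. How the paper maps Q onto a polarization level is not stated, so that step is not shown.

```python
from collections import Counter

def rao_quadratic_entropy(labels, distance):
    """Rao's Quadratic Entropy Q = sum_i sum_j d(i, j) * p_i * p_j.

    labels: stance label of each user in one community.
    distance: dict mapping (label_a, label_b) -> dissimilarity, assumed
              symmetric with d(x, x) = 0 (an assumption, not from the paper).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    return sum(
        distance[(a, b)] * pa * pb
        for a, pa in shares.items()
        for b, pb in shares.items()
    )

# Hypothetical distances: opposite stances are maximally dissimilar.
D = {
    ("believer", "believer"): 0.0, ("denier", "denier"): 0.0,
    ("neutral", "neutral"): 0.0,
    ("believer", "denier"): 1.0, ("denier", "believer"): 1.0,
    ("believer", "neutral"): 0.5, ("neutral", "believer"): 0.5,
    ("denier", "neutral"): 0.5, ("neutral", "denier"): 0.5,
}

print(rao_quadratic_entropy(["believer"] * 10, D))                   # 0.0
print(rao_quadratic_entropy(["believer"] * 5 + ["denier"] * 5, D))   # 0.5
```

In a full pipeline, each community found by the Louvain algorithm would supply its own list of labels.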

21 pages, 2655 KB

Open Access Article

A Hybrid Approach for Geo-Referencing Tweets: Transformer Language Model Regression and Gazetteer Disambiguation

by Thomas Edwards, Padraig Corcoran and Christopher B. Jones

ISPRS Int. J. Geo-Inf. 2025, 14(9), 321; https://doi.org/10.3390/ijgi14090321 - 22 Aug 2025

Viewed by 528

Abstract

Recent approaches to geo-referencing X posts have focused on language modelling techniques that learn the language specific to geographic regions and use it to infer geographic coordinates from text. These approaches rely on large amounts of labelled data to build accurate predictive models. However, obtaining significant volumes of geo-referenced data from Twitter, recently renamed X, can be difficult. Further, existing language modelling approaches can require dividing a given area into a grid or a set of clusters, which can be dataset-specific and makes fine-grained location prediction challenging. Regression-based approaches combined with deep learning address some of these challenges, as they can assign coordinates directly without clustering or grid-based methods. However, such approaches have received only limited attention for the geo-referencing task. In this paper, we adapt state-of-the-art neural network models for the regression task, focusing on geo-referencing wildlife tweets, for which only a limited amount of data is available. We experiment with different transfer learning techniques to improve the performance of the regression models, and we also compare our approach to recently developed Large Language Models and prompting techniques. We show that combining a location-name extraction method with regression-based disambiguation, and falling back to pure regression when no names are present, leads to significant improvements in locational accuracy over using regression alone.
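
The abstract describes a hybrid rule: use regression to disambiguate gazetteer matches, and fall back to the regressed point when no place names are found. It does not give the exact disambiguation criterion. The minimal sketch below assumes the criterion is "pick the gazetteer candidate nearest (by great-circle distance) to the coordinate predicted by the regression model". The function names and the example gazetteer entries, which use approximate coordinates, are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def georeference(regressed, candidates):
    """Choose a coordinate for one tweet (hypothetical helper).

    regressed: (lat, lon) predicted from the tweet text by a regression model.
    candidates: gazetteer matches for extracted place names, as (name, lat, lon).
    Assumption: disambiguate by nearest candidate to the regressed point;
    with no candidates, fall back to the regression output.
    """
    if not candidates:
        return regressed
    _, lat, lon = min(
        candidates,
        key=lambda c: haversine_km(regressed[0], regressed[1], c[1], c[2]),
    )
    return (lat, lon)

# Hypothetical ambiguous place name with approximate coordinates.
candidates = [("Newport, Wales", 51.58, -3.00), ("Newport, Isle of Wight", 50.70, -1.29)]
print(georeference((51.48, -3.18), candidates))  # (51.58, -3.0)
print(georeference((51.48, -3.18), []))          # (51.48, -3.18)
```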

20 pages, 1925 KB

Open Access Article

Beyond Polarity: Forecasting Consumer Sentiment with Aspect- and Topic-Conditioned Time Series Models

by Mian Usman Sattar, Raza Hasan, Sellappan Palaniappan, Salman Mahmood and Hamza Wazir Khan

Information 2025, 16(8), 670; https://doi.org/10.3390/info16080670 - 6 Aug 2025

Viewed by 597

Abstract

Existing approaches to social media sentiment analysis typically focus on static classification, offering limited foresight into how public opinion evolves. This study addresses that gap by introducing the Multi-Feature Sentiment-Driven Forecasting (MFSF) framework, a novel pipeline that improves sentiment trend prediction by integrating rich contextual information from text. Using state-of-the-art transformer models on the Sentiment140 dataset, our framework extracts three concurrent signals from each tweet: sentiment polarity, aspect-based scores (e.g., ‘price’ and ‘service’), and topic embeddings. These features are aggregated into a daily multivariate time series. We then employ a SARIMAX model to forecast future sentiment, using the extracted aspect and topic data as exogenous predictors. In our results, validated on the historical Sentiment140 Twitter dataset, the proposed multivariate model achieved a 26.6% improvement in forecasting accuracy (measured by RMSE) over a traditional univariate ARIMA baseline. The analysis confirmed that conversational aspects such as ‘service’ and ‘quality’ are statistically significant predictors of future sentiment. By leveraging the contextual drivers of conversation, the MFSF framework gives businesses and policymakers a more accurate and interpretable tool for proactively monitoring and anticipating shifts in public opinion.

(This article belongs to the Special Issue Semantic Networks for Social Media and Policy Insights)
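
Fitting the SARIMAX model itself would normally be left to a statistics library such as statsmodels. The sketch below covers only the plain-Python steps the abstract describes:

1. Averaging per-tweet features into a daily multivariate series.
2. Expressing an RMSE gain as a percentage of the baseline.

Feature names such as `aspect_service` are hypothetical. The sketch also assumes the reported 26.6% figure is a relative RMSE reduction.

```python
from collections import defaultdict
from math import sqrt

def daily_series(tweets, features):
    """Average per-tweet feature scores into one row per day.

    tweets: iterable of dicts with a 'date' key plus numeric feature keys
            (hypothetical names, e.g. 'polarity', 'aspect_service').
    Returns a list of (date, {feature: mean}) sorted by date.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for t in tweets:
        counts[t["date"]] += 1
        for f in features:
            sums[t["date"]][f] += t[f]
    return [(day, {f: sums[day][f] / counts[day] for f in features})
            for day in sorted(counts)]

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def relative_improvement(baseline_rmse, model_rmse):
    """Assumption: 'improvement' = percentage RMSE reduction vs. the baseline."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse

tweets = [
    {"date": "2009-05-01", "polarity": 1.0, "aspect_service": 0.25},
    {"date": "2009-05-01", "polarity": -1.0, "aspect_service": 0.75},
    {"date": "2009-05-02", "polarity": 0.5, "aspect_service": 0.0},
]
rows = daily_series(tweets, ["polarity", "aspect_service"])
print(rows[0])                            # ('2009-05-01', {'polarity': 0.0, 'aspect_service': 0.5})
print(relative_improvement(2.0, 1.5))     # 25.0 (hypothetical RMSE values)
```

Each daily row would then supply the endogenous sentiment value and the exogenous aspect and topic columns for the forecasting model.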

22 pages, 5188 KB

Open Access Article

LCDAN: Label Confusion Domain Adversarial Network for Information Detection in Public Health Events

by Qiaolin Ye, Guoxuan Sun, Yanwen Chen and Xukan Xu

Electronics 2025, 14(15), 3102; https://doi.org/10.3390/electronics14153102 - 4 Aug 2025

Viewed by 379

Abstract

With the popularization of social media, information related to public health events has grown explosively online, making it essential to accurately identify informative tweets that have decision-making and management value for public health emergency response and risk monitoring. However, existing methods often suffer performance degradation during cross-event transfer due to differences in data distribution, and research specifically targeting public health events remains limited. To address this, we propose the Label Confusion Domain Adversarial Network (LCDAN), which integrates label confusion with domain adaptation to improve the detection of informative tweets across different public health events. First, LCDAN employs an adversarial domain adaptation model to learn cross-domain feature representations. Second, it uses label confusion to dynamically evaluate the importance of each source-domain sample to the target domain, optimizing the transfer effect. Experiments were conducted on datasets related to the COVID-19, Ebola, and Middle East Respiratory Syndrome (MERS) public health events. The results demonstrate that LCDAN significantly outperforms existing methods across all tasks. This research provides an effective tool for information detection during public health emergencies, with substantial theoretical and practical implications.

(This article belongs to the Section Artificial Intelligence)
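
The abstract does not give LCDAN’s formulation, so the sketch below shows only the generic label-confusion idea used in some text classifiers:

- The one-hot label is softened by a confusion distribution over labels, via softmax(alpha · y + c).
- A model is trained against this simulated distribution with KL divergence.

The value of `alpha` and the example scores are assumptions. How LCDAN converts such signals into per-sample weights for cross-domain transfer is not shown here.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def simulated_label_distribution(true_index, confusion, alpha=4.0):
    """softmax(alpha * one_hot + confusion).

    confusion: a distribution over labels (e.g. from label/instance similarity).
    alpha (an assumed value) controls how strongly the true label dominates.
    """
    logits = [alpha * (1.0 if i == true_index else 0.0) + c
              for i, c in enumerate(confusion)]
    return softmax(logits)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical two-class example: 0 = informative, 1 = not informative.
target = simulated_label_distribution(0, [0.6, 0.4])
prediction = [0.7, 0.3]
loss = kl_divergence(target, prediction)  # training signal for this tweet
```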

19 pages, 7359 KB

Open Access Article

An Aspect-Based Emotion Analysis Approach on Wildfire-Related Geo-Social Media Data—A Case Study of the 2020 California Wildfires

by Christina Zorenböhmer, Shaily Gandhi, Sebastian Schmidt and Bernd Resch

ISPRS Int. J. Geo-Inf. 2025, 14(8), 301; https://doi.org/10.3390/ijgi14080301 - 1 Aug 2025

Viewed by 548

Abstract

Natural disasters such as wildfires pose significant threats to communities, necessitating timely and effective disaster response strategies. While Aspect-based Sentiment Analysis (ABSA) has been widely used to extract sentiment-related information at the sub-sentence level, the corresponding field of Aspect-based Emotion Analysis (ABEA) remains underexplored due to dataset limitations and the greater complexity of emotion classification. In this study, we used EmoGRACE, a fine-tuned BERT-based model for ABEA, and applied it to georeferenced tweets about the 2020 California wildfires. The results of this case study reveal distinct spatio-temporal emotion patterns for wildfire-related aspect terms, with fear and sadness increasing near wildfire perimeters. This study demonstrates the feasibility of tracking emotion dynamics across disaster-affected regions and highlights the potential of ABEA for real-time disaster monitoring. The results suggest that ABEA can give policymakers a nuanced understanding of public sentiment during crises.
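
The abstract reports fear and sadness increasing near wildfire perimeters. The sketch below illustrates that kind of spatial aggregation. It is not EmoGRACE and not the authors’ analysis, and it rests on three assumptions:

- Tweets already carry an emotion label.
- A perimeter is approximated by a list of vertices.
- Distance is measured to the nearest vertex rather than to the polygon edge.

The function name and band boundaries are hypothetical.

```python
import math
from collections import Counter

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def emotion_shares_by_band(tweets, perimeter, bands_km):
    """Share of each emotion label per distance band (hypothetical helper).

    tweets: list of (lat, lon, emotion) tuples.
    perimeter: list of (lat, lon) vertices; nearest-vertex distance is a
               simplification of true point-to-polygon distance.
    bands_km: ascending upper bounds; farther tweets fall into 'beyond'.
    """
    grouped = {}
    for lat, lon, emotion in tweets:
        d = min(haversine_km(lat, lon, plat, plon) for plat, plon in perimeter)
        band = next((f"<{b} km" for b in bands_km if d < b), "beyond")
        grouped.setdefault(band, Counter())[emotion] += 1
    return {band: {e: n / sum(c.values()) for e, n in c.items()}
            for band, c in grouped.items()}

perimeter = [(39.0, -121.0)]
tweets = [(39.01, -121.0, "fear"), (39.02, -121.0, "sadness"),
          (39.0, -121.01, "fear"), (40.0, -121.0, "joy")]
result = emotion_shares_by_band(tweets, perimeter, [10, 50])
```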

27 pages, 1817 KB

Open Access Article

A Large Language Model-Based Approach for Multilingual Hate Speech Detection on Social Media

by Muhammad Usman, Muhammad Ahmad, Grigori Sidorov, Irina Gelbukh and Rolando Quintero Tellez

Computers 2025, 14(7), 279; https://doi.org/10.3390/computers14070279 - 15 Jul 2025

Cited by 1 | Viewed by 1416

Abstract

The proliferation of hate speech on social media platforms poses significant threats to digital safety, social cohesion, and freedom of expression. Detecting such content, especially across diverse languages, remains challenging due to linguistic complexity, cultural context, and resource limitations. To address these challenges, this study introduces a comprehensive approach for multilingual hate speech detection and makes several key contributions. First, we created a novel trilingual hate speech dataset of 10,193 manually annotated tweets in English, Spanish, and Urdu. Second, we applied two techniques for cross-lingual hate speech detection, a joint multilingual approach and a translation-based approach, that have not previously been explored for these languages. Third, we developed detailed hate speech annotation guidelines tailored to each of the three languages to ensure consistent, high-quality labeling. Finally, we conducted 41 experiments to comprehensively evaluate our approach, employing machine learning models with TF–IDF features, deep learning models with FastText and GloVe embeddings, and transformer-based models with contextual embeddings. Additionally, we employed a large language model with advanced contextual embeddings to identify the best solution for the hate speech detection task. The experimental results showed that our GPT-3.5-turbo model significantly outperforms strong baselines, achieving up to an 8% improvement over XLM-R in Urdu hate speech detection and an average gain of 4% across all three languages. This research contributes not only a high-quality multilingual dataset but also a scalable and inclusive framework for hate speech detection in underrepresented languages.

(This article belongs to the Special Issue Recent Advances in Social Networks and Social Media)
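
The classical baselines mentioned in the abstract use TF–IDF features. The sketch below is a generic TF–IDF computation with three assumptions: whitespace tokenization, term frequency = count / document length, and unsmoothed idf = ln(N / df). The paper’s tokenizer and weighting variant are not specified. Libraries such as scikit-learn’s `TfidfVectorizer` use a smoothed idf by default.

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF vectors (as dicts) for lower-cased, whitespace-tokenised docs.

    Assumption: tf = count / doc length, idf = ln(N / df) (unsmoothed).
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of docs containing each term at least once.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (c / len(tokens)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return vectors

docs = ["you are great", "you are awful", "awful awful people"]
v = tfidf(docs)
# Rarer terms ("great") get higher weights than shared ones ("you").
```

The resulting sparse vectors could then be fed to any linear classifier.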

27 pages, 7617 KB

Open Access Article

Emoji-Driven Sentiment Analysis for Social Bot Detection with Relational Graph Convolutional Networks

by Kaqian Zeng, Zhao Li and Xiujuan Wang

Sensors 2025, 25(13), 4179; https://doi.org/10.3390/s25134179 - 4 Jul 2025

Viewed by 707

Abstract

The proliferation of malicious social bots poses severe threats to cybersecurity and social media information ecosystems. Existing detection methods often overlook the semantic value and emotional cues conveyed by emojis in user-generated tweets. To address this gap, we propose ESA-BotRGCN, an emoji-driven multi-modal detection framework that integrates semantic enhancement, sentiment analysis, and multi-dimensional feature modeling. Specifically, we first establish emoji–text mappings using the Emoji Library, leverage GPT-4 to improve textual coherence, and generate tweet embeddings with RoBERTa. Next, seven sentiment-based features are extracted to quantify statistical differences in emotional expression patterns between bot and human accounts. An attention gating mechanism then dynamically fuses these sentiment features with user description, tweet content, numerical attributes, and categorical features. Finally, a Relational Graph Convolutional Network (RGCN) models the heterogeneous social topology for robust bot detection. Experimental results on the TwiBot-20 benchmark dataset show that our method achieves an accuracy of 87.46%, significantly outperforming baseline models and validating the effectiveness of the emoji-driven semantic and sentiment enhancement strategies.

(This article belongs to the Section Sensor Networks)
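
The abstract does not specify the form of the attention gate. The sketch below shows one common gating pattern under simplifying assumptions: two equal-length feature vectors are fused with a single scalar sigmoid gate computed from their concatenation. ESA-BotRGCN fuses more inputs and may use a different gate. In a trained model, `weights` and `bias` would be learned parameters.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(sentiment, other, weights, bias):
    """Fuse two feature vectors with a scalar sigmoid gate (illustrative only).

    gate  = sigmoid(w . [sentiment; other] + b)
    fused = gate * sentiment + (1 - gate) * other
    Assumption: a single scalar gate; real models often use vector gates.
    """
    if len(sentiment) != len(other) or len(weights) != 2 * len(sentiment):
        raise ValueError("dimension mismatch")
    concat = list(sentiment) + list(other)
    gate = sigmoid(sum(w * x for w, x in zip(weights, concat)) + bias)
    fused = [gate * s + (1.0 - gate) * o for s, o in zip(sentiment, other)]
    return fused, gate

# With zero weights and bias the gate is 0.5, so the output is the average.
fused, gate = gated_fusion([1.0, 0.0], [0.0, 1.0], [0.0] * 4, 0.0)
```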

20 pages, 1496 KB

Open Access Article

Utilizing LLMs and ML Algorithms in Disaster-Related Social Media Content

by Vasileios Linardos, Maria Drakaki and Panagiotis Tzionas

GeoHazards 2025, 6(3), 33; https://doi.org/10.3390/geohazards6030033 - 2 Jul 2025

Viewed by 1064

Abstract

In this research, we explore the use of Large Language Models (LLMs) and clustering techniques to automate the structuring and labeling of disaster-related social media content. Using a collected dataset of millions of tweets related to various disasters, our approach aims to transform unstructured, unlabeled data into a structured, labeled format that can be readily used to train machine learning algorithms and improve disaster response efforts. We leverage LLMs to preprocess the tweets and capture their semantic content, annotating the data with several semantic properties. We then apply clustering techniques to identify emerging themes and patterns that predefined categories may not capture, surfacing these patterns through topic extraction on the clusters. Finally, we manually label and evaluate 10,000 examples to assess the LLMs’ ability to understand tweet features. Our methodology is applied to real-world data from disaster events, so the results are directly applicable to actual crisis situations.
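
The abstract does not say how topics are extracted from the clusters. The sketch below uses a simple frequency-based stand-in: each cluster is labeled by its most frequent non-stopword tokens. Methods such as class-based TF-IDF are more common in practice. The sketch assumes the clustering step (for example, over LLM-derived embeddings) has already produced one cluster label per tweet, and the stopword list is a hypothetical minimal one.

```python
from collections import Counter, defaultdict

# Hypothetical minimal stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "in", "of", "to", "and", "is", "are", "for", "on"}

def top_terms_per_cluster(texts, labels, k=3):
    """Label each cluster with its k most frequent non-stopword tokens."""
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        counts[label].update(t for t in text.lower().split() if t not in STOPWORDS)
    return {label: [term for term, _ in c.most_common(k)]
            for label, c in counts.items()}

texts = ["flood water rising", "water rescue needed", "flood water everywhere",
         "power outage downtown", "power lines down"]
labels = [0, 0, 0, 1, 1]  # assumed output of an upstream clustering step
topics = top_terms_per_cluster(texts, labels, k=2)
```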

33 pages, 11250 KB

Open Access Article

RADAR#: An Ensemble Approach for Radicalization Detection in Arabic Social Media Using Hybrid Deep Learning and Transformer Models

by Emad M. Al-Shawakfa, Anas M. R. Alsobeh, Sahar Omari and Amani Shatnawi

Information 2025, 16(7), 522; https://doi.org/10.3390/info16070522 - 22 Jun 2025

Cited by 2 | Viewed by 802

Abstract

The recent increase in extremist material on social media platforms poses serious challenges to international cybersecurity and national security efforts. This paper introduces RADAR#, a deep ensemble approach for detecting radicalization in Arabic tweets. Our model combines a hybrid CNN-Bi-LSTM framework with a leading Arabic transformer model (AraBERT) through a weighted ensemble strategy. We employ domain-specific preprocessing techniques for Arabic tweets and a custom attention layer to better focus on radicalization indicators. Experiments on a dataset of 89,816 Arabic tweets indicate that RADAR# reaches 98% accuracy and a 97% F1-score, surpassing advanced approaches. The ensemble strategy is particularly beneficial for handling the dialectal variations and context-sensitive words common in Arabic social media posts. We provide a full performance analysis of the model, including ablation studies and attention visualization for better interpretability. Our work offers the cybersecurity community an effective early-detection mechanism for online radicalization in Arabic-language content, which could be applied in counter-terrorism and online content moderation.
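
A weighted ensemble of the kind described above can be sketched as a weighted average of per-class probabilities from the two models. The class order ([non-radical, radical]) and the weights 0.4/0.6 are hypothetical. The abstract does not say how RADAR# chooses its weights; they might, for example, be tuned on validation data.

```python
def weighted_ensemble(prob_sets, weights):
    """Weighted average of per-class probabilities from several models.

    prob_sets: one probability list per model, all over the same classes.
    weights: one non-negative weight per model (normalised here).
    Returns (predicted_class_index, averaged_probabilities).
    """
    if not prob_sets or len(prob_sets) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    n_classes = len(prob_sets[0])
    avg = [sum(w * probs[c] for w, probs in zip(weights, prob_sets)) / total
           for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg

# Hypothetical outputs for one tweet, classes = [non-radical, radical].
cnn_bilstm = [0.7, 0.3]
arabert = [0.2, 0.8]
pred, avg = weighted_ensemble([cnn_bilstm, arabert], [0.4, 0.6])
```

Averaging probabilities rather than hard votes lets a confident model outweigh an uncertain one.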

24 pages, 963 KB

Open Access Article

Multihead Average Pseudo-Margin Learning for Disaster Tweet Classification

by Iustin Sîrbu, Robert-Adrian Popovici, Traian Rebedea and Ștefan Trăușan-Matu

Information 2025, 16(6), 434; https://doi.org/10.3390/info16060434 - 24 May 2025

Viewed by 422

Abstract

During natural disasters, social media platforms such as X (formerly Twitter) become a valuable source of real-time information, with eyewitnesses and affected individuals posting messages about the damage caused and the victims. Although this information can be used to streamline the intervention process of local authorities and to distribute available resources more effectively, manually annotating these messages is often infeasible due to time and cost constraints. To address this challenge, we explore semi-supervised learning, a technique that leverages both labeled and unlabeled data, to enhance neural models for disaster tweet classification. Specifically, we investigate state-of-the-art semi-supervised learning models and focus on co-training, an approach that has received less attention in recent years. Moreover, we propose a novel hybrid co-training architecture, Multihead Average Pseudo-Margin, which obtains state-of-the-art results on several classification tasks. Our approach extends the advantages of the voting mechanism from Multihead Co-Training by using the Average Pseudo-Margin (APM) score to improve the quality of the pseudo-labels and self-adaptive confidence thresholds to improve imbalanced classification. Our method achieves an accuracy improvement of up to 7.98% in low-data scenarios and of 2.84% when using the entire labeled dataset, reaching 89.55% accuracy on the Humanitarian task and 91.23% on the Informative task. These results demonstrate the potential of our approach to address the critical need for automated disaster tweet classification. We have made our code publicly available for future research.

(This article belongs to the Special Issue Machine Learning and Artificial Intelligence with Applications)
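
The sketch below tracks an Average Pseudo-Margin for unlabeled tweets, assuming the usual definition from earlier semi-supervised work:

- The pseudo-margin is the logit of the pseudo-label minus the largest other logit.
- It is averaged over training iterations.
- Examples whose average stays below a threshold are excluded.

The fixed threshold stands in for the paper’s self-adaptive thresholds, and the multihead voting step is not shown. The class and method names are hypothetical.

```python
def pseudo_margin(logits, pseudo_label):
    """Logit of the pseudo-label minus the largest competing logit."""
    others = [z for i, z in enumerate(logits) if i != pseudo_label]
    return logits[pseudo_label] - max(others)

class APMTracker:
    """Running average of pseudo-margins per unlabeled example (hypothetical)."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def update(self, example_id, logits, pseudo_label):
        m = pseudo_margin(logits, pseudo_label)
        self.sums[example_id] = self.sums.get(example_id, 0.0) + m
        self.counts[example_id] = self.counts.get(example_id, 0) + 1

    def apm(self, example_id):
        return self.sums[example_id] / self.counts[example_id]

    def keep(self, example_id, threshold=0.0):
        """Assumption: a fixed threshold instead of a self-adaptive one."""
        return self.apm(example_id) > threshold

tracker = APMTracker()
tracker.update("t1", [2.0, 0.5, 0.25], pseudo_label=0)   # margin 1.5
tracker.update("t1", [0.25, 1.0, 0.0], pseudo_label=0)   # margin -0.75
```

An example whose pseudo-label keeps losing to another class accumulates a low or negative APM and stops contributing to training.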

19 pages, 1914 KB

Open Access Article

Exploring Multi-Agent Debate for Zero-Shot Stance Detection: A Novel Approach

by Junxia Ma, Changjiang Wang, Lu Rong, Bo Wang and Yaoli Xu

Appl. Sci. 2025, 15(9), 4612; https://doi.org/10.3390/app15094612 - 22 Apr 2025

Viewed by 1643

Abstract

Zero-shot stance detection aims to identify the stance that social media text expresses toward specific targets without relying on annotated data. However, due to insufficient contextual information and the inherent ambiguity of language, this task faces numerous challenges in low-resource scenarios. This work proposes a novel zero-shot stance detection method based on multi-agent debate (ZSMD) to address these challenges. Specifically, we construct two debater agents representing the supporting and opposing stances. A knowledge enhancement module supplements the original tweet and target with relevant background knowledge, providing richer context for argument generation. The two agents then debate over a predetermined number of rounds, employing rebuttal strategies such as factual verification, logical analysis, and sentiment analysis. If no consensus is reached within the specified rounds, a referee agent synthesizes the debate and the original input to deliver the final stance determination. We evaluate ZSMD on two benchmark datasets, SemEval-2016 Task 6 and P-Stance, and compare it against strong zero-shot baselines such as MB-Cal and COLA. The experimental results show that ZSMD not only achieves higher accuracy than these baselines but also provides deeper insight into subtle differences in opinion expression, highlighting the potential of structured argumentation in low-resource settings.

(This article belongs to the Section Computing and Artificial Intelligence)
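
The sketch below shows the control flow of the debate: a bounded number of rounds, an early stop when both agents report the same stance, and a referee decision otherwise. In practice each agent would be an LLM call. Here they are plain functions so the flow can be tested. Three details are assumptions:

- The consensus rule (both agents reporting the same stance label).
- The stance labels "favor" and "against".
- Folding knowledge enhancement into the `context` string.

```python
from typing import Callable, List, Tuple

# Hypothetical agent signature: (context, transcript) -> (stance, argument).
Agent = Callable[[str, List[str]], Tuple[str, str]]
Referee = Callable[[str, List[str]], str]

def debate(context: str, pro: Agent, con: Agent, referee: Referee,
           max_rounds: int = 3) -> str:
    """Run up to max_rounds of debate, returning early on consensus.

    Assumption: consensus means both agents report the same stance label;
    otherwise the referee reads the full transcript and decides.
    """
    transcript: List[str] = []
    for _ in range(max_rounds):
        pro_stance, pro_arg = pro(context, transcript)
        transcript.append(f"PRO: {pro_arg}")
        con_stance, con_arg = con(context, transcript)
        transcript.append(f"CON: {con_arg}")
        if pro_stance == con_stance:
            return pro_stance
    return referee(context, transcript)

# Stub agents standing in for LLM calls.
def stubborn_pro(ctx, tr):
    return "favor", "supporting argument"

def stubborn_con(ctx, tr):
    return "against", "opposing argument"

def conceding_con(ctx, tr):
    return ("favor", "I concede") if len(tr) >= 3 else ("against", "rebuttal")

def referee(ctx, tr):
    return "against"
```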

21 pages, 2611 KB

Open Access Article

Deep Learning-Based Short Text Summarization: An Integrated BERT and Transformer Encoder–Decoder Approach

by Fahd A. Ghanem, M. C. Padma, Hudhaifa M. Abdulwahab and Ramez Alkhatib

Computation 2025, 13(4), 96; https://doi.org/10.3390/computation13040096 - 12 Apr 2025

Viewed by 2238

Abstract

The field of text summarization has evolved from basic extractive methods that identify key sentences to sophisticated abstractive techniques that generate contextually meaningful summaries. In today’s digital landscape, where an immense volume of textual data is produced every day, the need for concise and coherent summaries is more crucial than ever. However, summarizing short texts, particularly from platforms like Twitter, presents unique challenges due to character limits, informal language, and noise from elements such as hashtags, mentions, and URLs. To overcome these challenges, this paper introduces a deep learning framework for automated short text summarization on Twitter. The proposed approach combines bidirectional encoder representations from transformers (BERT) with a transformer-based encoder–decoder architecture (TEDA), incorporating an attention mechanism to improve contextual understanding. Additionally, long short-term memory (LSTM) networks are integrated within BERT to capture long-range dependencies in tweets and their summaries. This hybrid model is designed to keep generated summaries informative, concise, and contextually relevant while minimizing redundancy. The performance of the proposed framework was assessed on three benchmark Twitter datasets (Hagupit, SHShoot, and Hyderabad Blast), with ROUGE scores as the evaluation metric. Experimental results show that the model surpasses existing approaches in accurately capturing key information from tweets. These findings underscore the framework’s effectiveness in automated short text summarization, offering a robust solution for efficiently processing and summarizing large-scale social media content.

(This article belongs to the Section Computational Engineering)
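
ROUGE-L, one of the ROUGE variants commonly reported for summarization, scores a candidate summary by the longest common subsequence (LCS) it shares with a reference. The sketch below is a simplified sentence-level version: whitespace tokens, no stemming, and a single reference. Official ROUGE toolkits add stemming and summary-level LCS, and the abstract does not say which variants the paper reports. The example sentences are invented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-score: (1 + b^2) P R / (R + b^2 P)."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)

# Invented example: LCS = "typhoon hits philippines" (3 tokens).
score = rouge_l("typhoon hits philippines coast",
                "typhoon hagupit hits the philippines")
```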

36 pages, 4245 KB

Open Access Article

An Unsupervised Integrated Framework for Arabic Aspect-Based Sentiment Analysis and Abstractive Text Summarization of Traffic Services Using Transformer Models

by Alanoud Alotaibi and Farrukh Nadeem

Smart Cities 2025, 8(2), 62; https://doi.org/10.3390/smartcities8020062 - 8 Apr 2025

Cited by 1 | Viewed by 1393

Abstract

Social media is crucial for gathering public feedback on government services, particularly in the traffic sector. While Aspect-Based Sentiment Analysis (ABSA) offers a means to extract actionable insights from user posts, analyzing Arabic content poses unique challenges. Existing Arabic ABSA approaches rely heavily on supervised learning and manual annotation, limiting scalability. To tackle these challenges, we propose an integrated framework combining unsupervised BERTopic-based Aspect Category Detection with distant supervision using a fine-tuned CAMeLBERT model for sentiment classification. This is further complemented by transformer-based summarization through a fine-tuned AraBART model. Key contributions of this paper include: (1) the first comprehensive Arabic traffic services dataset, containing 461,844 tweets, enabling future research in this previously unexplored domain; (2) a novel unsupervised approach for Arabic ABSA that eliminates the need for large-scale manual annotation, using custom FastText embeddings and BERTopic to achieve superior topic clustering; (3) a pioneering integration of aspect detection, sentiment analysis, and abstractive summarization that provides a complete pipeline for analyzing Arabic traffic service feedback; and (4) state-of-the-art performance across all tasks, with 92% accuracy in ABSA and a ROUGE-L score of 0.79 for summarization, establishing new benchmarks for Arabic NLP in the traffic domain. The framework significantly enhances smart city traffic management by enabling automated processing of citizen feedback, supporting data-driven decision-making, and allowing authorities to monitor public sentiment, identify emerging issues, and allocate resources based on citizen needs, ultimately improving urban mobility and service responsiveness.

13 pages, 633 KB

Open Access Article

Sentiment Matters for Cryptocurrencies: Evidence from Tweets

by Radu Lupu and Paul Cristian Donoiu

Data 2025, 10(4), 50; https://doi.org/10.3390/data10040050 - 1 Apr 2025

Cited by 1 | Viewed by 7757

Abstract

This study provides empirical evidence that cryptocurrency market movements are influenced by sentiment extracted from social media. Using a high-frequency dataset covering four major cryptocurrencies (Bitcoin, Ether, Litecoin, and Ripple) from October 2017 to September 2021, we apply state-of-the-art natural language processing techniques to tweets from influential Twitter accounts. We classify sentiment into positive, negative, and neutral categories and analyze its effects on log returns, liquidity, and price jumps by examining market reactions around tweet occurrences. Our findings show that tweets significantly affect trading volume and liquidity: neutral tweets consistently enhance liquidity, negative tweets prompt immediate volatility spikes, and positive tweets exert a delayed yet lasting influence on the market. This highlights the critical role of social media sentiment in intraday market dynamics and extends research on sentiment-driven market efficiency.
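
Examining market reactions around tweet occurrences is an event-study design. The minimal sketch below rests on three assumptions:

- Log returns r_t = ln(P_t / P_{t-1}) are computed from a regularly sampled price series.
- Each tweet is mapped to the index of the first return after it.
- Mean returns are compared in fixed windows before and after each tweet.

The window lengths, sampling frequency, and function names are hypothetical. Liquidity and price-jump measures are not shown.

```python
import math

def log_returns(prices):
    """r_t = ln(P_t / P_{t-1}) for each pair of consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def _mean(xs):
    return sum(xs) / len(xs) if xs else float("nan")

def event_window_means(returns, event_indices, before, after):
    """Mean log return in the windows before and after each event.

    An event at index i uses returns[i - before:i] (pre-event) and
    returns[i:i + after] (post-event); events whose windows would run off
    either end of the series are skipped.
    """
    pre, post = [], []
    for i in event_indices:
        if i - before < 0 or i + after > len(returns):
            continue
        pre.extend(returns[i - before:i])
        post.extend(returns[i:i + after])
    return _mean(pre), _mean(post)

# Hypothetical prices; a tweet arrives just before return index 2.
prices = [100.0, 100.0, 100.0, 110.0, 121.0]
returns = log_returns(prices)
pre_mean, post_mean = event_window_means(returns, [2], before=2, after=2)
```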

33 pages, 3077 KB

Open Access Article

Perspective-Based Microblog Summarization

by Chih-Yuan Li, Soon Ae Chun and James Geller

Information 2025, 16(4), 285; https://doi.org/10.3390/info16040285 - 1 Apr 2025

Viewed by 869

Abstract

Social media allows people to express and share a variety of experiences, opinions, beliefs, interpretations, or viewpoints on a single topic. Summarizing a collection of social media posts (microblogs) on one topic can be challenging and may produce an incoherent summary, because different users express multiple perspectives. We introduce a novel approach to microblog summarization, the Multiple-View Summarization Framework (MVSF), designed to efficiently generate multiple summaries from the same social media dataset depending on the chosen perspectives, delivering personalized, fine-grained summaries. The MVSF leverages a perspective-computing component that can recognize the perspectives expressed in microblogs, such as sentiments, political orientations, or unreliable opinions (fake news). This perspective computing can filter social media data so that they are summarized according to specific user-selected perspectives. For summarization, our framework implements three extractive methods: Entity-based, Social Signal-based, and Triple-based. We compare MVSF summaries against state-of-the-art summarization models, including BertSum, SBert, T5, and Bart-Large-CNN, using a gold-standard BBC news dataset and ROUGE scores. Furthermore, we use a dataset of 18,047 tweets about COVID-19 vaccines to demonstrate applications of MVSF. Our contributions include the innovative use of user perspectives in summarization methods within a unified framework capable of generating multiple summaries that reflect different perspectives, in contrast to prior approaches that generate a one-size-fits-all summary for a dataset. The practical implication of MVSF is that it offers users diverse perspectives on social media data. We also implemented a prototype web application using ChatGPT to show the feasibility of our approach.

(This article belongs to the Special Issue Text Mining: Challenges, Algorithms, Tools and Applications)
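
The abstract names an Entity-based extractive method but does not describe its scoring. The sketch below is a hypothetical stand-in that illustrates the two-stage idea:

1. Filter posts by a perspective label, assumed to be precomputed by the perspective-computing step.
2. Rank the remaining posts by how frequent their entities are within that subset, and keep the top k.

The field names and example posts are invented.

```python
from collections import Counter

def perspective_summary(posts, perspective, k=2):
    """Filter posts by perspective, then pick the k posts whose entities are
    most frequent within the filtered subset (hypothetical scoring).

    posts: list of dicts with 'text', 'perspective' (assumed precomputed)
           and 'entities' (assumed output of an upstream NER step).
    """
    selected = [p for p in posts if p["perspective"] == perspective]
    freq = Counter(e for p in selected for e in set(p["entities"]))
    ranked = sorted(selected,
                    key=lambda p: sum(freq[e] for e in set(p["entities"])),
                    reverse=True)
    return [p["text"] for p in ranked[:k]]

posts = [
    {"text": "Pfizer booster rollout going well", "perspective": "positive", "entities": ["Pfizer"]},
    {"text": "Pfizer and CDC expand eligibility", "perspective": "positive", "entities": ["Pfizer", "CDC"]},
    {"text": "Side effects worry me", "perspective": "negative", "entities": []},
    {"text": "Local clinic opens", "perspective": "positive", "entities": []},
]
positive_view = perspective_summary(posts, "positive", k=2)
```

Calling the same function with a different perspective label yields a different summary of the same dataset, which is the multiple-view behavior the abstract describes.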

Search Results (230)
