Search Results (249)

Search Parameters:
Keywords = social media data mining

14 pages, 804 KiB  
Article
Using Large Language Models to Infer Problematic Instagram Use from User Engagement Metrics: Agreement Across Models and Validation with Self-Reports
by Davide Marengo and Michele Settanni
Electronics 2025, 14(13), 2548; https://doi.org/10.3390/electronics14132548 - 24 Jun 2025
Viewed by 567
Abstract
This study investigated the feasibility of using large language models (LLMs) to infer problematic Instagram use, which refers to excessive or compulsive engagement with the platform that negatively impacts users’ daily functioning, productivity, or well-being, from a limited set of metrics of user engagement on the platform. Specifically, we explored whether OpenAI’s GPT-4o and Google’s Gemini 1.5 Pro could accurately predict self-reported problematic use tendencies based solely on readily available user engagement metrics like daily time spent on the platform, weekly posts and stories, and follower/following counts. Our sample comprised 775 Italian Instagram users (61.6% female; aged 18–63), who were recruited through a snowball sampling method. Item-level and total scores derived by querying the LLMs’ application programming interfaces were correlated with self-report items and the total score measured via an adapted Bergen Social Media Addiction Scale. LLM-inferred scores showed positive correlations with both item-level and total scores for problematic Instagram use. The strongest correlations were observed for the total scores, with GPT-4o achieving a correlation of r = 0.414 and Gemini 1.5 Pro achieving a correlation of r = 0.319. In cross-validated regression analyses, adding LLM-generated scores, especially from GPT-4o, significantly improved the prediction of problematic Instagram use compared to using usage metrics alone. GPT-4o’s performance in random forest models was comparable to models trained directly on Instagram metrics, demonstrating its ability to capture complex, non-linear relationships indicative of addiction without needing extensive model training. This study provides compelling preliminary evidence for the use of LLMs in inferring problematic Instagram use from limited data points, opening exciting new avenues for research and intervention. Full article
(This article belongs to the Special Issue Application of Data Mining in Social Media)
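The validation step described in this abstract reduces to correlating LLM-inferred scores with self-reported totals. A minimal sketch of that check, with a hand-rolled Pearson r and invented scores (not the study's data or its API pipeline):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented example scores: LLM-inferred problematic-use totals vs.
# self-reported Bergen-scale totals for five hypothetical users.
llm_scores = [12, 18, 9, 22, 15]
self_report = [14, 20, 8, 25, 13]

r = pearson_r(llm_scores, self_report)
```

The study's reported values (r = 0.414 for GPT-4o, r = 0.319 for Gemini 1.5 Pro) come from 775 real responses; the sketch only shows the shape of the check.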

18 pages, 4253 KiB  
Article
The Emotional Landscape of Technological Innovation: A Data-Driven Case Study of ChatGPT’s Launch
by Lowri Williams and Pete Burnap
Informatics 2025, 12(3), 58; https://doi.org/10.3390/informatics12030058 - 22 Jun 2025
Viewed by 692
Abstract
The rapid development and deployment of artificial intelligence (AI) technologies have sparked intense public interest and debate. While these innovations promise to revolutionise various aspects of human life, it is crucial to understand the complex emotional responses they elicit from potential adopters and users. Such findings can offer crucial guidance for stakeholders involved in the development, implementation, and governance of AI technologies like OpenAI’s ChatGPT, a large language model (LLM) that garnered significant attention upon its release, enabling more informed decision-making regarding potential challenges and opportunities. While previous studies have employed data-driven approaches towards investigating public reactions to emerging technologies, they often relied on sentiment polarity analysis, which categorises responses as positive or negative. However, this binary approach fails to capture the nuanced emotional landscape surrounding technological adoption. This paper overcomes this limitation by presenting a comprehensive analysis for investigating the emotional landscape surrounding technology adoption by using the launch of ChatGPT as a case study. In particular, a large corpus of social media texts containing references to ChatGPT was compiled. Text mining techniques were applied to extract emotions, capturing a more nuanced and multifaceted representation of public reactions. This approach allows the identification of specific emotions such as excitement, fear, surprise, and frustration, providing deeper insights into user acceptance, integration, and potential adoption of the technology. By analysing this emotional landscape, we aim to provide a more comprehensive understanding of the factors influencing ChatGPT’s reception and potential long-term impact. Furthermore, we employ topic modelling to identify and extract the common themes discussed across the dataset. 
This additional layer of analysis allows us to understand the specific aspects of ChatGPT driving different emotional responses. By linking emotions to particular topics, we gain a more contextual understanding of public reaction, which can inform decision-making processes in the development, deployment, and regulation of AI technologies. Full article
(This article belongs to the Section Big Data Mining and Analytics)
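Going beyond positive/negative polarity, as this paper argues for, can start with counting matches against an emotion lexicon. A toy sketch (the lexicon entries are invented; real work would use a resource such as the NRC Emotion Lexicon):

```python
# Invented toy lexicon mapping words to emotion categories.
EMOTION_LEXICON = {
    "amazing": "excitement", "love": "excitement",
    "wow": "surprise", "unexpected": "surprise",
    "scary": "fear", "worried": "fear",
    "broken": "frustration", "useless": "frustration",
}

def extract_emotions(posts):
    """Count emotion-bearing words per category across a list of posts."""
    counts = {}
    for post in posts:
        for token in post.lower().split():
            word = token.strip(".,!?")  # drop trailing punctuation
            emotion = EMOTION_LEXICON.get(word)
            if emotion:
                counts[emotion] = counts.get(emotion, 0) + 1
    return counts

posts = [
    "ChatGPT is amazing, I love it!",
    "Honestly a bit scary how good this is.",
    "Wow, totally unexpected answer.",
]
emotions = extract_emotions(posts)
```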

39 pages, 4748 KiB  
Article
Harnessing Multi-Modal Synergy: A Systematic Framework for Disaster Loss Consistency Analysis and Emergency Response
by Siqing Shan, Jingyu Su, Junze Li, Yinong Li and Zhongbao Zhou
Systems 2025, 13(7), 498; https://doi.org/10.3390/systems13070498 - 20 Jun 2025
Viewed by 398
Abstract
When a disaster occurs, a large number of social media posts on platforms like Weibo attract public attention with their combination of text and images. However, the consistency between textual descriptions and visual representations varies significantly. Consistent multi-modal data are crucial for helping the public understand the disaster situation and support rescue efforts. This study aims to develop a systematic framework for assessing the consistency of multi-modal disaster-related data on social media. This study explored how the congruence between text and image content affects public engagement and informs strategies for efficient emergency responses. Firstly, the CLIP (Contrastive Language-Image Pre-Training) model was used to mine the disaster correlation, loss category, and severity of the images and text. Then, the consistency of image–text pairs was qualitatively analyzed and quantitatively calculated. Finally, the influence of image–text consistency on social concern was discussed. The experimental findings reveal that the consistency of text and image data significantly influences the degree of public concern. When the consistency increases by 1%, the social attention index will increase by about 0.8%. This shows that consistency is a key factor for attracting public attention and promoting the dissemination of information related to important disasters. The proposed framework offers a robust, systematic approach to analyzing disaster loss information consistency. It allows for the efficient extraction of high-consistency data from vast social media datasets, providing governments and emergency response agencies with timely, accurate insights into disaster situations. Full article
(This article belongs to the Section Systems Practice in Social Science)
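CLIP-style consistency scoring ultimately compares a text embedding against an image embedding. A sketch of that comparison step using cosine similarity over made-up low-dimensional vectors standing in for real CLIP embeddings (actual embeddings are high-dimensional, and the 0.9 threshold is an assumption):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Stand-ins for CLIP text and image embeddings of one post
# (real CLIP embeddings are, e.g., 512-dimensional).
text_emb = [0.8, 0.1, 0.3]
image_emb = [0.7, 0.2, 0.4]

consistency = cosine_similarity(text_emb, image_emb)
high_consistency = consistency > 0.9  # illustrative threshold, an assumption
```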

19 pages, 2065 KiB  
Article
Do Spatial Trajectories of Social Media Users Imply the Credibility of the Users’ Tweets During Earthquake Crisis Management?
by Ayse Giz Gulnerman
Appl. Sci. 2025, 15(12), 6897; https://doi.org/10.3390/app15126897 - 18 Jun 2025
Viewed by 476
Abstract
Earthquakes are sudden-onset disasters requiring rapid, accurate information for effective crisis response. Social media (SM) platforms provide abundant geospatial data but are often unstructured and produced by diverse users, posing challenges in filtering relevant content. Traditional content filtering methods rely on natural language processing (NLP), which underperforms with mixed-language posts or less widely spoken languages. Moreover, these approaches often neglect the spatial proximity of users to the event, a crucial factor in determining relevance during disasters. This study proposes an NLP-free model that assesses the spatial credibility of SM content by analysing users’ spatial trajectories. Using earthquake-related tweets, we developed a machine learning-based classification model that categorises posts as directly relevant, indirectly relevant, or irrelevant. The Random Forest model achieved the highest overall classification accuracy of 89%, while the k-NN model performed best for detecting directly relevant content, with an accuracy of 63%. Although promising overall, the classification accuracy for the directly relevant category indicates room for improvement. Our findings highlight the value of spatial analysis in enhancing the reliability of SM data (SMD) during crisis events. By bypassing textual analysis, this framework supports relevance classification based solely on geospatial behaviour, offering a novel method for evaluating content trustworthiness. This spatial approach can complement existing crisis informatics tools and be extended to other disaster types and event-based applications. Full article
(This article belongs to the Section Earth Sciences)
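A spatial-trajectory approach like this one needs distance features between a user's posting locations and the event. One plausible feature, sketched with the haversine formula and hypothetical coordinates (not the paper's actual feature set):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def min_distance_to_epicentre(trajectory, epicentre):
    """Smallest distance from any trajectory point to the epicentre."""
    return min(haversine_km(lat, lon, *epicentre) for lat, lon in trajectory)

# Hypothetical epicentre and a user's recent posting locations (lat, lon).
epicentre = (37.0, 37.0)
trajectory = [(39.9, 32.8), (37.1, 36.9), (38.4, 27.1)]
d = min_distance_to_epicentre(trajectory, epicentre)
```

A classifier could then use such distances, alongside trajectory shape, to label posts as directly relevant, indirectly relevant, or irrelevant without touching the text.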

24 pages, 2410 KiB  
Article
UA-HSD-2025: Multi-Lingual Hate Speech Detection from Tweets Using Pre-Trained Transformers
by Muhammad Ahmad, Muhammad Waqas, Ameer Hamza, Sardar Usman, Ildar Batyrshin and Grigori Sidorov
Computers 2025, 14(6), 239; https://doi.org/10.3390/computers14060239 - 18 Jun 2025
Cited by 1 | Viewed by 696
Abstract
The rise in social media has improved communication but also amplified the spread of hate speech, creating serious societal risks. Automated detection remains difficult due to subjectivity, linguistic diversity, and implicit language. While prior research focuses on high-resource languages, this study addresses the underexplored multilingual challenges of Arabic and Urdu hate speech through a comprehensive approach. To achieve this objective, this study makes four key contributions. First, we created a unique multilingual, manually annotated binary and multi-class dataset (UA-HSD-2025) sourced from X, which contains the five most important multi-class categories of hate speech. Second, we developed detailed annotation guidelines to ensure a robust, high-quality hate speech dataset. Third, we explore two strategies to address the challenges of multilingual data: a joint multilingual approach and a translation-based approach. The translation-based approach involves converting all input text into a single target language before applying a classifier. In contrast, the joint multilingual approach employs a unified model trained to handle multiple languages simultaneously, enabling it to classify text across different languages without translation. Finally, we conducted 54 experiments spanning traditional machine learning with TF-IDF features, deep learning with pre-trained word embeddings such as FastText and GloVe, and pre-trained language models with contextual embeddings. Based on the analysis of the results, our language-based model (XLM-R) outperformed traditional supervised learning approaches, achieving 0.99 accuracy in binary classification for Arabic, Urdu, and joint-multilingual datasets, and 0.95, 0.94, and 0.94 accuracy in multi-class classification for joint-multilingual, Arabic, and Urdu datasets, respectively. Full article
(This article belongs to the Special Issue Recent Advances in Social Networks and Social Media)
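The machine learning baselines in this study operate on TF-IDF features. A compact sketch of the plain TF-IDF formulation over toy tokenised documents (library implementations such as scikit-learn's use smoothed variants):

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF weights for a list of tokenised documents: tf * log(N / df)."""
    n = len(docs)
    df = Counter()  # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

# Invented toy corpus; terms shared across documents get lower weights.
docs = [
    ["hate", "speech", "detection"],
    ["speech", "recognition", "systems"],
    ["hate", "crime", "statistics"],
]
w = tfidf(docs)
```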

32 pages, 16621 KiB  
Article
Social Intelligence Mining: Transforming Land Management with Data and Deep Learning
by Mohammad Reza Yeganegi, Hossein Hassani and Nadejda Komendantova
Land 2025, 14(6), 1198; https://doi.org/10.3390/land14061198 - 3 Jun 2025
Viewed by 484
Abstract
The integration of social intelligence mining with Large Language Models (LLMs) and unstructured social data can enhance land management by incorporating human behavior, social trends, and collective decision-making. This study investigates the role of social intelligence—derived from social media—in enhancing land use, urban planning, and environmental policy crafting. To map the structure of public concerns, a new algorithm is proposed based on contextual analysis and LLMs. The proposed method, along with public discussion analysis, is applied to posts on the X platform (formerly Twitter) to extract public perception of issues related to land use, urban planning, and environmental policies. Results show that the proposed method can effectively extract public concerns and the different perspectives within public discussion. This case study illustrates how social intelligence mining, used with caution, can support policymakers; the necessary cautionary conditions are discussed in more detail. Full article
(This article belongs to the Section Land Innovations – Data and Machine Learning)

20 pages, 1750 KiB  
Article
Enhancing Recommendation Systems with Real-Time Adaptive Learning and Multi-Domain Knowledge Graphs
by Zeinab Shahbazi, Rezvan Jalali and Zahra Shahbazi
Big Data Cogn. Comput. 2025, 9(5), 124; https://doi.org/10.3390/bdcc9050124 - 8 May 2025
Cited by 1 | Viewed by 1111
Abstract
In the era of information explosion, recommendation systems play a crucial role in filtering vast amounts of content for users. Traditional recommendation models leverage knowledge graphs, sentiment analysis, social capital, and generative AI to enhance personalization. However, existing models still struggle to adapt dynamically to users’ evolving interests across multiple content domains in real-time. To address this gap, the cross-domain adaptive recommendation system (CDARS) is proposed, which integrates real-time behavioral tracking with multi-domain knowledge graphs to refine user preference modeling continuously. Unlike conventional methods that rely on static or historical data, CDARS dynamically adjusts its recommendation strategies based on contextual factors such as real-time engagement, sentiment fluctuations, and implicit preference drifts. Furthermore, a novel explainable adaptive learning (EAL) module was introduced, providing transparent insights into recommendations’ evolving nature, thereby improving user trust and system interpretability. To enable such real-time adaptability, CDARS incorporates multimodal sentiment analysis of user-generated content, behavioral pattern mining (e.g., click timing, revisit frequency), and learning trajectory modeling through time-aware embeddings and incremental updates of user representations. These dynamic signals are mapped into evolving knowledge graphs, forming continuously updated learning charts that drive more context-aware and emotionally intelligent recommendations. Our experimental results on datasets spanning social media, e-commerce, and entertainment domains demonstrate that CDARS significantly enhances recommendation relevance, achieving an average improvement of 7.8% in click-through rate (CTR) and 8.3% in user engagement compared to state-of-the-art models. 
This research presents a paradigm shift toward truly dynamic and explainable recommendation systems, paving the way for more personalized and user-centric experiences in the digital landscape. Full article

26 pages, 15214 KiB  
Article
Exploring the Mental Health Benefits of Urban Green Spaces Through Social Media Big Data: A Case Study of the Changsha–Zhuzhou–Xiangtan Urban Agglomeration
by Zhijian Li and Tian Dong
Sustainability 2025, 17(8), 3465; https://doi.org/10.3390/su17083465 - 13 Apr 2025
Viewed by 981
Abstract
Urban green spaces (UGSs) provide recreational and cultural services to urban residents and play an important role in mental health. This study uses big data mining techniques to analyze 62 urban parks in the Changsha–Zhuzhou–Xiangtan urban agglomeration (CZXUA) based on data such as points of interest (POIs), areas of interest (AOIs), and user comments from the popular social media platform Dianping. In addition, the authors apply sentiment analysis using perceptual dictionaries combined with geographic information data to identify text emotions. A structural equation model (SEM) was constructed in IBM SPSS AMOS 24.0 software to investigate the relationship between five external features, five types of cultural services, nine landscape elements, four environmental factors, and tourist emotions. The results show that UGS external features, cultural services, landscape elements, and environmental factors all have positive effects on residents’ emotions, with landscape elements having the greatest impact. The other factors show similar effects on residents’ moods. In various UGSs, natural elements such as vegetation and water tend to evoke positive emotions in residents, while artificial elements such as roads, squares, and buildings elicit more varied emotional responses. This research provides science-based support for the design and management of urban parks. Full article
(This article belongs to the Topic Sustainable Built Environment, 2nd Volume)

22 pages, 3983 KiB  
Article
Transforming Education in the AI Era: A Technology–Organization–Environment Framework Inquiry into Public Discourse
by Jinqiao Zhou and Hongfeng Zhang
Appl. Sci. 2025, 15(7), 3886; https://doi.org/10.3390/app15073886 - 2 Apr 2025
Cited by 1 | Viewed by 1668
Abstract
The advent of generative artificial intelligence (GAI) technologies has significantly influenced the educational landscape. However, public perceptions of, and the underlying emotions toward, artificial intelligence-generated content (AIGC) applications in education remain complex issues. To address this, this study employs LDA-based topic mining of online public opinion and SnowNLP sentiment analysis to comprehensively analyze over 40,000 comments collected from multiple social media platforms in China. Through a detailed analysis of the data, this study examines the distribution of positive and negative emotions and identifies six topics. The study further utilizes visual tools such as word clouds and heatmaps to present the research findings. The results indicate that positive emotions predominate over negative ones across all topics. Moreover, an analysis of the keywords across the six topics reveals that each has its own emphasis, yet there are overlaps between them. Therefore, this study, through quantitative methods, also reflects the complex interconnections among the elements within the educational ecosystem. Additionally, this study integrates the six identified topics with the Technology–Organization–Environment (TOE) framework to explore the broad impact of AIGC on education from the perspectives of technology, organization, and environment. This research provides a novel perspective on the emotional attitudes and key concerns of the Chinese public regarding the use of AIGC in education. Full article
(This article belongs to the Special Issue Social Media Meets AI and Data Science)

18 pages, 13221 KiB  
Article
Affective-Computing-Driven Personalized Display of Cultural Information for Commercial Heritage Architecture
by Huimin Hu, Yaxin Wan, Khang Yeu Tang, Qingyue Li and Xiaohui Wang
Appl. Sci. 2025, 15(7), 3459; https://doi.org/10.3390/app15073459 - 21 Mar 2025
Viewed by 759
Abstract
The display methods for traditional cultural heritage lack personalization and emotional interaction, making it difficult to stimulate the public’s deep cultural awareness. This is especially true in commercialized historical districts, where cultural value is easily overlooked. Balancing cultural value and commercial value in information display has become one of the challenges that needs to be addressed. To solve the above problems, this article focuses on the identification of deep cultural values and the optimization of the information display in Beijing’s Qianmen Street, proposing a framework for cultural information mining and display based on affective computing and large language models. The pre-trained models QwenLM and RoBERTa were employed to analyze text and image data from user-generated content on social media, identifying users’ emotional tendencies toward various cultural value dimensions and quantifying their multilayered understanding of architectural heritage. This study further constructed a multimodal information presentation model driven by emotional feedback, mapping it into virtual reality environments to enable personalized, multilayered cultural information visualization. The framework’s effectiveness was validated through an eye-tracking experiment that assessed how different presentation styles impacted users’ emotional engagement and cognitive outcomes. The results show that the affective computing and multimodal data fusion approach to cultural heritage presentation accurately captures users’ emotions, enhancing their interest and emotional involvement. Personalized presentations of information significantly improve users’ engagement, historical understanding, and cultural experience, thereby fostering a deeper comprehension of historical contexts and architectural details. Full article
(This article belongs to the Special Issue Application of Affective Computing)

25 pages, 653 KiB  
Review
Algorithms Facilitating the Observation of Urban Residential Vacancy Rates: Technologies, Challenges and Breakthroughs
by Binglin Liu, Weijia Zeng, Weijiang Liu, Yi Peng and Nini Yao
Algorithms 2025, 18(3), 174; https://doi.org/10.3390/a18030174 - 20 Mar 2025
Viewed by 798
Abstract
In view of the challenges brought by a complex environment, diverse data sources and urban development needs, our study comprehensively reviews the application of algorithms in urban residential vacancy rate observation. First, we explore the definition and measurement of the urban residential vacancy rate, pointing out the difficulties in accurately defining vacant houses and obtaining reliable data. Then, we introduce algorithms from traditional statistical learning, machine learning, deep learning and ensemble learning, and analyze their applications in vacancy rate observation. Traditional statistical learning algorithms build prediction models from historical data and have certain advantages in dealing with linear problems and regular data; however, faced with the highly nonlinear relationships and complexity of vacancy-rate data, their prediction accuracy often falls short of practical needs. With their powerful nonlinear modeling ability, machine learning algorithms have significant advantages in capturing nonlinear relationships in data, but they demand high data quality and are prone to overfitting. Deep learning algorithms can automatically learn feature representations and perform well on large amounts of high-dimensional, complex data, effectively handling the challenges posed by diverse data sources, but their training is complex and computationally costly. Ensemble learning algorithms combine multiple prediction models to improve prediction accuracy and stability. By comparing these algorithms, we clarify their respective advantages and suitability in different scenarios. In complex environments, the data underlying urban residential vacancy rate observation are affected by many factors. Unbalanced urban development leads to significant differences in residential vacancy rates across areas, and spatiotemporal heterogeneity means that vacancy rates vary across geographical locations and over time. The vacancy rate is jointly shaped by macroeconomic factors, policy and regulatory factors, market supply and demand, and individual resident factors; these factors are intertwined, increasing the complexity of the data and the difficulty of analysis. In view of the diversity of data sources, we discuss multi-source data fusion technology, which aims to integrate different data sources to improve the accuracy of vacancy rate observation. The data sources, including geographic information system (GIS) data, remote sensing images, statistical data, social media data and urban grid management data, must be aligned in format, scale, precision and spatiotemporal resolution through preprocessing, standardization and normalization. Multi-source data fusion algorithms must not only support intelligent feature extraction and correlation analysis but also handle data uncertainty and redundancy to adapt to the dynamic needs of urban development. We also elaborate on optimization methods for algorithms tailored to different data sources. Through this study, we find that algorithms play a vital role in improving the accuracy of vacancy rate observation and enhancing the understanding of urban housing conditions: they can handle complex spatial data, integrate diverse data sources, and explore the social and economic factors behind vacancy rates. In the future, we will continue to deepen the application of algorithms in data processing, model building and decision support, and strive to provide smarter and more accurate solutions for urban housing management and sustainable development. Full article
(This article belongs to the Special Issue Algorithms for Smart Cities (2nd Edition))

22 pages, 2706 KiB  
Article
Innovative Mining of User Requirements Through Combined Topic Modeling and Sentiment Analysis: An Automotive Case Study
by Yujia Liu, Dong Zhang, Qian Wan and Zhongzhen Lin
Sensors 2025, 25(6), 1731; https://doi.org/10.3390/s25061731 - 11 Mar 2025
Viewed by 1016
Abstract
As the automotive industry advances rapidly, user needs are in a constant state of evolution. Driven by advancements in big data, artificial intelligence, and natural language processing, mining user requirements from user-generated content (UGC) on social media has become an effective way to understand these dynamic needs. While existing technologies have progressed in topic identification and sentiment analysis, single-method approaches often face limitations. This study proposes a novel method for user requirement mining based on BERTopic and RoBERTa, combining the strengths of topic modeling and sentiment analysis to provide a more comprehensive analysis of user needs. To validate this approach, UGC data from four major Chinese media platforms were collected. BERTopic was applied for topic extraction and RoBERTa for sentiment analysis, facilitating a linked analysis of user emotions and identified topics. The findings categorize user requirements into four main areas—performance, comfort and experience, price sensitivity, and safety—while also reflecting the increasing relevance of advanced features, such as sensors, powertrain performance, and other technologies. This method enhances user requirement identification by integrating sentiment analysis with topic modeling, offering actionable insights for automotive manufacturers in product optimization and marketing strategies and presenting a scalable approach adaptable across various industries. Full article
(This article belongs to the Special Issue Cooperative Perception and Control for Autonomous Vehicles)
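The linked analysis the abstract describes, connecting topic assignments to sentiment scores, can be sketched minimally. This is not the authors' code: it assumes topic labels (e.g., from BERTopic) and sentiment scores in [-1, 1] (e.g., from a RoBERTa classifier) have already been computed, and simply aggregates mean sentiment per topic.

```python
from collections import defaultdict

def link_topics_to_sentiment(topic_ids, sentiment_scores):
    """Aggregate per-topic mean sentiment from parallel lists of
    topic assignments (hypothetically from BERTopic) and sentiment
    scores (hypothetically from a RoBERTa sentiment model)."""
    buckets = defaultdict(list)
    for topic, score in zip(topic_ids, sentiment_scores):
        buckets[topic].append(score)
    # mean sentiment per topic reveals which user-requirement areas
    # (performance, comfort, price, safety) attract negative feedback
    return {topic: sum(s) / len(s) for topic, s in buckets.items()}

# toy data: topic 0 = performance, topic 1 = price sensitivity
per_topic = link_topics_to_sentiment([0, 0, 1, 1], [0.8, 0.6, -0.4, -0.2])
```

In practice the topic assignments would come from `BERTopic.fit_transform` over the UGC corpus, but the aggregation step shown here is the same.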

30 pages, 1605 KiB  
Article
From Misinformation to Insight: Machine Learning Strategies for Fake News Detection
by Despoina Mouratidis, Andreas Kanavos and Katia Kermanidis
Information 2025, 16(3), 189; https://doi.org/10.3390/info16030189 - 28 Feb 2025
Cited by 1 | Viewed by 5892
Abstract
In the digital age, the rapid proliferation of misinformation and disinformation poses a critical challenge to societal trust and the integrity of public discourse. This study presents a comprehensive machine learning framework for fake news detection, integrating advanced natural language processing techniques and deep learning architectures. We rigorously evaluate a diverse set of detection models across multiple content types, including social media posts, news articles, and user-generated comments. Our approach systematically compares traditional machine learning classifiers (Naïve Bayes, SVMs, Random Forest) with state-of-the-art deep learning models, such as CNNs, LSTMs, and BERT, while incorporating optimized vectorization techniques, including TF-IDF, Word2Vec, and contextual embeddings. Through extensive experimentation across multiple datasets, our results demonstrate that BERT-based models consistently achieve superior performance, significantly improving detection accuracy in complex misinformation scenarios. Furthermore, we extend the evaluation beyond conventional accuracy metrics by incorporating the Matthews Correlation Coefficient (MCC) and Receiver Operating Characteristic–Area Under the Curve (ROC–AUC), ensuring a robust and interpretable assessment of model efficacy. Beyond technical advancements, we explore the ethical implications of automated misinformation detection, addressing concerns related to censorship, algorithmic bias, and the trade-off between content moderation and freedom of expression. This research not only advances the methodological landscape of fake news detection but also contributes to the broader discourse on safeguarding democratic values, media integrity, and responsible AI deployment in digital environments. Full article
(This article belongs to the Special Issue Information Extraction and Language Discourse Processing)
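The abstract's point about evaluating beyond accuracy is concrete enough to illustrate: the Matthews Correlation Coefficient stays informative on imbalanced fake/real splits where raw accuracy is misleading. A minimal stdlib implementation for binary labels (not the authors' code; scikit-learn's `matthews_corrcoef` would normally be used):

```python
import math

def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (1 = fake).
    Returns a value in [-1, 1]; 0 when any margin of the confusion
    matrix is empty, matching the usual convention."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike accuracy, a classifier that labels everything "real" on a 95%-real dataset scores 0.0 here rather than 0.95, which is why MCC is a sensible companion metric for misinformation detection.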

22 pages, 8907 KiB  
Article
A Data-Synthesis-Driven Approach to Recognize Urban Functional Zones by Integrating Dynamic Semantic Features
by Xingyu Liu, Yehua Sheng and Lei Yu
Land 2025, 14(3), 489; https://doi.org/10.3390/land14030489 - 26 Feb 2025
Viewed by 450
Abstract
Urban functional zones (UFZs) are closely tied to people’s daily activities. Accurate recognition of UFZs is of great significance for an in-depth understanding of the complex urban system and for optimizing the urban spatial structure. Emerging geospatial big data provide new ways to recognize urban functional zones, and point-of-interest (POI) data have achieved good results in this task. However, since humans are the actual users of urban functions, and POI data reflect only static socioeconomic characteristics without the semantic and temporal features of dynamic human activities, they represent complex UFZs incompletely. To solve these problems, we propose a data-synthesis-driven approach to quantify and analyze the distribution and mixing of urban functional zones. Firstly, representation learning is used to mine the spatial semantic, activity temporal, and activity semantic features embedded in POI data and social media check-in data. Secondly, a weighted Stacking ensemble model is used to fully integrate the complementary strengths of different features and classifiers to infer the proportions of urban functions and the dominant function of each urban functional zone. A case study within the 5th Ring Road of Beijing, China, is used to evaluate the proposed method. The results show that combining the dynamic and static features of POI data and social media data effectively represents the semantic information of UFZs, thereby further improving the accuracy of UFZ recognition. This work can provide a reference for uncovering the hidden linkages between human activity characteristics and urban functions. Full article
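The fusion step of a weighted stacking ensemble can be sketched as a weighted average of per-classifier probability vectors over urban functions, from which both function proportions and the dominant function follow. This is an illustrative simplification (the paper's actual meta-learner and weights are not specified here); the weights would hypothetically come from validation performance of each base classifier.

```python
def stack_proportions(preds, weights):
    """Fuse base-classifier outputs into urban-function proportions.

    preds   -- list of probability vectors, one per base classifier,
               each summing to 1 over the candidate urban functions
    weights -- one non-negative weight per classifier (hypothetically
               derived from validation accuracy)
    Returns (fused proportions, index of the dominant function).
    """
    total = sum(weights)
    n_functions = len(preds[0])
    fused = [sum(w * p[i] for w, p in zip(weights, preds)) / total
             for i in range(n_functions)]
    dominant = max(range(n_functions), key=fused.__getitem__)
    return fused, dominant

# toy example: two classifiers, two functions (0 = residential, 1 = commercial)
proportions, dom = stack_proportions([[0.6, 0.4], [0.2, 0.8]], [1.0, 1.0])
```

Because the weights normalize out, the fused vector remains a valid set of proportions, matching the paper's framing of mixed-function zones.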

21 pages, 6704 KiB  
Article
A Text Data Mining-Based Digital Transformation Opinion Thematic System for Online Social Media Platforms
by Haihan Liao, Chengmin Wang, Yanzhang Gu and Renhuai Liu
Systems 2025, 13(3), 159; https://doi.org/10.3390/systems13030159 - 26 Feb 2025
Cited by 1 | Viewed by 934
Abstract
Digital transformation (DT) has become an important engine for the development of the digital economy and a key means of reshaping corporate culture, business processes, and management models. Different social communities at different levels have different needs and understandings of digital transformation. Therefore, this paper explores the communication themes of digital transformation on social media. The study’s main objective is to uncover the underlying thematic structures and core ideas in large amounts of textual data from different social media communities, in order to better understand the significance of these communication themes. The paper also aims to reveal the diffusion patterns of DT themes through opinion-theme mining. Text mining and social network analysis methods are used to extract DT themes, theme structures, and the statistical characteristics of hot words across various online communities. The main findings are as follows. The Huawei forum discusses the technological drivers of the digital economy at a micro level; Sohu News explores business operation strategies at a macro level; and the Zhihu forum discusses the elements of digital development at a micro level. Moreover, the degree centrality and betweenness centrality of hot words across the online communities exhibit a power-law distribution. In conclusion, this paper analyzes DT themes across social media platforms to uncover the opinions and attitudes of various social groups in the digital transformation era, interpreting social trends and public opinion in depth so as to provide theoretical decision-making support for managers, enterprises, and governments. Full article
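The centrality statistics the abstract reports can be illustrated with a small sketch. This is not the authors' pipeline: it assumes the hot-word network is an undirected co-occurrence edge list and computes only normalized degree centrality (betweenness centrality, also used in the paper, would typically come from a library such as NetworkX).

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality for an undirected co-occurrence
    graph of hot words, given as a list of word-pair edges.
    Centrality of a node = degree / (n - 1)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    n = len(adj)
    return {word: len(neighbors) / (n - 1)
            for word, neighbors in adj.items()}

# hypothetical hot-word co-occurrences from forum posts
centrality = degree_centrality([
    ("digital", "cloud"), ("digital", "AI"), ("digital", "data"),
])
```

A power-law distribution of these centralities, as the paper reports, would mean a few hub words like the "digital" node above dominate the discourse while most hot words stay peripheral.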
