Advances in Data and Network Sciences Applied to Computational Social Science

Ferreira, Leonardo N.

doi:10.3390/info16121032

Open Access Editorial


Institute of Computing, University of Campinas, Av. Albert Einstein, 1251, Cidade Universitária, Campinas 13083-889, SP, Brazil

Information 2025, 16(12), 1032; https://doi.org/10.3390/info16121032

Submission received: 14 November 2025 / Accepted: 20 November 2025 / Published: 26 November 2025

(This article belongs to the Special Issue Advances in Data and Network Sciences Applied to Computational Social Science)


Data Science, a cornerstone of modern research, focuses on extracting knowledge and insights from vast and often unstructured datasets [1,2,3]. Network Science offers a robust theoretical and methodological framework for understanding the complex connections within large-scale systems. Modeling these systems as nodes and edges allows for the analysis of relationships, flows, and structural influence [4,5,6]. Combining these fields provides powerful new capabilities to analyze, model, and predict complex phenomena. This integration is especially important for Computational Social Science, a field that uses computational power and massive data streams to study social phenomena at an unprecedented scale, moving far beyond the limitations of traditional survey methods [7,8,9]. Using tools from both data and network sciences, researchers can now investigate large-scale patterns in human behavior, such as information diffusion, community formation, and opinion dynamics, at a level of detail and scope that was previously unattainable.

Despite these advances, significant methodological gaps and open questions remain [10]. Much current research still relies on static analyses of social phenomena, whereas the inherently dynamic nature of human interaction calls for a much greater focus on temporal and dynamical models. Furthermore, while complex computational models can achieve high predictive accuracy, they often function as “black boxes,” which makes it difficult to move beyond correlation toward robust causal analysis [11]. Foundational questions regarding data bias, algorithmic fairness, and ethics also require sustained examination [12]. Finally, although large language models (LLMs) offer transformative potential, researchers have only begun to explore their capabilities and limitations within the specific theoretical and methodological contexts of computational social science [13].

Addressing these complex issues and advancing the frontiers of this growing field requires a coordinated, interdisciplinary effort that bridges computational rigor with deep social scientific theory. This Special Issue, titled “Advances in Data and Network Sciences Applied to Computational Social Science”, presents cutting-edge research that directly addresses these open questions. The collection highlights progress at the intersection of these disciplines through novel methodologies and innovative applications to pressing social problems. The articles offer new perspectives that integrate computational findings with established social theories. Collectively, the contributions demonstrate the value of these integrated computational approaches for understanding fundamental questions about human society, collective behavior, and the complex dynamics of modern social systems.

This paper synthesizes and examines the following papers published in this Special Issue (ordered by date of publication):

1. Abid, G.; Contreras, F. Mapping Thriving at Work as a Growing Concept: Review and Directions for Future Studies. Information 2022, 13, 383. https://doi.org/10.3390/info13080383
2. Robitzsch, A. Regularized Generalized Logistic Item Response Model. Information 2023, 14, 306. https://doi.org/10.3390/info14060306
3. Park, H.; Ureta, I.; Kim, B. Trend Analysis of Decentralized Autonomous Organization Using Big Data Analytics. Information 2023, 14, 326. https://doi.org/10.3390/info14060326
4. De Santis, E.; Martino, A.; Ronci, F.; Rizzi, A. An Unsupervised Graph-Based Approach for Detecting Relevant Topics: A Case Study on the Italian Twitter Cohort during the Russia–Ukraine Conflict. Information 2023, 14, 330. https://doi.org/10.3390/info14060330
5. Shah, H.; Jaidka, K.; Ungar, L.; Fagan, J.; Grosser, T. Building a Multimodal Classifier of Email Behavior: Towards a Social Network Understanding of Organizational Communication. Information 2023, 14, 661. https://doi.org/10.3390/info14120661
6. Domingo-Espiñeira, J.; Fraile-Martínez, O.; Garcia-Montero, C.; Montero, M.; Varaona, A.; Lara-Abelenda, F.J.; Ortega, M.A.; Alvarez-Mon, M.; Alvarez-Mon, M.A. Navigating the Digital Neurolandscape: Analyzing the Social Perception of and Sentiments Regarding Neurological Disorders through Topic Modeling and Unsupervised Research Using Twitter. Information 2024, 15, 152. https://doi.org/10.3390/info15030152
7. Kusumaningrum, R.; Khoerunnisa, S.F.; Khadijah, K.; Syafrudin, M. Exploring Community Awareness of Mangrove Ecosystem Preservation through Sentence-BERT and K-Means Clustering. Information 2024, 15, 165. https://doi.org/10.3390/info15030165
8. Nawawi, I.; Ilmawan, K.F.; Maarif, M.R.; Syafrudin, M. Exploring Tourist Experience through Online Reviews Using Aspect-Based Sentiment Analysis with Zero-Shot Learning for Hospitality Service Enhancement. Information 2024, 15, 499. https://doi.org/10.3390/info15080499
9. Xue, H. Analysis of Effects on Scientific Impact Indicators Based on Coevolution of Coauthorship and Citation Networks. Information 2024, 15, 597. https://doi.org/10.3390/info15100597
10. Tsoukanara, E.; Koloniari, G.; Pitoura, E. TempoGRAPHer: Aggregation-Based Temporal Graph Exploration. Information 2025, 16, 46. https://doi.org/10.3390/info16010046

This collection of studies provides a comprehensive overview of the advanced methodological frontiers in information analysis. Table 1 presents an overview of the contributions. A clear trend emerges: a shift from static, isolated, and text-only models toward frameworks that are dynamic, multimodal, context-aware, and deeply integrated with network science. The research ranges from applying topic modeling and sentiment analysis to vast social media datasets to understand public awareness and perception (papers 3, 6, and 7) to developing advanced systems for tracking the emergence of topics in real time (paper 4). This temporal focus is also reflected in systems for directly exploring the evolution of graph structures (paper 10). Furthermore, the studies demonstrate a move toward integration, where multimodal models (paper 5) combine linguistic data with social network features to achieve greater predictive power. This network-centric view also appears in simulation models that analyze the coevolution of coauthorship and citation networks to explain the behavior of scientific metrics (paper 9). In parallel, natural language processing (NLP) techniques are becoming more granular, moving from document-level sentiment to aspect-based analysis (paper 8), while bibliometric methods map the conceptual networks of science itself (paper 1). Finally, foundational statistical models are being refined to better capture the complexities of real-world data, such as asymmetry (paper 2). Together, these works exemplify a research field that is rapidly evolving to meet the challenge of understanding our complex, interconnected, and data-rich world.

One of the most prominent areas for this methodological innovation is the analysis of large-scale social media data. Researchers increasingly treat platforms such as Twitter and Reddit as vast, real-time repositories of public opinion and behavior. Paper 3, for instance, performs a trend analysis of Decentralized Autonomous Organizations (DAOs) using big data analytics. Recognizing DAOs as a rapidly emerging model for decentralized governance, the study applies text mining and Latent Dirichlet Allocation (LDA) topic modeling to a massive dataset of tweets containing the “#DAO” hashtag and Reddit posts mentioning “DAO”. This approach enables a data-driven content analysis that reveals the key themes dominating the public conversation. The findings identify dominant topics related to finance, gaming, and fundraising, and note the frequent appearance of the term “community,” highlighting the social and collaborative dimensions of this new organizational form. This work exemplifies the power of topic modeling to extract coherent themes from millions of user-generated posts, providing a clear map of a new technological and social phenomenon.
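To make the topic-modeling step concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not the pipeline used in paper 3. The toy posts, the number of topics, and the hyperparameters (`alpha`, `beta`, `iters`) are hypothetical, and a study of this scale would rely on an optimized library implementation. The sketch shows the core idea: each token is repeatedly reassigned to a topic in proportion to how prevalent that topic is in its document and how strongly the topic favors that word.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on pre-tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * k for _ in docs]       # topic counts per document
    nkw = [[0] * V for _ in range(k)]   # word counts per topic
    nk = [0] * k                        # total tokens per topic
    z = []                              # topic assignment of every token

    # Random initial assignment of tokens to topics.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            t = rng.randrange(k)
            zd.append(t)
            ndk[d][t] += 1
            nkw[t][index[w]] += 1
            nk[t] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, wi = z[d][i], index[w]
                # Remove the current token from the counts.
                ndk[d][t] -= 1
                nkw[t][wi] -= 1
                nk[t] -= 1
                # Full conditional p(z = j | all other assignments), up to a constant.
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + V * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][wi] += 1
                nk[t] += 1

    # Top-3 words per topic and smoothed document-topic proportions.
    topics = [
        [vocab[w] for w in sorted(range(V), key=lambda w: -nkw[t][w])[:3]]
        for t in range(k)
    ]
    theta = [
        [(ndk[d][t] + alpha) / (len(doc) + k * alpha) for t in range(k)]
        for d, doc in enumerate(docs)
    ]
    return topics, theta


posts = [  # hypothetical posts, already tokenized and stop-word filtered
    ["dao", "treasury", "token", "vote"],
    ["dao", "token", "fundraising", "treasury"],
    ["dao", "game", "nft", "play"],
    ["game", "play", "nft", "community"],
]
topics, theta = lda_gibbs(posts, k=2)
print(topics)
```

On real data, the top words per topic would be inspected and labeled by hand, as with the finance, gaming, and fundraising themes reported for DAOs.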

While traditional LDA remains a robust tool for topic identification, its reliance on a bag-of-words representation can fail to capture deeper semantic nuances of language. Addressing this limitation, paper 7 proposes a novel technique for exploring community awareness, applied to the preservation of mangrove ecosystems. The research explicitly positions itself as an alternative to LDA, which can be resource-intensive and can struggle to distinguish subtle semantic differences. The proposed method instead combines Sentence-BERT (SBERT), which generates semantically meaningful sentence embeddings, with K-Means clustering for topic identification. Applied to Indonesian-language Twitter data about mangroves, this SBERT-based approach identifies nine distinct topics, and its visualization of tweet frequency reveals growing public awareness and collaborative efforts between government and society. The study represents a methodological refinement, showing how modern transformer-based embeddings can provide a more semantically precise foundation for topic clustering than traditional probabilistic models.
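The following is a minimal sketch of the clustering stage, assuming the sentence embeddings have already been computed. Here the vectors are hypothetical three-dimensional stand-ins. In practice they would come from a Sentence-BERT encoder (for example, the `encode` method of a `SentenceTransformer` model in the sentence-transformers library), and `k` would be set to the number of topics sought (nine in paper 7). The code implements standard Lloyd's K-Means on unit-normalized vectors and is not the authors' implementation.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length, so Euclidean distance tracks cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean updates."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids


# Hypothetical 3-d stand-ins for the sentence embeddings of four tweets.
embeddings = [normalize(v) for v in (
    [1.0, 0.1, 0.0],  # e.g. a tweet about a replanting campaign
    [0.9, 0.2, 0.0],  # another replanting tweet
    [0.0, 1.0, 0.1],  # e.g. a tweet about coastal abrasion
    [0.1, 0.9, 0.0],  # another coastal-abrasion tweet
)]
labels, _ = kmeans(embeddings, k=2)
print(labels)
```

Each resulting cluster plays the role of a topic; inspecting its most representative tweets is what turns a cluster label into an interpretable theme.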

Social media analysis has also been applied in the public health domain, as demonstrated by the comprehensive study of the social perception of neurological disorders in paper 6. This work analyzes an extensive collection of Twitter data spanning from 2007 to 2023 to navigate the “digital neurolandscape.” Using a combination of topic modeling and unsupervised sentiment analysis, the research maps the public discourse surrounding conditions such as dementia, epilepsy, multiple sclerosis, and Parkinson’s disease. The results reveal that dementias are by far the most discussed neurological disorders. The analysis uncovers key themes, including the profound impact of these diseases on patients and relatives, public appeals for increased awareness and research funding, and discussions of potential treatments. Furthermore, the sentiment analysis associates these topics with dominant emotions of fear, anger, and sadness, balanced by positive emotions such as joy, likely tied to research breakthroughs or support networks. This study turns Twitter into a large-scale sociological sensor, capturing public sentiment and concern regarding a critical global health challenge.
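One simple form of unsupervised emotion analysis is lexicon lookup. Each token is matched against a word-emotion dictionary, and the hits are aggregated into an emotion profile. The sketch below assumes a tiny, hypothetical lexicon (a stand-in for curated resources such as the NRC Word-Emotion Association Lexicon) and whitespace tokenization. It illustrates the general technique, not the specific sentiment method used in paper 6.

```python
from collections import Counter

# Hypothetical mini-lexicon mapping words to emotions (illustration only).
EMOTION_LEXICON = {
    "afraid": "fear", "scared": "fear", "worried": "fear",
    "angry": "anger", "unfair": "anger",
    "sad": "sadness", "lost": "sadness",
    "hope": "joy", "breakthrough": "joy", "grateful": "joy",
}


def emotion_profile(texts):
    """Share of lexicon hits per emotion across a collection of posts."""
    counts = Counter()
    for text in texts:
        for token in text.lower().split():
            emotion = EMOTION_LEXICON.get(token.strip(".,!?"))
            if emotion:
                counts[emotion] += 1
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()} if total else {}


profile = emotion_profile([  # hypothetical posts
    "I am scared of dementia.",
    "So sad, we lost grandma.",
    "Hope this breakthrough helps!",
])
print(profile)  # {'fear': 0.2, 'sadness': 0.4, 'joy': 0.4}
```

Grouping posts by their topic before computing the profile gives per-topic emotion distributions of the kind described above.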

The studies discussed so far mainly capture static snapshots of topics and sentiments rather than tracking their evolution in real time. For rapidly unfolding events, such as geopolitical conflicts, a dynamic approach is essential. Paper 4 addresses this challenge with an unsupervised, graph-based approach for detecting relevant topics in the Italian Twitter cohort during the 2022 Russia–Ukraine conflict. Its core innovation is a topic-tracking system framed within a “biological metaphor”: words and topics possess “vitality” and receive “nourishment” from social interactions, while their content ages over time. Rather than relying on static LDA, this unsupervised system identifies emerging topics by monitoring term energy and co-occurrence. Applying this graph-based model, the study highlights the main events and their connections as they emerged, demonstrating a promising method for real-time monitoring that can capture the birth, growth, and decay of narratives during a crisis.
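The energy-based idea can be sketched as follows. This is a deliberately simplified illustration, not the system of paper 4. The decay factor (content aging), the energy added per mention (nourishment), and the survival threshold are hypothetical parameters. Topics are taken to be the connected components of the co-occurrence graph among terms that are still “alive”.

```python
from collections import defaultdict
from itertools import combinations


def track_topics(windows, decay=0.5, nourishment=1.0, threshold=1.5):
    """Return, for each time window, the topics formed by terms that are still alive."""
    energy = defaultdict(float)
    timeline = []
    for posts in windows:
        # Content aging: every known term loses part of its energy at each step.
        for term in energy:
            energy[term] *= decay
        # Nourishment: each post mentioning a term feeds it; record co-occurrences.
        cooc = defaultdict(int)
        for post in posts:
            terms = set(post)
            for term in terms:
                energy[term] += nourishment
            for a, b in combinations(sorted(terms), 2):
                cooc[(a, b)] += 1
        alive = {t for t, e in energy.items() if e >= threshold}
        # Co-occurrence graph restricted to alive terms.
        adj = {t: set() for t in alive}
        for a, b in cooc:
            if a in alive and b in alive:
                adj[a].add(b)
                adj[b].add(a)
        # Topics = connected components, found by iterative depth-first search.
        seen, topics = set(), []
        for start in sorted(alive):
            if start in seen:
                continue
            stack, component = [start], set()
            while stack:
                u = stack.pop()
                if u in seen:
                    continue
                seen.add(u)
                component.add(u)
                stack.extend(adj[u] - seen)
            topics.append(sorted(component))
        timeline.append(topics)
    return timeline


windows = [  # hypothetical tokenized posts in two consecutive time windows
    [["kyiv", "convoy"], ["kyiv", "convoy"], ["gas", "prices"]],
    [["gas", "prices"], ["gas", "prices"]],
]
timeline = track_topics(windows)
print(timeline)  # [[['convoy', 'kyiv']], [['gas', 'prices']]]
```

In the toy run, the first narrative dies out once it stops being nourished, while the second is born in the next window. This is the birth-and-decay behavior the metaphor is meant to capture.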

The focus on dynamic, temporal evolution is not limited to text but extends to the analysis of network structures themselves. Paper 10 introduces TempoGRAPHer, a system explicitly designed for the aggregation-based exploration of temporal attributed graphs. This work addresses the challenge of understanding how complex networks, such as social or cooperation networks, evolve. TempoGRAPHer supports both temporal and attribute-based aggregation, allowing researchers to view network evolution at different granularities. The system features two complementary exploration strategies. First, a skyline-based approach identifies overall trends and dominant patterns. Second, an interaction-based approach allows for closer examination of specific periods of change. This framework provides a robust vocabulary and computational toolkit for identifying and analyzing periods of growth, shrinkage, or stability within complex evolving networks, moving the analysis from a series of static snapshots to a continuous temporal landscape.
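The notions of growth, shrinkage, and stability between time intervals can be sketched with plain set operations. The following is an assumption-laden illustration rather than the TempoGRAPHer API. Each snapshot is a hypothetical set of undirected edges, and an interval is aggregated by union (edges present at any time point) or by intersection (edges present at all time points). Attribute-based aggregation collapses nodes with the same (hypothetical) attribute value into a single group.

```python
from collections import Counter


def aggregate(snapshots, start, end, mode="union"):
    """Edges present in ANY (union) or ALL (intersection) snapshots in [start, end]."""
    window = [snapshots[t] for t in range(start, end + 1)]
    return set.union(*window) if mode == "union" else set.intersection(*window)


def evolution(snapshots, old, new, mode="union"):
    """Compare two time intervals, each given as an inclusive (start, end) pair."""
    e_old = aggregate(snapshots, *old, mode=mode)
    e_new = aggregate(snapshots, *new, mode=mode)
    return {
        "stability": e_old & e_new,  # edges present in both intervals
        "growth": e_new - e_old,     # edges that appear only in the later interval
        "shrinkage": e_old - e_new,  # edges that disappear
    }


def attribute_aggregate(edges, attr):
    """Collapse nodes into attribute groups and count edges between groups."""
    return Counter(tuple(sorted((attr[u], attr[v]))) for u, v in edges)


snapshots = [  # hypothetical edge sets at time points 0, 1, 2
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("c", "d")},
    {("a", "b"), ("c", "d"), ("d", "e")},
]
events = evolution(snapshots, old=(0, 0), new=(1, 2))
attr = {"a": "red", "b": "blue", "c": "red", "d": "blue", "e": "red"}  # hypothetical
groups = attribute_aggregate(aggregate(snapshots, 1, 2), attr)
print(groups)  # Counter({('blue', 'red'): 3})
```

Comparing many interval pairs and keeping the most pronounced growth or shrinkage events is, loosely, the kind of trend summary that the skyline-based exploration provides.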

Whereas TempoGRAPHer provides a method for exploring graph evolution, other research investigates the predictive power of network structures when combined with other data modalities. Paper 5 tackles the complex problem of modeling organizational communication by building a multimodal classifier of email behavior. This work introduces the Email MultiModal Architecture (EMMA), designed to predict the probability of receiving an email response. The key methodological leap is the integration of social network features with traditional linguistic analysis. The model analyzes not only the content of an email (using RoBERTa embeddings) but also features reflecting the sender’s context within the organization, including social network influence (e.g., centrality) and personal likability. EMMA achieves an improvement in prediction accuracy of up to 12.5% compared with leading text-centric models. This illustrates that, in organizational settings, communication outcomes depend not only on what is said but, critically, on who says it and on that person’s position within the social fabric.
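The fusion idea can be sketched with a toy example. Everything here is hypothetical: a one-dimensional vector stands in for the RoBERTa text embedding, degree centrality stands in for the network-influence features, the likability scores are invented, and a plain logistic regression replaces the EMMA architecture. The sketch only shows how social features are concatenated with text features before classification (early fusion). As a result, two emails with identical text can receive different reply probabilities depending on the sender.

```python
import math
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality of each node in an undirected email graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    return {u: len(vs) / (n - 1) for u, vs in neighbors.items()}


def fuse(text_vec, sender, centrality, likability):
    """Early fusion: concatenate text features with the sender's social features."""
    return list(text_vec) + [centrality.get(sender, 0.0), likability.get(sender, 0.0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Plain stochastic gradient descent on the logistic (cross-entropy) loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = predict(w, b, xi) - yi  # gradient of the loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


# Hypothetical toy data: who emails whom, plus one email per sender.
edges = [("ana", "bob"), ("ana", "cai"), ("ana", "dee"), ("bob", "cai")]
centrality = degree_centrality(edges)
likability = {"ana": 0.5, "bob": 0.5, "dee": 0.5}  # hypothetical scores
text_vec = [0.5]  # stand-in for a RoBERTa embedding of the email body
emails = [("ana", 1), ("bob", 1), ("dee", 0)]  # (sender, received a reply?)
X = [fuse(text_vec, s, centrality, likability) for s, _ in emails]
y = [label for _, label in emails]
w, b = train_logreg(X, y)
p_ana = predict(w, b, fuse(text_vec, "ana", centrality, likability))
p_dee = predict(w, b, fuse(text_vec, "dee", centrality, likability))
print(round(p_ana, 3), round(p_dee, 3))
```

Because the text features are identical, any gap between `p_ana` and `p_dee` comes entirely from the sender's position in the email graph.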

The critical role of network structure and its coevolution is further exemplified in the field of scientometrics. Paper 9 analyzes the effects of the coevolution of coauthorship and citation networks on scientific impact indicators. Using a preferential attachment mechanism, this research develops a model that integrates these two evolving networks and validates it against a large dataset from the American Physical Society (APS). The simulation-based approach enables a parametric analysis of how different factors influence metrics such as the h-index and journal impact factors. The results confirm known correlations but also reveal how structural changes in the networks can distort these metrics. For instance, expanding team sizes without adding new authors (a structural network change) can artificially inflate the h-index. This study demonstrates the power of coevolutionary models to understand the mechanisms behind widely used scientific metrics, highlighting their potential vulnerabilities and the profound impact of network dynamics on measures of scientific success.
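The h-index effect can be reproduced with a small calculation. The sketch below assumes full counting (every coauthor receives full credit for a paper's citations) and uses hypothetical citation counts. It does not implement the coevolution model of paper 9, only the metric it studies. Six papers yield a higher h-index for both authors when the same two authors coauthor all of them than when each writes three papers alone.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def author_h_indices(papers):
    """Full counting: every coauthor receives full credit for a paper's citations."""
    records = {}
    for authors, cites in papers:
        for author in authors:
            records.setdefault(author, []).append(cites)
    return {author: h_index(cs) for author, cs in records.items()}


citations = [5, 4, 3, 5, 4, 3]  # hypothetical citation counts of six papers
solo = [(("x",), c) for c in citations[:3]] + [(("y",), c) for c in citations[3:]]
team = [(("x", "y"), c) for c in citations]  # same papers, no new authors
print(author_h_indices(solo))  # {'x': 3, 'y': 3}
print(author_h_indices(team))  # {'x': 4, 'y': 4}
```

No paper gained a single citation, yet both h-indices rose from 3 to 4, purely because of how authorship was distributed.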

Beyond network analysis, methodological refinement in natural language processing continues to yield more granular insights from text. Paper 8 explores tourist experiences by analyzing online reviews from TripAdvisor. This research moves beyond the document-level sentiment analysis seen in the neurology study to a much finer-grained approach: Aspect-Based Sentiment Analysis (ABSA). By combining ABSA with Zero-Shot Learning (ZSL), the framework can identify and assess sentiment toward specific aspects of a service (e.g., “food,” “accommodation,” “cultural experiences”) without requiring extensive, manually annotated datasets for every aspect. The framework relies on pretrained models such as RoBERTa and on keyword extraction with KeyBERT to dissect reviews of hospitality services in Central Java, Indonesia. This work presents a highly practical application of advanced NLP, providing businesses with detailed, actionable feedback on their specific strengths and weaknesses that goes far beyond a simple positive or negative overall rating.
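The aggregation logic of an aspect-based pipeline can be sketched as below. To keep the example self-contained and runnable offline, both the aspect detector and the polarity scorer are hypothetical keyword lookups. In the zero-shot setting described above, the aspect detector would instead score each sentence against candidate aspect labels with a pretrained model (for example, the zero-shot-classification pipeline of the Hugging Face `transformers` library). The sketch shows how sentence-level decisions roll up into per-aspect positive and negative counts; it is not the method of paper 8.

```python
from collections import defaultdict

# Hypothetical seed words per aspect and a hypothetical polarity lexicon.
ASPECT_SEEDS = {
    "food": {"breakfast", "food", "dinner", "menu"},
    "accommodation": {"room", "bed", "bathroom", "pool"},
    "cultural experiences": {"temple", "batik", "dance", "tour"},
}
POLARITY = {"delicious": 1, "clean": 1, "amazing": 1, "friendly": 1,
            "dirty": -1, "cold": -1, "noisy": -1, "boring": -1}


def score_sentence(sentence):
    """Return the aspects a sentence mentions and its overall polarity score."""
    tokens = {t.strip(".,!?").lower() for t in sentence.split()}
    aspects = [a for a, seeds in ASPECT_SEEDS.items() if tokens & seeds]
    polarity = sum(POLARITY.get(t, 0) for t in tokens)
    return aspects, polarity


def aspect_report(reviews):
    """Count positive and negative mentions per aspect across all reviews."""
    totals = defaultdict(lambda: [0, 0])  # aspect -> [positive, negative]
    for review in reviews:
        for sentence in review.split("."):
            aspects, polarity = score_sentence(sentence)
            for aspect in aspects:
                if polarity > 0:
                    totals[aspect][0] += 1
                elif polarity < 0:
                    totals[aspect][1] += 1
    return {a: tuple(v) for a, v in totals.items()}


report = aspect_report([  # hypothetical reviews
    "Breakfast was delicious. The room was dirty.",
    "Amazing temple tour. Room was clean.",
])
print(report)
```

The per-aspect counts are what make the output actionable: a hotel can see, for instance, that its accommodation draws mixed sentiment while its food is praised.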

While the previous studies analyze primary data from social media or reviews, other research advances methodology by synthesizing the existing academic literature in novel ways. A bibliometric analysis in paper 1 maps the conceptual structure of “thriving at work,” a growing concept in psychology and business. The study uses the Web of Science (WoS) database and the VOSviewer software to perform science mapping, which amounts to an analysis of the conceptual network of a scientific field. The analysis identifies the most influential authors, the most cited articles and journals, and the primary countries contributing to the construct. More importantly, co-citation and co-occurrence analyses reveal the thematic clusters that define the field, identifying key research streams related to well-being, performance, leadership, and psychological safety. This bibliometric approach provides a meta-view of a research domain, outlining its intellectual structure and developmental path.
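Co-occurrence and co-citation counts, the raw input to science-mapping tools such as VOSviewer, can be computed as in the sketch below. The keyword records are hypothetical. VOSviewer additionally normalizes these counts and then lays out and clusters the resulting network, steps omitted here. Applying the same function to each paper's reference list instead of its keywords yields co-citation counts.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(records):
    """Count how often each unordered pair of items appears in the same record."""
    counts = Counter()
    for items in records:
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts


records = [  # hypothetical author keywords of three papers
    ["thriving", "well-being", "leadership"],
    ["thriving", "well-being"],
    ["thriving", "psychological safety"],
]
counts = cooccurrence(records)
print(counts.most_common(1))  # [(('thriving', 'well-being'), 2)]
```

Each pair count becomes an edge weight in the keyword network, and densely connected groups of keywords correspond to the thematic clusters described above.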

Finally, at the most foundational level of measurement, methodological innovations are addressing long-standing challenges in statistical data modeling. Paper 2 proposes a regularized generalized logistic item response theory (IRT) model. This work tackles a fundamental problem in psychometrics and other fields that rely on dichotomous or polytomous variables (e.g., item responses, surveys): the standard logistic and probit link functions are symmetric, which may not be appropriate for all data. The proposed model introduces an asymmetric generalized logistic link function that can fit such data more flexibly. To manage the added complexity, the paper employs regularized estimation (e.g., ridge-type penalties) to stabilize the item-specific parameter estimates. The usefulness of the model is demonstrated through both simulations and empirical examples using PISA data. This study represents a core methodological contribution, refining the fundamental statistical tools used to measure latent traits and model probabilistic outcomes.
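An asymmetric link can be sketched in a few lines. The sketch assumes a Stukel-type generalized logistic transform with two shape parameters (`a1` for the upper tail, `a2` for the lower tail) applied to a two-parameter logistic predictor. It also assumes a ridge penalty that shrinks the shape parameters toward zero, where the model reduces to the standard logistic one. The exact parameterization and penalty in paper 2 may differ. The point is that nonzero shape parameters break the symmetry P(η) + P(−η) = 1 of the logistic curve.

```python
import math


def stukel_h(eta, a1, a2):
    """Stukel-type generalized logistic transform; a1 shapes the upper tail, a2 the lower."""
    if eta >= 0:
        if a1 > 0:
            return (math.exp(a1 * eta) - 1) / a1
        if a1 < 0:
            return -math.log(1 - a1 * eta) / a1
        return eta
    x = -eta  # |eta| for the lower tail
    if a2 > 0:
        return -(math.exp(a2 * x) - 1) / a2
    if a2 < 0:
        return math.log(1 - a2 * x) / a2
    return eta


def irf(theta, a, b, a1=0.0, a2=0.0):
    """P(correct | theta) for a 2PL-style item with a generalized logistic link."""
    eta = a * (theta - b)
    return 1.0 / (1.0 + math.exp(-stukel_h(eta, a1, a2)))


def penalized_nll(responses, thetas, a, b, a1, a2, lam):
    """Negative log-likelihood of one item plus a ridge penalty on its shape parameters."""
    nll = 0.0
    for y, theta in zip(responses, thetas):
        p = irf(theta, a, b, a1, a2)
        nll -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return nll + lam * (a1 ** 2 + a2 ** 2)


# With a1 = a2 = 0 the curve is the ordinary logistic one; a1 > 0 skews the upper tail.
print(irf(1.0, 1.0, 0.0) + irf(-1.0, 1.0, 0.0))            # ~1.0 (symmetric)
print(irf(1.0, 1.0, 0.0, 0.5) + irf(-1.0, 1.0, 0.0, 0.5))  # > 1 (asymmetric)
```

Minimizing `penalized_nll` over the item parameters with a larger `lam` pulls `a1` and `a2` toward zero. This is how regularization keeps item-specific shape estimates from becoming unstable when an item carries little information.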

The ten contributions in this Special Issue demonstrate the importance of data science and network science for computational social science. They illustrate a clear methodological progression from static, unimodal analyses toward dynamic, multimodal, and coevolutionary frameworks. While these articles provide significant advances and tackle several of the challenges outlined above, they also highlight the vast landscape of open questions that remain in the field. It is my hope that these contributions will serve as a foundation and inspiration for many future high-impact papers that continue to advance this dynamic area of research.

Funding

This work was supported by Instituto Kunumi. The author thanks the institution for its financial support and commitment to advancing scientific research.

Acknowledgments

As Guest Editor of the Special Issue “Advances in Data and Network Sciences Applied to Computational Social Science”, the author gratefully acknowledges the contributors whose valuable work made this edition a success. The author also expresses appreciation for the great work of all the reviewers who helped to improve the quality of the manuscripts. During the preparation of this manuscript, the author used GenAI for the purposes of text summarization and proofreading. The author has reviewed and edited the output and takes full responsibility for the content of this publication.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Kelleher, J.; Tierney, B. Data Science; The MIT Press Essential Knowledge Series; MIT Press: Cambridge, MA, USA, 2018.
2. James, G.; Witten, D.; Hastie, T.; Tibshirani, R.; Taylor, J. An Introduction to Statistical Learning: With Applications in Python; Springer Texts in Statistics; Springer International Publishing: Cham, Switzerland, 2023.
3. Zhang, A.; Lipton, Z.; Li, M.; Smola, A. Dive into Deep Learning; Cambridge University Press: Cambridge, UK, 2023.
4. Barabási, A.L. Network Science; Cambridge University Press: Cambridge, UK, 2016.
5. Izenman, A. Network Models for Data Science; Cambridge University Press: Cambridge, UK, 2023.
6. Corso, G.; Stark, H.; Jegelka, S.; Jaakkola, T.; Barzilay, R. Graph neural networks. Nat. Rev. Methods Prim. 2024, 4, 17.
7. Lazer, D.; Pentland, A.; Adamic, L.; Aral, S.; Barabási, A.L.; Brewer, D.; Christakis, N.; Contractor, N.; Fowler, J.; Gutmann, M.; et al. Computational social science. Science 2009, 323, 721–723.
8. Hofman, J.M.; Watts, D.J.; Athey, S.; Garip, F.; Griffiths, T.L.; Kleinberg, J.; Margetts, H.; Mullainathan, S.; Salganik, M.J.; Vazire, S.; et al. Integrating explanation and prediction in computational social science. Nature 2021, 595, 181–188.
9. Ziems, C.; Held, W.; Shaikh, O.; Chen, J.; Zhang, Z.; Yang, D. Can Large Language Models Transform Computational Social Science? Comput. Linguist. 2024, 50, 237–291.
10. Lazer, D.M.J.; Pentland, A.; Watts, D.J.; Aral, S.; Athey, S.; Contractor, N.; Freelon, D.; Gonzalez-Bailon, S.; King, G.; Margetts, H.; et al. Computational social science: Obstacles and opportunities. Science 2020, 369, 1060–1062.
11. Kaddour, J.; Lynch, A.; Liu, Q.; Kusner, M.J.; Silva, R. Causal Machine Learning: A Survey and Open Problems. Found. Trends Optim. 2025, 9, 1–247.
12. Mehrabi, N.; Morstatter, F.; Saxena, N.; Lerman, K.; Galstyan, A. A Survey on Bias and Fairness in Machine Learning. ACM Comput. Surv. 2021, 54, 115.
13. Thapa, S.; Shiwakoti, S.; Shah, S.B.; Adhikari, S.; Veeramani, H.; Nasim, M.; Naseem, U. Large language models (LLM) in computational social science: Prospects, current state, and challenges. Soc. Netw. Anal. Min. 2025, 15, 4.

Table 1. Overview of contributions.

Paper	Domain/Topic	Methodology	Datasets	Main Contribution
1	Scientometrics/Organizational Psychology	Bibliometric Analysis using co-citation and co-occurrence analysis	Web of Science (WoS)	Mapped the conceptual network and intellectual structure of a research field, identifying key authors, thematic clusters, and developmental trajectories.
2	Psychometrics/Statistical Modeling	Regularized Generalized Logistic Item Response Theory (IRT) Model	Simulation and PISA data	Proposed an asymmetric generalized logistic link function for IRT models to better fit real-world data, using regularized estimation to stabilize parameters.
3	Decentralized Autonomous Organizations (DAOs)	Big Data Analytics, Text Mining, Latent Dirichlet Allocation (LDA) Topic Modeling	Twitter and Reddit	Identified key topics and the centrality of community in public discourse on DAOs.
4	Geopolitical Conflict/Infoveillance (Russia–Ukraine War)	Unsupervised Graph-Based Topic Tracking with a “biological metaphor”	Twitter	Developed a dynamic system to track emerging topics in real-time, capturing the life-cycle of narratives during a crisis.
5	Organizational Communication/Email Analysis	Multimodal Classifier (EMMA), RoBERTa and social network analysis	Corporate Email Dataset	Showed that integrating a sender’s social network context improves email reply prediction accuracy.
6	Public Health/Social Perception (Neurological Disorders)	Topic Modeling and Unsupervised Sentiment Analysis	Twitter	Mapped the “digital neurolandscape,” finding dementia is the most discussed disorder, and identified dominant public emotions of fear, anger, and sadness.
7	Environmental Awareness (Mangrove)	SBERT embeddings + K-Means Clustering	Twitter	Proposed a more semantically precise method for topic clustering.
8	Hospitality and Tourism/Customer Experience	Aspect-Based Sentiment Analysis (ABSA) with Zero-Shot Learning (ZSL) and KeyBERT	TripAdvisor Reviews	Developed a granular method to extract sentiment on specific service aspects without needing aspect-specific training data.
9	Scientometrics/Network Science	Coevolutionary Model (based on preferential attachment) and Simulation	APS coauthorship and citation data	Modeled the coevolution of coauthorship and citation networks, and demonstrated how structural changes can artificially inflate metrics like the h-index.
10	Network Science/Graph Analysis	Temporal Attributed Graph Exploration (TempoGRAPHer)	Contact/co-authorship networks	Provided a framework and toolkit for analyzing the evolution of complex networks at various granularities.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
