Data Science, a cornerstone of modern research, focuses on extracting knowledge and insights from vast and often unstructured datasets [1,2,3]. Network Science offers a robust theoretical and methodological framework for understanding the complex connections within large-scale systems. Modeling these systems as nodes and edges allows for the analysis of relationships, flows, and structural influence [4,5,6]. The combination of these fields provides powerful new capabilities to analyze, model, and predict complex phenomena. This integration is especially important for Computational Social Science, a field that uses computational power and massive data streams to study social phenomena on an unprecedented scale, moving far beyond the limitations of traditional survey methods [7,8,9]. Using tools from both data and network sciences, researchers can now investigate large-scale patterns in human behavior, such as information diffusion, community formation, and opinion dynamics, with a detail and scope that were previously unattainable.
However, despite these advances, significant methodological gaps and open questions remain [10]. A large portion of the current research still relies on static analyses of social phenomena, while the inherently dynamic nature of human interaction demands a much greater focus on temporal and dynamical models. Furthermore, while complex computational models achieve high predictive accuracy, they often function as “black boxes”, creating a central challenge in moving beyond correlation to establish robust causal analysis [11]. Foundational open questions regarding data bias, algorithmic fairness, and ethical concerns also require persistent examination [12]. Finally, while the recent emergence of large language models (LLMs) offers transformative potential, scientists have only begun to explore their capabilities and limitations within the specific theoretical and methodological contexts of computational social science [13].
Addressing these complex issues and advancing the frontiers of this growing field requires a coordinated, interdisciplinary effort that bridges computational rigor with deep social scientific theory. This Special Issue, titled “Advances in Data and Network Sciences Applied to Computational Social Science”, presents cutting-edge research that directly addresses these open questions. The collection highlights progress at the intersection of these disciplines through novel methodologies and innovative applications to pressing social problems. The articles offer new perspectives that integrate computational findings with established social theories. Collectively, the contributions demonstrate the value of these integrated computational approaches for understanding fundamental questions about human society, collective behavior, and the complex dynamics of modern social systems.
This paper synthesizes and examines the following papers published in this Special Issue (ordered by date of publication):
De Santis, E.; Martino, A.; Ronci, F.; Rizzi, A. An Unsupervised Graph-Based Approach for Detecting Relevant Topics: A Case Study on the Italian Twitter Cohort during the Russia–Ukraine Conflict. Information 2023, 14, 330. https://doi.org/10.3390/info14060330
Shah, H.; Jaidka, K.; Ungar, L.; Fagan, J.; Grosser, T. Building a Multimodal Classifier of Email Behavior: Towards a Social Network Understanding of Organizational Communication. Information 2023, 14, 661. https://doi.org/10.3390/info14120661
Domingo-Espiñeira, J.; Fraile-Martínez, O.; Garcia-Montero, C.; Montero, M.; Varaona, A.; Lara-Abelenda, F.J.; Ortega, M.A.; Alvarez-Mon, M.; Alvarez-Mon, M.A. Navigating the Digital Neurolandscape: Analyzing the Social Perception of and Sentiments Regarding Neurological Disorders through Topic Modeling and Unsupervised Research Using Twitter. Information 2024, 15, 152. https://doi.org/10.3390/info15030152
Kusumaningrum, R.; Khoerunnisa, S.F.; Khadijah, K.; Syafrudin, M. Exploring Community Awareness of Mangrove Ecosystem Preservation through Sentence-BERT and K-Means Clustering. Information 2024, 15, 165. https://doi.org/10.3390/info15030165
Nawawi, I.; Ilmawan, K.F.; Maarif, M.R.; Syafrudin, M. Exploring Tourist Experience through Online Reviews Using Aspect-Based Sentiment Analysis with Zero-Shot Learning for Hospitality Service Enhancement. Information 2024, 15, 499. https://doi.org/10.3390/info15080499
Xue, H. Analysis of Effects on Scientific Impact Indicators Based on Coevolution of Coauthorship and Citation Networks. Information 2024, 15, 597. https://doi.org/10.3390/info15100597
This collection of studies provides a comprehensive overview of the advanced methodological frontiers in information analysis.
Table 1 presents an overview of the contributions. A clear trend emerges: the move from static, isolated, and text-only models to frameworks that are dynamic, multimodal, context-aware, and deeply integrated with network science. The research ranges from applying topic modeling and sentiment analysis to vast social media datasets to understand public awareness and perception (papers 3, 6, and 7) to developing advanced systems for tracking the dynamic emergence of topics in real time (paper 4). This temporal focus is also reflected in the development of systems for exploring the evolution of graph structures directly (paper 10). Furthermore, the studies demonstrate a move toward integration, where multimodal models (paper 5) combine linguistic data with social network features to achieve superior predictive power. This network-centric view is also seen in simulation models that analyze the coevolution of authorship and citation networks to explain the behavior of scientific metrics (paper 9). In parallel, NLP techniques are becoming more granular, moving from document-level sentiment to aspect-based analysis (paper 8), while bibliometric methods map the conceptual networks of science itself (paper 1). Finally, foundational statistical models are being improved to better capture the complexities of real-world data, such as asymmetry (paper 2). Together, these works exemplify a research field that is rapidly evolving to meet the challenge of understanding our complex, interconnected, and data-rich world.
One of the most prominent areas for this methodological innovation is the analysis of large-scale social media data. Researchers are increasingly leveraging platforms like Twitter and Reddit as vast, real-time repositories of public opinion and behavior. Paper 3, for instance, performs a trend analysis of Decentralized Autonomous Organizations (DAOs) using big data analytics. Recognizing DAOs as a rapidly emerging model for decentralized governance, the study applies text mining and Latent Dirichlet Allocation (LDA) topic modeling to a massive dataset of tweets containing the “#DAO” hashtag and Reddit posts mentioning “DAO”. This approach allows for an objective, data-driven content analysis, revealing the key themes dominating the public conversation. The findings identify dominant topics related to finance, gaming, and fundraising, and note the common appearance of the term “community”, highlighting the social and collaborative dimensions of this new organizational form. This work exemplifies the power of topic modeling to extract coherent themes from millions of user-generated posts, providing a clear map of a new technological and social phenomenon.
While traditional LDA remains a robust tool for topic identification, its reliance on bag-of-words models can sometimes fail to capture the deeper semantic nuances of language. Addressing this limitation, paper 7 proposes a novel technique for exploring community awareness, applied to the context of mangrove ecosystem preservation. This research explicitly positions itself as an alternative to LDA, which can be resource-intensive and can struggle with slight semantic differences. The proposed method instead employs a combination of Sentence-BERT (SBERT) for generating semantically meaningful sentence embeddings and K-Means clustering for topic identification. By analyzing Indonesian-language Twitter data related to mangroves, this SBERT-based approach successfully identifies nine distinct topics and visualizes tweet frequency to reveal a growing public awareness and collaborative efforts between government and society. This study represents a methodological refinement, showing how modern transformer-based embeddings can provide a more semantically precise foundation for topic clustering than traditional probabilistic models.
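The clustering step of such a pipeline can be sketched as follows. This is a minimal illustration and not the implementation used in paper 7: it assumes sentence embeddings have already been computed by an SBERT-style encoder, uses toy two-dimensional vectors in their place, and initializes centroids with the first k points for determinism. The names `kmeans` and `embeddings` are hypothetical.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy stand-ins for sentence embeddings (assumption: real ones have hundreds of dimensions).
embeddings = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels, centroids = kmeans(embeddings, k=2)
```

Each resulting cluster would then be inspected and labeled as a topic; in practice, a library implementation with a more careful initialization would normally replace this hand-written loop.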
Social media analysis was also applied in the public health domain, as demonstrated in a comprehensive study in paper 6 on the social perception of neurological disorders. This work analyzes an extensive collection of Twitter data spanning from 2007 to 2023 to navigate the “digital neurolandscape.” Using a combination of topic modeling and unsupervised sentiment analysis, the research maps the public discourse surrounding conditions such as dementia, epilepsy, multiple sclerosis, and Parkinson’s disease. The results reveal that dementias are by far the most discussed neurological disorders. The analysis uncovers key themes, including the profound impact of these diseases on patients and relatives, public appeals for increased awareness and research funding, and discussions of potential treatments. Furthermore, the sentiment analysis associates these topics with dominant emotions of fear, anger, and sadness, balanced by positive emotions like joy, likely tied to research breakthroughs or support networks. This study transforms the Twitter platform into a large-scale sociological sensor, capturing public sentiment and concern regarding a critical global health challenge.
The previous studies mainly focus on a static snapshot of topics and sentiments rather than capturing their evolution in real time. For rapidly unfolding events, such as geopolitical conflicts, a dynamic approach is essential. The study in paper 4 addresses this exact challenge with an unsupervised, graph-based approach for detecting relevant topics in the Italian Twitter cohort during the 2022 Russia–Ukraine conflict. The core innovation is a sophisticated topic tracking system framed within a “biological metaphor.” In this model, words and topics possess “vitality” and receive “nourishment” from social interactions, subject to content aging. This unsupervised system moves beyond static LDA to identify emerging topics by monitoring term energy and co-occurrence. By applying this graph-based model, the study highlights the main events and their connections as they emerged, demonstrating a promising new method for real-time monitoring that can capture the birth, growth, and decay of narratives during a crisis.
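The interplay of nourishment and aging can be illustrated with a small sketch. The update rule here is an assumption made for illustration, not the formulation of paper 4: each term's energy is multiplied by a decay factor at every time step (aging) and then increased by the number of new interactions mentioning it (nourishment). The function names, the decay value, and the threshold are all hypothetical, and the co-occurrence graph is omitted.

```python
def update_energy(energy, interactions, decay=0.5):
    """One time step: age every term, then add nourishment from new interactions."""
    terms = set(energy) | set(interactions)
    return {t: energy.get(t, 0.0) * decay + interactions.get(t, 0) for t in terms}

def emerging_terms(energy, threshold):
    """Terms whose current energy reaches the threshold, in alphabetical order."""
    return sorted(t for t, e in energy.items() if e >= threshold)

# Hypothetical interaction counts per term and time step.
stream = [
    {"sanctions": 4, "gas": 1},
    {"sanctions": 6},
    {"gas": 1, "refugees": 8},
]

energy = {}
history = []
for step in stream:
    energy = update_energy(energy, step)
    history.append(emerging_terms(energy, threshold=5.0))
```

In this toy stream, a term rises above the threshold while it keeps receiving interactions and drops below it once the attention stops, which is the birth, growth, and decay pattern described above.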
The focus on dynamic, temporal evolution is not limited to text but extends to the analysis of network structures themselves. Paper 10 introduces TempoGRAPHer, a system explicitly designed for the aggregation-based exploration of temporal attributed graphs. This work addresses the challenge of understanding how complex networks, such as social or cooperation networks, evolve. TempoGRAPHer supports both temporal and attribute-based aggregation, allowing researchers to view network evolution at different granularities. The system features two complementary exploration strategies. First, a skyline-based approach identifies overall trends and dominant patterns. Second, an interaction-based approach allows for closer examination of specific periods of change. This framework provides a robust vocabulary and computational toolkit for identifying and analyzing periods of growth, shrinkage, or stability within complex evolving networks, moving the analysis from a series of static snapshots to a continuous temporal landscape.
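The idea of aggregating snapshots and comparing periods can be sketched in a few lines. This is not TempoGRAPHer's interface; it assumes each snapshot is a set of edges, uses union aggregation over a window of snapshots, and classifies edges by simple set operations. The names `aggregate` and `compare` and the toy edges are hypothetical.

```python
def aggregate(snapshots, start, end):
    """Union aggregation: every edge seen in snapshots[start:end]."""
    edges = set()
    for snap in snapshots[start:end]:
        edges |= snap
    return edges

def compare(old, new):
    """Split edges into stability (kept), growth (added), and shrinkage (lost)."""
    return {
        "stability": old & new,
        "growth": new - old,
        "shrinkage": old - new,
    }

# Four toy snapshots of a cooperation network (assumption: edges are directed pairs).
snapshots = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("c", "d")},
    {("a", "b"), ("d", "e")},
    {("a", "b"), ("d", "e"), ("e", "f")},
]

early = aggregate(snapshots, 0, 2)
late = aggregate(snapshots, 2, 4)
events = compare(early, late)
```

Changing the window boundaries changes the temporal granularity of the comparison; attribute-based aggregation would additionally group nodes by shared attribute values before counting edges.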
Where the TempoGRAPHer framework provides a method for exploring graph evolution, other research investigates the predictive power of network structures when combined with other data modalities. Paper 5 tackles the complex problem of modeling organizational communication by building a multimodal classifier of email behavior. This work introduces the Email MultiModal Architecture (EMMA), designed to predict the probability of receiving an email response. The key methodological leap is the integration of social network features with traditional linguistic analysis. The model does not just analyze the content of an email (using RoBERTa embeddings) but also incorporates data reflecting the sender’s context within the organization, including their social network influence (e.g., centrality) and personal likability. EMMA demonstrates an improved prediction accuracy of up to 12.5% compared to leading text-centric models. This powerfully illustrates that in organizational settings, communication outcomes are not solely dependent on what is said, but critically, on who is saying it and their position within the social fabric.
The critical role of network structure and its coevolution is further exemplified in the field of scientometrics. Paper 9 analyzes the effects of the coevolution of coauthorship and citation networks on scientific impact indicators. Using a preferential attachment mechanism, this research develops a model that integrates these two evolving networks and validates it against a large dataset from the American Physical Society (APS). The simulation-based approach allows for a parametric analysis of how different factors influence metrics such as the h-index and journal impact factors. The results confirm known correlations but also reveal how these metrics can be influenced. For instance, expanding team sizes without adding new authors (a structural network change) can artificially inflate the h-index. This study demonstrates the power of coevolutionary models to understand the mechanisms behind widely used scientific metrics, highlighting their potential vulnerabilities and the profound impact of network dynamics on measures of scientific success.
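The team-size effect can be made concrete with the definition of the h-index: the largest h such that an author has h papers with at least h citations each. The sketch below is a toy calculation, not the coevolution model of paper 9; the papers, citation counts, and author names are invented, and citation counts are assumed to stay fixed when bylines grow.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

def author_h(papers, author):
    """h-index of one author over the papers that list them."""
    return h_index([p["cites"] for p in papers.values() if author in p["authors"]])

# Invented toy corpus: citation counts and bylines per paper.
papers = {
    "p1": {"cites": 10, "authors": {"ana"}},
    "p2": {"cites": 8, "authors": {"ana", "bo"}},
    "p3": {"cites": 5, "authors": {"bo"}},
    "p4": {"cites": 4, "authors": {"bo"}},
}

before = author_h(papers, "ana")
# Larger teams drawn from the same author pool: ana is added to p3 and p4.
for pid in ("p3", "p4"):
    papers[pid]["authors"].add("ana")
after = author_h(papers, "ana")
```

No new paper and no new citation enters the corpus, yet the author's h-index rises because the same citations are now credited to more bylines.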
Beyond network analysis, methodological refinement in natural language processing continues to yield more granular insights from text. Paper 8 explores tourist experiences by analyzing online reviews from TripAdvisor. This research moves beyond the document-level sentiment analysis seen in the neurology study to a much finer-grained approach: Aspect-Based Sentiment Analysis (ABSA). By combining ABSA with Zero-Shot Learning (ZSL), the framework can identify and assess sentiments toward specific aspects of a service (e.g., “food,” “accommodation,” “cultural experiences”) without requiring extensive, manually annotated datasets for every aspect. The model uses pretrained models like RoBERTa and keyword extraction techniques (KeyBERT) to dissect reviews from hospitality services in Central Java, Indonesia. This work presents a highly practical application of advanced NLP, providing businesses with detailed, actionable feedback on their specific strengths and weaknesses, far exceeding the utility of a simple positive or negative overall rating.
While the previous studies focus on analyzing primary data from social media or reviews, other research advances methodology by synthesizing the existing academic literature in novel ways. A bibliometric analysis presented in paper 1 maps the conceptual structure of “thriving at work” as a growing concept in psychology and business. This study uses the Web of Science (WoS) database and the VOSviewer software to conduct a science mapping. This is, in effect, an analysis of the conceptual network of a scientific field. The analysis identifies the most influential authors, the most cited articles and journals, and the primary countries contributing to the construct. More importantly, co-citation and co-occurrence analyses reveal the thematic clusters that define the field, identifying key research streams related to well-being, performance, leadership, and psychological safety. This bibliometric approach provides a meta-view of a research domain, outlining its intellectual structure and developmental path.
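The co-occurrence analysis underlying such maps reduces to counting how often two keywords appear on the same record. The sketch below shows that counting step only; it is not VOSviewer's procedure, which adds normalization, layout, and clustering on top of the counts, and the keyword lists are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(keyword_lists):
    """Count, for each unordered keyword pair, the records that contain both."""
    counts = Counter()
    for keywords in keyword_lists:
        # Sorting makes each pair canonical; set() ignores repeated keywords.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts

# Invented keyword lists standing in for bibliographic records.
records = [
    ["thriving", "well-being", "leadership"],
    ["thriving", "performance"],
    ["thriving", "well-being"],
    ["leadership", "psychological safety"],
]

links = cooccurrence(records)
```

The resulting counts act as edge weights in a keyword network, and thematic clusters correspond to groups of keywords that are densely linked to one another.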
Finally, at the most foundational level of measurement, methodological innovations are addressing long-standing challenges in statistical data modeling. Paper 2 proposes a Regularized Generalized Logistic Item Response Model within the framework of item response theory (IRT). This work tackles a fundamental problem in psychometrics and other fields that rely on dichotomous or polytomous variables (e.g., item responses, surveys): standard logistic and probit link functions are symmetric, which may not be appropriate for all data. The proposed model introduces an asymmetric generalized logistic link function, which can model such data more flexibly. To manage the complexity of this new model, the paper employs regularized estimation (e.g., ridge-type penalties) to stabilize the item-specific parameter estimates. The usefulness of this advanced statistical model is demonstrated through both simulations and empirical examples using PISA data. This study represents a core methodological contribution, refining the fundamental statistical tools used to measure latent traits and model probabilistic outcomes.
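What symmetry of a link function means, and how a penalty can pull an asymmetric model back toward the symmetric case, can be shown with a small sketch. The link used here is an assumption chosen for simplicity (the logistic function raised to a power alpha) and is not necessarily the generalized logistic link of paper 2; likewise, penalizing the squared logarithm of alpha is only one hypothetical form of a ridge-type penalty.

```python
import math

def logistic(x):
    """Standard symmetric logistic link."""
    return 1.0 / (1.0 + math.exp(-x))

def power_logistic(x, alpha):
    """Illustrative asymmetric link; alpha = 1 recovers the logistic link."""
    return logistic(x) ** alpha

def item_prob(theta, a, b, alpha):
    """P(correct response) for ability theta, discrimination a, difficulty b."""
    return power_logistic(a * (theta - b), alpha)

def asymmetry(x, alpha):
    """F(x) + F(-x) - 1, which is zero for every x when the link is symmetric."""
    return power_logistic(x, alpha) + power_logistic(-x, alpha) - 1.0

def ridge_penalty(alphas, lam):
    """Hypothetical ridge-type penalty shrinking each log(alpha) toward zero."""
    return lam * sum(math.log(alpha) ** 2 for alpha in alphas)
```

With alpha equal to 1 the asymmetry measure vanishes and the penalty is zero; values of alpha away from 1 skew the response curve and are charged by the penalty, so item-specific asymmetry is retained only where the data support it.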
The ten contributions in this Special Issue demonstrate the importance of data science and network science for computational social science. They illustrate a clear methodological progression from static, uni-modal analyses toward dynamic, multimodal, and coevolutionary frameworks. While these articles provide significant advances and tackle several of the challenges outlined, they also highlight the vast landscape of open questions that remain in the field. It is my hope that these contributions will serve as a foundation and inspiration for many future high-impact papers that continue to advance this dynamic area of research.