Review

From Mobile Media to Generative AI: The Evolutionary Logic of Computational Social Science Across Data, Methods, and Theory

1 School of Journalism and Communication, Beijing Normal University, Beijing 100875, China
2 Center for Computational Communication Research, Beijing Normal University, Zhuhai 519087, China
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(19), 3062; https://doi.org/10.3390/math13193062
Submission received: 18 August 2025 / Revised: 11 September 2025 / Accepted: 16 September 2025 / Published: 23 September 2025
(This article belongs to the Special Issue Mathematical Models and Methods in Computational Social Science)

Abstract

Since its articulation in 2009, Computational Social Science (CSS) has grown into a mature interdisciplinary paradigm, shaped first by mobile media-generated digital traces and more recently by generative AI. With over a decade of development, CSS has expanded its scope across data, methods, and theory: data sources have evolved from mobile traces to multimodal records; methods have diversified from surveys and experiments to agent-based modeling, network analysis, and computer vision; and theory has advanced by revisiting classical questions and modeling emergent digital phenomena. Generative AI further enhances CSS through scalable annotation, experimental design, and simulation, while raising challenges of validity, reproducibility, and ethics. The evolutionary logic of CSS lies in coupling theory, models, and data, balancing innovation with normative safeguards to build cumulative knowledge and support responsible digital governance.

1. Introduction

Computational Social Science (CSS) was formally introduced in 2009 as an interdisciplinary paradigm combining computational advances with the theoretical concerns of the social sciences. Its early growth coincided with the rise of mobile media, which transformed everyday interactions into streams of digital traces, enabling the large-scale observation of human behavior. Over the following decade, CSS matured into a consolidated field, marked by dedicated conferences, handbooks, and journals, and distinguished by its dual commitment to causal explanation and predictive modeling.
In the 2020s, the advent of generative AI has further expanded CSS’s methodological repertoire, offering tools for text annotation, survey augmentation, experimental design, and simulation. Yet these advances do not displace established methods; rather, they underscore the enduring value of theory-driven inquiry, empirical validation, and interpretability. Generative AI is thus best understood as complementing, not replacing, the established CSS toolkit.
This review traces CSS’s trajectory from its origins in mobile media data to its current integration with generative AI. It examines the field across three dimensions—data, methods, and theory—highlighting milestones, assessing methodological innovations, and mapping pathways of theoretical advancement. In doing so, it emphasizes CSS’s normative significance as a research paradigm that not only explains and predicts social phenomena but also contributes to the responsible governance of digital society.

2. Emergence and Development of Computational Social Science

CSS emerged in the early twenty-first century alongside the rise of big data as an interdisciplinary domain. In contrast to traditional social science approaches that rely primarily on surveys and small-scale experiments, CSS emphasizes the study of social phenomena through massive behavioral traces generated in the digital era and through advanced computational techniques. A landmark of this paradigm shift was the Science article entitled “Computational Social Science,” published in 2009 by scholars from sociology, computer science, physics, and other fields, which signaled the formal consolidation of this area: leveraging large-scale data collection and analytical capacities to reveal patterns of individual and collective behavior [1]. In this “declaration of emergence,” the authors articulated the potential of computational social science to transform our understanding of life, organizations, and society, while also identifying institutional and data-related obstacles to its development. This milestone work alerted the community that the continuous flows of data from social media, mobile communication, and online transactions provide unprecedented breadth, depth, and timeliness for social inquiry, thereby complementing sample surveys and catalyzing a methodological transformation. Building on this conceptual and methodological foundation, scholarly communities and training pipelines expanded rapidly, creating organizational and institutional supports for the field.

2.1. Institutional Landscape and Academic Programs

Since 2009, CSS has grown swiftly into a cross-disciplinary focal point. Universities and research institutes worldwide established CSS centers and degree programs to cultivate talent and methodological expertise. For example, in 2016 the University of Chicago launched its Master’s program in Computational Social Science to address the increasing need to embed digital technologies within social-science research.
In 2015, the International Conference on Computational Social Science (IC2S2) was founded as an interdisciplinary venue for scholars to exchange the latest quantitative advances in studying social systems and dynamics using large-scale datasets. The inaugural meeting took place in Helsinki, Finland; the conference has been held annually since, reaching its 11th edition in 2025. IC2S2 has become one of the most influential meetings at the intersection of social and computational sciences, bringing together scholars across sociology, economics, political science, psychology, cognitive science, management, computer science, statistics, and the natural and applied sciences, with a shared commitment to exploring the social world through large-scale data and computation [2].
Another highly influential initiative in computational social science is The Summer Institutes in Computational Social Science (SICSS). Established in 2017 by Chris Bail and Matt Salganik, SICSS aims to train the next generation of researchers at the interface of social and data sciences and to advance the study of human behavior in the digital age. Since 2017, the institutes have raised more than $1.5 million to support workshops at 53 locations worldwide, attracting over 2200 participants from 500 universities and 150 fields; participants have created more than 100 projects [3]. As these organizational networks expanded, textbooks, handbooks, and journal platforms simultaneously matured, offering systematic knowledge frameworks and sustained publication outlets.

2.2. Canonical Works and Publication Outlets

Claudio Cioffi-Revilla’s Introduction to Computational Social Science [4] presents core concepts in computational social science, including formal definitions and a glossary; delineates the scope of subfields such as information extraction, social networks, complexity theory, and social simulation; and discusses methodological tools including entity extraction from text, computation of social-network indices, and the construction of agent-based models.
Matthew J. Salganik’s Bit by Bit: Social Research in the Digital Age [5] has been described as “an innovative and accessible guide to doing social research in the digital age”. In authoritative yet accessible exposition, the book explains how the digital revolution is reshaping the ways social scientists observe behavior, formulate questions, conduct experiments, and organize mass collaborations; it offers numerous real-world examples and practical guidance for addressing difficult ethical challenges, making it an essential resource for research in a rapidly evolving landscape.
The Handbook of Computational Social Science [6] is a two-volume reference that provides a comprehensive resource across disciplines, maps key debates in the field, showcases new statistical modeling and machine-learning methods, and uses case studies to illustrate both the opportunities and the challenges inherent in CSS approaches.
In 2018, Springer launched the Journal of Computational Social Science, with a mission to examine social and economic phenomena or structures through large-scale data, simulation studies, and related computational methodologies [7]. EPJ Data Science opened a Topical Collection on The Past, Present, and Future of Computational Social Science [8]. Collectively, these publication venues have played important roles by applying computational methods to large datasets to generate distinctive, policy-relevant insights for society [9] (p. 1). With publishing infrastructures in place, CSS topics have also become deeply coupled with traditional disciplines, gradually forming a multi-disciplinary knowledge map.

2.3. Interdisciplinary Integration and Knowledge Network

These developments have fostered sustained cross-fertilization among disciplines. In communication studies, “computational communication” has become one of the most active CSS subfields, employing automated content analysis, network analysis, and computational simulation to address fundamental questions about human behavior, interaction, and communication [10]; in political science, “computational politics” investigates opinion dynamics, election forecasting, and political polarization through networked data [11]; in sociology, “computational sociology” revisits classic topics such as social capital and mobility using complex-network analysis and computational models [12]; computer science, for its part, has developed “social computing,” focusing on user behavior and algorithmic influence on platforms.
As a result, CSS has consolidated into a community spanning the humanities, social sciences, and natural sciences, with disciplines becoming ever more intertwined. Edelmann et al. [13] conducted a bibliometric review of CSS within sociology and concluded that CSS is diffusing rapidly across numerous sociological subfields, with frequent interdisciplinary interactions. In a citation network built from 379 articles, 24 scholarly areas were identified. Four principal clusters dominate the network core: the first connects communication, sociology, and political science; the second centers on geography and communication; the third tightly links business and library science; and the fourth bridges business, finance, and law.
As this knowledge network expands and topics become more finely articulated, CSS has entered a phase of institutionalization and normalization, showing clear signs of maturation.

2.4. A Decade of Consolidation

For a cross-disciplinary field such as CSS to take root, the requisite institutional infrastructures took shape over the past decade. A decade after the original “declaration” article, Lazer et al. [14]—again in Science—reviewed the progress achieved: (1) thousands of papers employing observational data, experimental designs, and large-scale simulations have substantially advanced understanding of major social phenomena; (2) academic institutions supporting CSS have grown markedly; and (3) a scholarly community composed of social scientists, computer scientists, and statistical physicists has cohered under the CSS umbrella. They also provided a definition of CSS: “the development and application of computational methods to complex, typically large-scale, human (sometimes simulated) behavioral data”, and noted the knowledge background spanning spatial data, social networks, and human-coded text and images. At the same time, they summarized three obstacles facing CSS—misalignment of universities, inadequate data and inadequate rules—and proposed strengthening collaboration, building new data infrastructures, and developing ethical guidelines.
In sum, CSS has evolved into a stable community with an emergent knowledge system: it retains the social-scientific commitment to explaining human behavior and social mechanisms while introducing computational tools and big data that shift the research lens from “small-N and static analyses” to “data-intensive, computational, and dynamic” inquiry. Today, CSS engages deeply with computer and data sciences across communication, political science, and sociology, and it is flourishing globally. It follows that the rise of CSS is a natural outcome of social-scientific development in the digital age, representing a new paradigm of tight integration between the social sciences and computational/data technologies. Table 1 summarizes key milestones in the emergence and consolidation of computational social science (2009–2025).

3. Data Collection and Measurement

Research in Computational Social Science (CSS) rests on novel data and computational capacity: at one end are heterogeneous digital traces drawn from multiple sources; at the other are measurement frameworks that convert raw records into interpretable social indicators. To ground the methodological discussion that follows, this section proceeds along three threads—sources of data, quality and accessibility, and transformation into valid measures—with explicit transitions to maintain coherence. It is the rapid expansion of digital data, together with new techniques for analyzing them, that has collectively catalyzed CSS as a new interdisciplinary domain [13].
Building on this foundation, our analysis highlights the evolutionary trajectory of computational social science along its three core dimensions—data, methods, and theory. Our objective is to uncover the evolutionary logic of CSS from the mobile media era of the 2010s to the generative AI era of the 2020s. To provide a comprehensive account, we review literature spanning from the field’s inception in 2009 to the present, while paying special attention to cutting-edge studies from 2023 to 2025 that illuminate how generative AI is transforming CSS.

3.1. Sources and Scope of Big Data

A clear delineation of “big data” is a necessary starting point. Big data denotes information that is massive in scale (volume), heterogeneous in type (variety), and/or rapidly changing (velocity), and making sense of such material requires the prior development of appropriate tools [15]. Hence, big data calls for CSS.
According to Statista, the worldwide amount of data created, captured, copied, and consumed reached 149 zettabytes in 2024 and is projected to grow to 181 zettabytes by the end of 2025; major contributors include AI-generated content and social-media/user-generated material—platforms such as TikTok, YouTube, and Instagram process billions of uploads daily across high-definition video, images, and interactions—as well as streaming services [16]. These figures indicate that CSS operates within a dynamic ecosystem characterized by sustained growth and expanding dimensionality.
In terms of provenance, big data can be divided into digital life (e.g., behaviors occurring on platforms such as Twitter, Facebook, and Wikipedia), digital traces (e.g., call records), and digitalized life (e.g., mobile devices capturing the proximity of individuals) [17]. These behaviors are largely inseparable from mobile media, implying that modern life is mediated by smartphones and wearables, with mobile media functioning as the default sensor. Since the widespread adoption of smartphones in the 2010s, mobile media have become an extension of the human body, gaining prominence through their distinctive features: on-the-go interaction, ubiquitous access to media and personalized content, and enhanced audience control and immediacy. Entering the 2020s, generative AI has come into the public spotlight, ushering in a new wave of communication technologies and fostering the emergence of AI-enhanced social science [18].
Concretely, social media constitutes a primary source. Beyond that, CSS makes extensive use of mobile location and sensor logs, banking and transaction data, electronic health records, and e-commerce traces. Artificial intelligence and its embedded interfaces—such as smart speaker assistants (SSAs)—are becoming important gateways: as Google Home and Amazon Echo (Alexa) increasingly mediate activities from e-commerce to information seeking, they continuously generate high-frequency behavioral data [19]. As society evolves toward sensor-dense computational environments, smartphones, smart offices, and smart-city devices will become core data sources [9] (p. 3). Video data are likewise a crucial research resource; their role in measurement and inference is elaborated in the subsequent section on computer vision.
Having established where data originate, we next address whether—and under what conditions—these data are usable, which hinges on quality criteria and institutional access.

3.2. The “V” Dimensions of Big Data and Accessibility

Industry practice commonly summarizes big data via the 4Vs: volume, velocity, variety, and veracity [20]; building on this, Sloan & Quan-Haase [21] add virtue (ethics) and value (knowledge gain). Further extensions—Viscosity, Variability, Volatility, and Viability—yield a “10Vs” characterization; these attributes simultaneously delineate the boundaries of usability (quality, sensitivity, compliance) and signal engineering complexity in linkage, cleaning, and version control [22]. Collectively, the “V” profile raises the bar for data collection and statistical analysis; accordingly, statistical modeling, advanced statistical modeling, and machine learning methods furnish essential analytic benchmarks for progress in statistics and computation [9](pp. 11–12).
Nevertheless, structural barriers persist: data siloed within private firms, non-disclosure agreements (NDAs) that impede sharing among researchers, and shifting platform API policies that restrict—or at times eliminate—access together pose systemic challenges to the research enterprise [23] (p. 626). Consequently, auditable access mechanisms and responsible data governance have become institutional prerequisites for the sustainable development of CSS.
With the boundaries of quality and access clarified, the final step is to show how raw traces become scientifically meaningful—i.e., how data are transformed into measures.

3.3. Digital Traces and the Transformation to Measurement

Salganik identifies ten common characteristics of big data and organizes them into two broad groups:
  • useful attributes for research: large-scale, continuously generated, and nonreactive
  • problematic attributes for research: incomplete, difficult to access, unrepresentative, nonstationary, influenced by platform algorithms, noisy, and sensitive [5] (p. 17).
The core challenge, then, is converting non-research data into meaningful measurement for CSS—balancing concept-to-indicator mapping, control of algorithmic interference, representativeness, and ethics. To this end, Lazer et al. [24] propose an agenda of “meaningful measures” for twenty-first-century society: couple theory-guided measurement with multimethod triangulation, attend to representativeness and ethics, and innovate measurement strategies that keep pace with social change. Elmer [25] further suggests that CSS should combine passive and active measurement practices to capitalize on the strengths of each approach; that is, it should integrate passive traces with active acquisition (surveys and experiments) to enhance internal validity and external generalizability.
Data are only the starting point. To move from “recordable behavior” to “interpretable mechanism”, one must develop a problem-aligned methodological repertoire and analysis pipeline.

4. Analytical Methods

A defining difference between today’s networked communication era and earlier periods is that the very technologies used to transmit information can also be harnessed to analyze its dynamics and effects. Such analyses rely on observational data generated in situ and on novel measurement instruments catalyzed by the digital revolution [23] (p. 621). Against this backdrop, the methodological repertoire of CSS has expanded from observation to intervention and from unimodal to multimodal designs, yielding a layered framework centered on text analytics, experimental designs, survey integration, agent-based modeling, and computer vision.
An expanded toolkit is now available for interrogating large, complex datasets, including diverse forms of automated text analysis, online field experiments, mass-collaboration workflows, and many additional approaches inspired by machine learning [5,26,27].
With the advent of large language models, these methods have been further upgraded: contemporary LLMs can bolster the CSS workflow in two main ways—(1) acting as zero-shot annotators alongside human coding teams and (2) jump-starting difficult generative tasks [28]. Specifically, LLM-powered approaches—such as content analysis, survey augmentation, and simulation-based experimentation—do not supplant conventional methods; rather, their versatility creates new opportunities, including the generation of experimental stimuli, the simulation of survey responses, the interpretation of open-ended data, and the facilitation of dialogic interactions [18].

4.1. Text Analysis with LLM Support

Analytical techniques for textual data constitute a core pillar of CSS. Social media posts, online reviews, and news reports embed rich signals of opinion and attitude, and NLP methods can automatically extract such patterns. For instance, sentiment analysis differentiates positive from negative tone to track public emotion over events; topic models uncover latent themes in large text corpora. A representative application uses Twitter mood to anticipate economic and social indicators: Bollen et al. [29] analyze time series of tweet sentiment and show robust associations with stock-market indices. Hopkins and King [30] develop quantitative text-analysis strategies to extract political-attitude signals from government documents and news coverage. In advertising, Barari et al. [31] survey computational content analysis—covering object detection, topic modeling, and sentiment analysis—and illustrate applications such as brand/logo identification in images and emotion classification in images and video. Collectively, these studies demonstrate that NLP enables CSS to handle large-scale text previously infeasible for systematic analysis, yielding micro-level insight into public psychology and discourse. Generative AI further scales both efficiency and scope.
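To make the topic-modeling step above concrete, the following minimal sketch fits a small latent Dirichlet allocation model with scikit-learn; the toy corpus, the choice of two topics, and the preprocessing settings are illustrative assumptions rather than a reproduction of any cited study.

```python
# Minimal topic-modeling sketch with scikit-learn; corpus and parameters are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "central bank raises interest rates amid inflation fears",
    "new smartphone release draws long lines at stores",
    "parliament debates climate policy and carbon taxes",
    "quarterly earnings beat expectations and shares rally",
]

# Convert raw text into a document-term matrix.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

# Fit a small LDA model to uncover latent themes.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(dtm)

# Print the top words per topic as a rough readout of each theme.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")
```

In real studies the corpus would contain thousands of documents, and the number of topics would be selected and validated against human judgment.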
Building on this, Gilardi et al. [32] show that ChatGPT-3.5 Turbo can outperform crowd workers on several annotation tasks, including relevance, stance, topic, and frame detection. At the same time, Ziems et al. [28] emphasize that effective LLM use still requires some human oversight and task-specific prompt engineering. To overcome the traditional “classification-but-not-positioning” limitation in LLM applications, Le Mens & Gallego [33] propose an asking-and-averaging approach to position political texts: the best models achieve correlations above 0.90 with benchmarks derived from experts, crowd coders, or roll-call votes, and often surpass supervised classifiers trained on large research datasets. Synthesizing current evidence, Bail [34] argues that text analysis is among the most promising avenues for generative AI to improve social-science research, while noting that LLMs still fall short of expert human coders and are better viewed as augmenters rather than replacements in the near term.
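The zero-shot annotation workflow described above can be sketched as follows. The prompt wording, label set, and model name ("gpt-4o-mini") are illustrative assumptions rather than the procedures used in the cited studies, and, as Ziems et al. [28] caution, such labels should be validated against human coders before use.

```python
# Hedged sketch of LLM-based zero-shot stance annotation; prompt, labels, and
# model choice are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LABELS = ["supportive", "opposed", "neutral"]

def annotate_stance(text: str, target: str) -> str:
    """Ask the model for a single stance label toward `target`."""
    prompt = (
        f"Classify the stance of the following post toward {target}. "
        f"Answer with exactly one word from {LABELS}.\n\nPost: {text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output aids reproducibility checks
    )
    return response.choices[0].message.content.strip().lower()

# Example call; in practice, model labels are compared against a human-coded subset.
print(annotate_stance("The new transit plan will cut commutes in half!", "the transit plan"))
```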
When textual evidence points to putative mechanisms, causal identification requires experimental methods to test interventions and effects.

4.2. Experimental Methods: Online Randomized Trials and At-Scale Interventions

Experiments allow researchers to move beyond correlations in naturally occurring data to make credible claims about cause and effect; in the digital era, many logistical constraints have receded [5] (p. 148).
Within platform settings, large-scale online experiments markedly enhance external validity and inferential power. In 2012, Bond et al. [35] conducted a canonical CSS study: a randomized controlled trial of voting mobilization on Facebook. During the 2010 U.S. congressional elections, political mobilization messages sent to 61 million users affected millions of people’s political self-expression, information seeking, and real-world turnout. Kramer et al. [36] and Matz et al. [37] ran experiments on 0.6 and 3.5 million Facebook users, respectively, to examine emotional contagion and psychologically targeted communication. Outside that platform, randomized trials are equally informative: Zhang et al. [38] tested the causal and spillover effects of “engagement bait” on Bilibili with a random sample of 188,249 users and 1,810,787 videos.
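The analytic core of such platform trials can be sketched in a few lines: random assignment followed by a difference-in-means estimate of the effect on a binary outcome, with a normal-approximation confidence interval. The data below are synthetic, and the sample size, base rate, and effect size are invented for illustration; they do not reproduce any of the cited experiments.

```python
# Synthetic sketch of a two-arm online experiment analyzed by difference in means.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Randomize users to treatment (e.g., a social message) or control.
treated = rng.integers(0, 2, size=n).astype(bool)

# Simulate a binary outcome with a small true lift of 0.4 percentage points.
base_rate = 0.180
outcome = rng.random(n) < np.where(treated, base_rate + 0.004, base_rate)

# Difference in means with a normal-approximation 95% confidence interval.
p_t, p_c = outcome[treated].mean(), outcome[~treated].mean()
se = np.sqrt(p_t * (1 - p_t) / treated.sum() + p_c * (1 - p_c) / (~treated).sum())
print(f"estimated effect: {p_t - p_c:.4f} ± {1.96 * se:.4f}")
```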
Generative AI also shows promise for replicating classic experiments and designing new interventions. Horton [39] argues that synthetic respondents built with GPT-3 can reproduce several canonical findings in behavioral economics. Aher et al. [40] demonstrate that GPT-3 Turbo and GPT-4 can replicate a number of social-psychology experiments, while failing to reproduce the “wisdom of crowds” phenomenon. In applied online interventions, Costello et al. [41] engage 2190 conspiracy believers in personalized, evidence-based dialogs with GPT-4 Turbo, reducing conspiracy belief by ~20%, with effects persisting for two months and generalizing across topics. Likewise, Argyle et al. [42] show in a large-scale field experiment that LLM-provided, evidence-based, real-time suggestions can improve the quality of conversations on divisive topics without systematically shifting positions.
LLMs facilitate a methodological shift from standardized to personalized experimental stimuli and enable innovative interactive designs, where human–AI collaboration has been shown to transform communication patterns and enhance productivity [43,44].
However, it is important to note that LLM-based simulations are best understood as complementing rather than substituting human experimentation, by enabling efficient hypothesis piloting, counterfactual exploration, and the evaluation of designs across diverse populations [18].
Not all key variables are directly observable via behavioral traces or experiments, which motivates the integration of surveys with passive digital records.

4.3. Surveys Integrated with Digital Traces

Surveys remain indispensable in CSS. In recent years, many studies in communication, sociology, and political science have adopted hybrid “compute + survey” designs that link questionnaires with digital trace data [45]. For example, Guess et al. [46] connect Twitter and Facebook accounts to survey responses to compare self-reports with observed (passively sensed) political social-media use. Shin [47] extends this strategy to news exposure, revealing systematic differences between stated and actual behavior.
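A minimal sketch of the linkage step in such hybrid designs is shown below: self-reported usage from a questionnaire is joined to counts derived from passively logged traces via a consented participant identifier and then compared. All column names and values are hypothetical.

```python
# Hypothetical survey-trace linkage: compare self-reports with observed counts.
import pandas as pd

survey = pd.DataFrame({
    "participant_id": [1, 2, 3],
    "self_reported_news_posts_per_week": [10, 2, 25],
})

traces = pd.DataFrame({
    "participant_id": [1, 1, 2, 3, 3, 3],
    "post_type": ["news", "other", "news", "news", "news", "other"],
})

# Count observed news posts per participant from the trace log.
observed = (
    traces[traces["post_type"] == "news"]
    .groupby("participant_id")
    .size()
    .reset_index(name="observed_news_posts")
)

# Join on the consented identifier and quantify over-reporting.
linked = survey.merge(observed, on="participant_id", how="left").fillna(0)
linked["overreport"] = (
    linked["self_reported_news_posts_per_week"] - linked["observed_news_posts"]
)
print(linked)
```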
LLMs also provide distinctive value for survey research: Argyle et al. [48] argue that language models can serve as viable proxies for specific human subpopulations; GPT-3 exhibits fine-grained, demographically patterned “algorithmic biases,” and—with proper conditioning—can simulate response distributions for multiple subgroups, with “silicon samples” capturing aspects of human attitudinal complexity. At the same time, boundary conditions remain: LLMs can reproduce presidential-vote patterns but not global-warming opinions unless relevant covariates are included; thus conditioning, model choice, questionnaire format, and bias assessment are crucial when using LLMs for survey simulation [49]. Further evidence indicates that LLMs underperform at individual-level social prediction without shortcut features; current models are not yet suitable for rigorous prediction tasks absent substantial tuning or labeled supervision [50].
When the analytic focus shifts from individual attitudes to emergent group behavior and system dynamics, agent-based modeling offers a bottom-up mechanism-oriented lens.

4.4. Agent-Based Modeling and Generative Agents

Agent-Based Modeling (ABM) emerged to tackle fundamental challenges in social science, where complex processes were often reduced to static equilibria that obscured the link between heterogeneous micro-level interactions and macro-level dynamics [51] (pp. 1–2). Thomas Schelling’s Dynamic Models of Segregation [52] pioneered this approach, showing how simple individual choices can generate polarized outcomes and that aggregate patterns rarely reveal individual motives [51,52] (p. 3). Building on such insights, the development of “artificial societies” through advances in computing offers a simulation-based framework that connects micro behaviors to macro regularities, demonstrating how fundamental social structures emerge from boundedly rational agents’ interactions [51] (pp. 3–4).
Agent-based modeling simulates complex social processes by specifying simple behavioral rules for many interacting “agents” in virtual environments. It is widely used in CSS to examine evolutionary problems that elude closed-form or purely statistical approaches. Examples include the role of inequality in residential segregation [53] and micro-to-macro mechanisms in network dynamics [54]. These simulations enable virtual social experiments that test how macro-patterns emerge under alternative assumptions, while requiring empirical calibration and validation to ensure fidelity.
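As a concrete illustration, the sketch below implements a minimal Schelling-style segregation model in the spirit of [52]: agents of two groups occupy a grid and relocate to a random empty cell when too few neighbors share their group, so that segregation emerges from simple local rules. Grid size, density, and the similarity threshold are illustrative choices, not calibrated values.

```python
# Minimal Schelling-style segregation model; all parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
width, density, threshold = 30, 0.9, 0.5
# 0 = empty cell, 1 and 2 = the two agent groups.
grid = rng.choice([0, 1, 2], size=(width, width), p=[1 - density, density / 2, density / 2])

def unhappy(grid):
    """Coordinates of agents whose share of like neighbors falls below the threshold."""
    coords = []
    for x in range(width):
        for y in range(width):
            if grid[x, y] == 0:
                continue
            nbrs = grid[max(0, x - 1):x + 2, max(0, y - 1):y + 2].ravel()
            same = (nbrs == grid[x, y]).sum() - 1      # exclude the agent itself
            occupied = (nbrs != 0).sum() - 1
            if occupied > 0 and same / occupied < threshold:
                coords.append((x, y))
    return coords

for step in range(30):                                 # a few relocation rounds
    movers = unhappy(grid)
    if not movers:
        break
    empties = list(zip(*np.where(grid == 0)))
    rng.shuffle(empties)
    for (x, y), (ex, ey) in zip(movers, empties):
        grid[ex, ey], grid[x, y] = grid[x, y], 0       # relocate to a random empty cell
    print(f"round {step}: {len(movers)} unhappy agents")
```

Even a toy version like this typically reproduces Schelling's core insight that moderate individual preferences can generate strongly clustered macro-patterns.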
LLM-driven ABMs employ large language models to construct generative agents that simulate human-like behavior in interactive environments [55]. By integrating mechanisms for memory, reflection, and planning, these agents extend beyond traditional prompt-based approaches to generate coherent long-term actions [55]. When embedded in simulated social systems, they enable the study of emergent dynamics—such as relationship formation and collective coordination—thus providing a novel paradigm for advancing computational social science and prototyping social theories [28,55].
Recent work suggests that generative-AI tools can simulate large populations, enriching the ABM paradigm and alleviating some limitations of traditional simulations. Park et al. [55] build a system of multiple GPT-3.5-turbo agents interacting in a fictional town, where agents form daily routines and display emergent collective properties. Other research indicates that LLMs can reproduce social-movement dynamics observed on social media [56].
Still, integrating LLMs into ABMs raises methodological debates—e.g., how much agent complexity is necessary and what evaluation criteria should be used—and addressing these issues could open new directions [34].
Beyond text and behavioral sequences, vast troves of visual content carry critical signals, motivating computer vision as a key methodological pillar in CSS.

4.5. Computer Vision and Multimodal Analysis

Computer vision seeks deeper computational understanding of digital images and video, enabling the analysis of large-scale visual media.
In political communication, researchers use emotion-recognition tools to quantify politicians’ facial expressions and personalization strategies [57]. In visual esthetics, studies quantify how the esthetics and calorie density of Instagram food images jointly shape popularity [58] and predict brand personality from image features [59]. To bridge gaps between tool outputs and theoretical constructs in media-effects research, scholars advocate supervised models for targeted attributes and unsupervised methods to discover meaningful visual themes; future work can integrate large language models and multimodal pipelines to extract higher-level insights [60]. Overall, computer vision is poised to become a critical instrument for empirical social research across subfields, yet current practice still touches only the “tip of the iceberg” of automated video analysis; in application, researchers must address data relevance, the fit between pretrained models and research questions, and privacy/compliance constraints [61](pp. 395–396).
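As an illustration of the basic measurement step underlying such studies, the sketch below labels a single image with a pretrained ResNet-18 from torchvision. The model choice and the image file name are assumptions for illustration; in practice researchers must also verify that pretrained labels map onto the theoretical construct of interest and satisfy privacy and compliance constraints.

```python
# Sketch of image labeling with a pretrained network (torchvision ResNet-18).
import torch
from PIL import Image
from torchvision import models

weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()

preprocess = weights.transforms()  # resizing/normalization matching the pretrained weights

image = Image.open("example_post_image.jpg").convert("RGB")  # hypothetical image file
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    probs = torch.softmax(model(batch), dim=1)

top_prob, top_class = probs.max(dim=1)
label = weights.meta["categories"][top_class.item()]
print(f"predicted content label: {label} (p={top_prob.item():.2f})")
```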
In sum, different computational methods in CSS bring distinctive strengths and limitations. LLM-assisted text coding enables rapid, low-cost labeling that can rival crowd work, yet risks misrepresenting minority identities and requires transparent disclosure; it should be used to augment, not replace, expert coding. Experiments offer clear causal identification and show promise with AI assistance in improving discourse, but face challenges of external validity and platform shifts, underscoring the need to prioritize theory-linked outcomes and long-term effects. Survey–trace designs best link attitudes to behavior for mechanism testing, though they remain vulnerable to consent biases and linkage errors, calling for integration with careful measurement diagnostics [62,63]. ABM explains macro-level outcomes from micro rules and supports counterfactuals, but struggles with calibration and parameter fragility; its value lies in mechanism-focused modeling combined with empirical calibration and falsification [64,65,66]. Computer vision scales consistent visual features and supports ad or imagery audits, yet suffers from dataset shifts and construct validity issues, making theory-driven codebooks and transparent replication essential. Table 2 summarizes observations and method-specific critiques.

5. Method or Theory?

The debate over whether CSS is “method-first and theory-lacking” has a long history. Some scholars argue that researchers often fail to connect patterns identified in digital trace data to the core questions of the social sciences, or to develop new theories that account for online communication and politics [68]. As Borsboom et al. [69] put it, “no amount of empirical data can fill a theoretical gap”. In this sense, reflecting on the relation between “method and theory” is not mere rhetoric; it concerns the fundamental pathway of cumulative knowledge.
Further critiques target practice: researchers may lean too heavily on readily quantifiable online indicators [70] or privilege mathematical modeling [71], thereby diluting the steering role of theoretical frameworks. Echoing these concerns—yet going beyond them—Cioffi-Revilla, in discussing the nature of CSS, argues that computational social science does not only analyze big data and data-mining algorithms but must—and should—also encompass theory, models, simulation, and other scientific constructs; computational techniques thus play a dual and decisive role: they function both as a theoretical paradigm and as a methodological instrument [9] (pp. 17–28). In other words, the CSS ecology is jointly constituted by theory, models, and data (including both “small data” and “big data”), and method and theory are not substitutes but mutually enabling.
On this view, CSS clearly does more than emphasize tools; it also carries a substantive theoretical commitment. Concretely, theory-centered knowledge production in CSS manifests in at least four interlinked pathways [72] (pp. 113–114).
First, large-scale tests of existing theories. For example, in network theory, the “structural holes” thesis holds that actors bridging otherwise unconnected alters gain more opportunities [73]. An analysis of the nationwide call network in the United Kingdom by Eagle et al. [74] showed that individuals spanning structural holes were more likely to reside in higher socioeconomic status areas, offering the first group-level corroboration of the theory. Likewise, Spiro [75] translated the theory of social convergence during offline crises into online settings and found stronger effects of attention concentration and crowd clustering on the internet during crises, thereby establishing the theory’s applicability in digital environments. Through this pathway, CSS furnishes a broader empirical base that exposes classic theories to denser data and new contextual constraints. Meng et al., using data from 7.45 billion social media users, resolved long-standing debates in information-spreading theory. They identified a ubiquitous mechanism of social reinforcement and weakening, formalized in a succinct equation; the resulting model accurately captures complex retweeting dynamics and explains weak-tie effects. Taken together, these advances demonstrate that large-scale, data-driven computational social science provides powerful evidence for both the validation and the advancement of theoretical assumptions [76].
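To show how the structural-holes construct is typically operationalized in code, the sketch below computes Burt's constraint and effective size with networkx on a hypothetical toy graph in which one node bridges two otherwise unconnected clusters; the cited study [74] analyzed a nationwide call network rather than anything this small.

```python
# Structural-holes measures on a toy ego network: "broker" spans two clusters.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("broker", "a1"), ("broker", "a2"), ("a1", "a2"),        # cluster A
    ("broker", "b1"), ("broker", "b2"), ("b1", "b2"),        # cluster B
    ("redundant", "a1"), ("redundant", "a2"), ("redundant", "broker"),
])

constraint = nx.constraint(G)          # lower constraint -> more structural holes spanned
effective_size = nx.effective_size(G)

for node in ["broker", "redundant"]:
    print(f"{node}: constraint={constraint[node]:.2f}, "
          f"effective size={effective_size[node]:.2f}")
```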
Second, extensions of existing theories that yield finer-grained insight. Drawing on five years of activity data for millions of individuals, Aral and Nicolaides [77] found that in social contagion people tend to “align” with slightly less active peers rather than emulate the most active ones, thereby refining how the social comparison mechanism [78] operates. Spaiser et al. [79] embedded Castells’ theory of “communication power” into a concrete political context, offering a new frame for interpreting online political conflict. Such work is not mere replication; it pushes theory to generate higher-resolution propositions within digital ecologies.
Third, theory generation from novel data. Combining fMRI with other multimodal measurements, Schmälzle et al. [80] reported that individuals embedded in friendship networks where friends are mutually connected face a lower likelihood of social exclusion than those whose friends are not connected to one another; this finding points to the promise of developing physiologically grounded (or at least physiologically coupled) theories of the antecedents and consequences of social networks. More broadly, as computable text, behavioral traces, and increasingly, model-generated data enter the picture, CSS opens a new “observable layer” for theorizing, expanding the testable frontier of theoretical models. This expansion is exemplified by instruction-tuned LLMs, which can reliably generate accurate estimates of ideological positions from a wide range of political texts, even multilingual ones. By converting raw text into precise ‘model-generated data’ on ideology and policy stances, LLMs provide a novel observable layer that substantially broadens the empirical foundation for the development and testing of social theories [33].
Fourth, new theorizing around emergent phenomena. CSS has rekindled discussion of the “effervescence of collective behavior” [81] and deepened our understanding of the dynamics of the “small-world” phenomenon in complex social systems [82]. As social behavior is recorded and reconstructed with higher temporal, relational, and content resolution, new regularities and mechanism-focused propositions come into view. New theorizing, exemplified by growth-induced percolation, reveals how indirect social interactions generate emergent phenomena. This model shows how a small number of active nodes can trigger widespread activation in complex networks, uncovering distinct phase transitions. It provides a foundational framework for understanding social contagion and collective behavior. Theoretical advances centered on emergent phenomena promise more precise predictive models for understanding the complexity of the social world [83].
With respect to disciplinary positioning, computational social science can be characterized as an interdisciplinary endeavor that advances explanatory accounts of human behavior by leveraging computational techniques on large-scale datasets sourced from social media, the Internet, and other digitized repositories such as administrative records. The review by Edelmann et al. places sociological theory at the core, contending that the future of the field within sociology depends not only on novel data and methods, but also on the capacity to generate new theories of human behavior or to improve existing explanations of the social world [13]. In line with this view, Shugars [84] urges CSS to return to the essence of the social sciences: genuine CSS should not rely solely on computational techniques but ought to be integrated with social-science theory to uncover the social mechanisms underlying data and method. Mattsson [85] expresses optimism, arguing that computational social science is increasingly prioritizing methodological validation and the robustness of findings, signaling a shift beyond exploration toward deeper integration with social-science theory.
Pathways for theoretical advancement in computational social science include reexamining classic sociological problems with novel data and methods, theorizing the new social domains shaped by digital technologies, and innovating approaches to theory construction through computational tools [13].
How, then, can theory and methods be coupled? As an emerging branch of computational social science, sociophysics has shown both the potential and the limits of mathematically modeling social phenomena, revealing a persistent misalignment between modeling assumptions and social reality. Social processes should be understood not as neutral counterparts of stochastic physical laws but as components of emergent and dynamically shifting structures [86].
To address this gap, we propose a Theory–Model–Data (TMD) bridge that links theories across the social sciences to computational models and empirical evidence. Substantive propositions (e.g., social comparison, structural holes, innovation search) are translated into testable rules within evolving structures through agent-based, network-analytic, and related frameworks. Model observables—such as polarization, consensus time, clustering, and diffusion speed—are mapped to auditable indicators in digital traces and hybrid survey–trace designs. This pipeline reframes the method–theory tension by positioning computation as infrastructure with a dual role: a paradigm for mechanism articulation and an instrument for falsification and prediction, thereby advancing cumulative social-science knowledge.
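As a minimal illustration of the "model observable" step in this pipeline, the sketch below runs a simple voter model on a small-world network and records consensus time, an observable that would then be mapped onto an auditable indicator in digital traces. The network, update rule, and parameters are illustrative assumptions, not a calibrated model.

```python
# Voter-model sketch: consensus time on a small-world network as a model observable.
import networkx as nx
import numpy as np

rng = np.random.default_rng(1)
G = nx.watts_strogatz_graph(n=200, k=6, p=0.1, seed=1)
nodes = list(G.nodes)
opinion = {node: int(rng.integers(0, 2)) for node in nodes}  # random binary opinions

steps = 0
while len(set(opinion.values())) > 1 and steps < 200_000:
    node = nodes[rng.integers(len(nodes))]                   # pick a random agent...
    neighbors = list(G.neighbors(node))
    opinion[node] = opinion[neighbors[rng.integers(len(neighbors))]]  # ...who copies a neighbor
    steps += 1

if len(set(opinion.values())) == 1:
    print(f"consensus reached after {steps} update steps")
else:
    print("no consensus within the step budget")
```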
In sum, the dichotomy of “method or theory” is untenable. A more constructive path is to treat computation as an infrastructure with a dual identity—as a theoretical paradigm and as a methodological tool: theory articulates meaningful questions, data and algorithms furnish falsifiable evidence and mechanism-oriented accounts, and iterative movement between the two yields cumulative social-science knowledge.

6. Conclusions

Big data and algorithmic techniques have not only expanded the horizons of social inquiry but also prompted the sustained scrutiny of their societal consequences and ethical boundaries. As a practice of “peering into society through data,” computational social science inherently entails the collection and analysis of individual- and group-level behavioral data, and both methodological choices and research findings may carry social ramifications. Consequently, the scholarly community has engaged in multi-dimensional reflection on this field, centering on data privacy, algorithmic transparency, behavioral surveillance, and data colonialism, while continually calibrating the public responsibilities of research practice within the normative frameworks of what is “feasible” and what is “appropriate”.
Building on this agenda of reflection, scholars have further clarified the scope and core components of the discipline. Cioffi-Revilla specifies the scope of CSS as “theoretically and methodologically guided by theory, enriched by analytical models, and enabled by computer simulations; all three drawing on empirical data, be it big or small”, including five areas: computational foundations, algorithmic information extraction, networks, social complexity, and social simulations [9](pp. 17–28).
Accordingly, an appropriate positioning of CSS should address both its disciplinary locus and its research aims: first, CSS is an interdisciplinary domain connecting data science and the social sciences; second, it pursues causal inference and predictive inference as dual, coequal objectives, advancing cumulative knowledge through the joint development of theory, models, and evidence. Its roots trace back to mathematical modeling and social simulation in sociology; with the comprehensive digitization of social life, CSS has evolved into a dynamically developing scientific arena [87] (p. 131). Hofman, Watts, and colleagues [88] likewise contend that computational social science is not merely an assemblage of large-scale digital data and associated techniques, but also a substantive convergence of scientific reasoning and practice across disciplines; accordingly, they advocate tighter integration of prediction and explanation.
Turning to the rapidly emerging family of large models, opportunities and challenges arise in tandem. On the one hand, generative AI is expected to enhance human-behavior research workflows such as surveys, online experiments, automated content analysis, and agent-based modeling; on the other hand, concerns about biases in training data, ethical compliance, reproducibility, environmental impacts, and the proliferation of low-quality studies cannot be ignored. In response, some scholars propose that social scientists develop open-source infrastructure for human-behavior research to mitigate limitations and institutionalize verifiable processes [34].
At the same time, the role and limits of large models within CSS are coming into sharper focus. Ziems et al. [28] conclude that LLMs possess genuine potential to collaborate with researchers and to contribute meaningfully to social-scientific analysis, yet a more realistic outlook is augmentation rather than substitution—they can streamline conventional pipelines but cannot fully replace them.
In sum, the next stage of CSS requires consolidating ethical and societal safeguards while forging clearer consensus on disciplinary scope and research objectives: anchoring inquiry on the twin pillars of causality and prediction; iteratively coupling theory, models, and data; and, under large model–driven conditions, prioritizing verifiability and interpretability in methodological innovation so as to avoid technological fetishes and tool dependency. In this way, the healthy development of computational social science depends on a dynamic balance between innovation and regulation—fully leveraging the scientific potential of big data and computational methods, while, under the premise of respecting and protecting individuals, pursuing deeper and more cumulative explanations of social phenomena.
Building on these methodological imperatives, it is equally crucial to examine how CSS applications intersect with core societal challenges, particularly the resilience and transformation of democratic processes in the digital age.
As both a theoretical lens and a methodological toolkit, computational social science (CSS) offers critical insights into contemporary democratic dynamics. Empirical evidence shows, for instance, that social media enables political campaigns to reach specific voter segments with unprecedented precision, and that the intensity of Facebook political advertising significantly shaped voter mobilization and persuasion during the 2016 U.S. presidential election, thereby affecting the functioning of democratic institutions [89]. Likewise, research demonstrates that highly central bots in social networks accelerate the spread of misinformation and exacerbate political polarization [90]. These findings resonate with broader concerns about democratic futures, including social fragmentation, polarization, algorithmic radicalization, rabbit hole dynamics, and the collapse of “shared reality”. Looking ahead, the rise of autonomous AI agents introduces both opportunities and risks: in networked environments, such agents could circulate information and knowledge, semantically process data, and autonomously execute complex tasks or entire projects with minimal human intervention. While this may enhance efficiency, it also risks undermining human agency, raising pressing questions about values, community formation, and the long-term sustainability of democratic governance.

Funding

This research was funded by the Fundamental Research Funds for the Central Universities (No. 1233200017 and No. 1233300003), and Beijing Social Science Foundation (No. 21DTR040).

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ABM: Agent-based Modeling
AI: Artificial intelligence
API: Application Programming Interface
CC BY: Creative Commons Attribution
CSS: Computational Social Science
EMNLP: Conference on Empirical Methods in Natural Language Processing
EPJ: European Physical Journal
GPT: Generative Pre-trained Transformer
IC2S2: International Conference on Computational Social Science
LLM: Large Language Model
NLP: Natural Language Processing
SICSS: The Summer Institutes in Computational Social Science
SSAs: Smart speaker assistants
TMD: Theory-Model-Data

References

  1. Lazer, D.; Pentland, A.; Adamic, L.; Aral, S.; Barabási, A.-L.; Brewer, D.; Christakis, N.; Contractor, N.; Fowler, J.; Gutmann, M.; et al. Computational Social Science. Science 2009, 323, 721–723.
  2. 11th International Conference on Computational Social Science (IC2S2 2025). Available online: https://www.ic2s2-2025.org/ (accessed on 15 September 2025).
  3. The Summer Institutes in Computational Social Science. Available online: https://sicss.io/about (accessed on 15 September 2025).
  4. Cioffi-Revilla, C. Introduction to Computational Social Science: Principles and Applications; Springer: London, UK, 2014; ISBN 978-1-4471-5661-1.
  5. Salganik, M.J. Bit by Bit: Social Research in the Digital Age; Princeton University Press: Princeton, NJ, USA, 2017; ISBN 978-0-691-15864-8.
  6. Engel, U.; Quan-Haase, A.; Liu, S.X.; Lyberg, L. (Eds.) Handbook of Computational Social Science; Routledge: London, UK, 2022; ISBN 978-1-032-11143-8.
  7. Journal of Computational Social Science—Aims and Scope. Available online: https://link.springer.com/journal/42001/aims-and-scope (accessed on 15 September 2025).
  8. EPJ Data Science Topical Collection: The Past, Present, and Future of Computational Social Science. Available online: https://epjdatascience.springeropen.com/ppfcss (accessed on 15 September 2025).
  9. Engel, U.; Quan-Haase, A.; Liu, S.X.; Lyberg, L. (Eds.) Handbook of Computational Social Science, Volume 1: Theory, Case Studies and Ethics; Routledge: London, UK, 2021; ISBN 978-1-003-02458-3.
  10. Geise, S.; Waldherr, A. Computational Communication Science: Lessons from Working Group Sessions with Experts of an Emerging Research Field. In Handbook of Computational Social Science. Volume 1: Theory, Case Studies and Ethics; Routledge: London, UK, 2021.
  11. Bail, C.A.; Argyle, L.P.; Brown, T.W.; Bumpus, J.P.; Chen, H.; Hunzaker, M.B.F.; Lee, J.; Mann, M.; Merhout, F.; Volfovsky, A. Exposure to Opposing Views on Social Media Can Increase Political Polarization. Proc. Natl. Acad. Sci. USA 2018, 115, 9216–9221.
  12. Ellison, N.B.; Gray, R.; Lampe, C.; Fiore, A.T. Social Capital and Resource Requests on Facebook. New Media Soc. 2014, 16, 1104–1121.
  13. Edelmann, A.; Wolff, T.; Montagne, D.; Bail, C.A. Computational Social Science and Sociology. Annu. Rev. Sociol. 2020, 46, 61–81.
  14. Lazer, D.M.J.; Pentland, A.; Watts, D.J.; Aral, S.; Athey, S.; Contractor, N.; Freelon, D.; Gonzalez-Bailon, S.; King, G.; Margetts, H.; et al. Computational Social Science: Obstacles and Opportunities. Science 2020, 369, 1060–1062.
  15. Monroe, B.L. The Five Vs of Big Data Political Science: Introduction to the Virtual Issue on Big Data in Political Science. Political Anal. 2013, 21, 1–9.
  16. Bartley, K. How Much Data Is in the World? [2024/2025 Big Data Statistics]. Available online: https://rivery.io/blog/big-data-statistics-how-much-data-is-there-in-the-world/ (accessed on 15 September 2025).
  17. Lazer, D.; Radford, J. Data Ex Machina: Introduction to Big Data. Annu. Rev. Sociol. 2017, 43, 19–39.
  18. Peng, T.-Q.; Yang, X. Recalibrating the Compass: Integrating Large Language Models into Classical Research Methods. arXiv 2025, arXiv:2505.19402.
  19. Brause, S.R.; Blank, G. Externalized Domestication: Smart Speaker Assistants, Networks and Domestication Theory. Inf. Commun. Soc. 2020, 23, 751–763.
  20. IBM. What Is Big Data? Available online: https://www.ibm.com/think/topics/big-data (accessed on 15 September 2025).
  21. Sloan, L.; Quan-Haase, A. The SAGE Handbook of Social Media Research Methods; SAGE Publications: Thousand Oaks, CA, USA, 2016; ISBN 978-1-4739-8384-7.
  22. Acharjya, D.P.; Ahmed, P.K. A Survey on Big Data Analytics: Challenges, Open Research Issues and Tools. Int. J. Adv. Comput. Sci. Appl. 2016, 7, 511–518.
  23. Welles, B.F.; González-Bailón, S. (Eds.) The Oxford Handbook of Networked Communication; Oxford University Press: New York, NY, USA, 2020; ISBN 978-0-19-046051-8.
  24. Lazer, D.; Hargittai, E.; Freelon, D.; Gonzalez-Bailon, S.; Munger, K.; Ognyanova, K.; Radford, J. Meaningful Measures of Human Society in the Twenty-First Century. Nature 2021, 595, 189–196.
  25. Elmer, T. Computational Social Science Is Growing up: Why Puberty Consists of Embracing Measurement Validation, Theory Development, and Open Science Practices. EPJ Data Sci. 2023, 12, 58.
  26. Evans, J.A.; Aceves, P. Machine Translation: Mining Text for Social Theory. Annu. Rev. Sociol. 2016, 42, 21–50.
  27. Molina, M.; Garip, F. Machine Learning for Sociology. Annu. Rev. Sociol. 2019, 45, 27–45.
  28. Ziems, C.; Held, W.; Shaikh, O.; Chen, J.; Zhang, Z.; Yang, D. Can Large Language Models Transform Computational Social Science? Comput. Linguist. 2024, 50, 237–291.
  29. Bollen, J.; Mao, H.; Zeng, X. Twitter Mood Predicts the Stock Market. J. Comput. Sci. 2011, 2, 1–8.
  30. Hopkins, D.J.; King, G. A Method of Automated Nonparametric Content Analysis for Social Science. Am. J. Political Sci. 2010, 54, 229–247.
  31. Barari, M.; Eisend, M. Computational Content Analysis in Advertising Research. J. Advert. 2024, 53, 681–699.
  32. Gilardi, F.; Alizadeh, M.; Kubli, M. ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks. Proc. Natl. Acad. Sci. USA 2023, 120, e2305016120.
  33. Le Mens, G.; Gallego, A. Positioning Political Texts with Large Language Models by Asking and Averaging. Political Anal. 2025, 33, 274–282.
  34. Bail, C.A. Can Generative AI Improve Social Science? Proc. Natl. Acad. Sci. USA 2024, 121, e2314021121.
  35. Bond, R.M.; Fariss, C.J.; Jones, J.J.; Kramer, A.D.; Marlow, C.; Settle, J.E.; Fowler, J.H. A 61-Million-Person Experiment in Social Influence and Political Mobilization. Nature 2012, 489, 295–298.
  36. Kramer, A.D.; Guillory, J.E.; Hancock, J.T. Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks. Proc. Natl. Acad. Sci. USA 2014, 111, 8788–8790.
  37. Matz, S.C.; Kosinski, M.; Nave, G.; Stillwell, D.J. Psychological Targeting as an Effective Approach to Digital Mass Persuasion. Proc. Natl. Acad. Sci. USA 2017, 114, 12714–12719.
  38. Zhang, W.J.; Yi, J.; Liang, H. I Cue You Liking Me: Causal and Spillover Effects of Technological Engagement Bait. Comput. Hum. Behav. 2023, 148, 107864.
  39. Horton, J.J. Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? National Bureau of Economic Research: Cambridge, MA, USA, 2023.
  40. Aher, G.; Arriaga, R.I.; Kalai, A.T. Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023.
  41. Costello, T.H.; Pennycook, G.; Rand, D.G. Durably Reducing Conspiracy Beliefs through Dialogues with AI. Science 2024, 385, eadq1814.
  42. Argyle, L.P.; Bail, C.A.; Busby, E.C.; Gubler, J.R.; Howe, T.; Rytting, C.; Sorensen, T.; Wingate, D. Leveraging AI for Democratic Discourse: Chat Interventions Can Improve Online Political Conversations at Scale. Proc. Natl. Acad. Sci. USA 2023, 120, e2311627120.
  43. Hackenburg, K.; Margetts, H. Evaluating the Persuasive Influence of Political Microtargeting with Large Language Models. Proc. Natl. Acad. Sci. USA 2024, 121, e2403116121.
  44. Ju, H.; Aral, S. Collaborating with AI Agents: Field Experiments on Teamwork, Productivity, and Performance. arXiv 2025, arXiv:2503.18238.
  45. Stier, S.; Breuer, J.; Siegers, P.; Thorson, K. Integrating Survey Data and Digital Trace Data: Key Issues in Developing an Emerging Field. Soc. Sci. Comput. Rev. 2019, 38, 503–516.
  46. Guess, A.; Munger, K.; Nagler, J.; Tucker, J. How Accurate Are Survey Responses on Social Media and Politics? Political Commun. 2019, 36, 241–258.
  47. Shin, J. How Do Partisans Consume News on Social Media? A Comparison of Self-Reports with Digital Trace Measures Among Twitter Users. Soc. Media Soc. 2020, 6, 2056305120981039.
  48. Argyle, L.P.; Busby, E.C.; Fulda, N.; Gubler, J.R.; Rytting, C.; Wingate, D. Out of One, Many: Using Language Models to Simulate Human Samples. Political Anal. 2023, 31, 337–351.
  49. Lee, S.; Peng, T.Q.; Goldberg, M.H.; Rosenthal, S.A.; Kotcher, J.E.; Maibach, E.W.; Leiserowitz, A. Can Large Language Models Estimate Public Opinion about Global Warming? An Empirical Assessment of Algorithmic Fidelity and Bias. PLoS Clim. 2024, 3, e0000429.
  50. Yang, K.; Li, H.; Wen, H.; Peng, T.Q.; Tang, J.; Liu, H. Are Large Language Models (LLMs) Good Social Predictors? In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, FL, USA, 12–16 November 2024; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 2718–2730.
  51. Epstein, J.M.; Axtell, R. Growing Artificial Societies: Social Science from the Bottom Up; MIT Press: Cambridge, MA, USA, 1996; ISBN 978-0-262-27236-0.
  52. Schelling, T.C. Dynamic Models of Segregation. J. Math. Sociol. 1971, 1, 143–186.
  53. Bruch, E.E. How Population Structure Shapes Neighborhood Segregation. Am. J. Sociol. 2014, 119, 1221–1278.
  54. Stadtfeld, C. The Micro–Macro Link in Social Networks. In Emerging Trends in the Social and Behavioral Sciences; Wiley: Hoboken, NJ, USA, 2018; pp. 1–15.
  55. Park, J.S.; O’Brien, J.C.; Cai, C.J.; Morris, M.R.; Liang, P.; Bernstein, M.S. Generative Agents: Interactive Simulacra of Human Behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST ’23), San Francisco, CA, USA, 29 October–1 November 2023; ACM: New York, NY, USA, 2023.
  56. Mou, X.; Wei, Z.; Huang, X. Unveiling the Truth and Facilitating Change: Towards Agent-Based Large-Scale Social Movement Simulation. arXiv 2024, arXiv:2402.16333. [Google Scholar] [CrossRef]
  57. Geise, S.; Maubach, K.; Boettcher Eli, A. Picture Me in Person: Personalization and Emotionalization as Political Campaign Strategies on Social Media in the German Federal Election Period 2021. New Media Soc. 2024, 27, 3745–3769. [Google Scholar] [CrossRef]
  58. Sharma, M.; Peng, Y. How Visual Aesthetics and Calorie Density Predict Food Image Popularity on Instagram: A Computer Vision Analysis. Health Commun. 2023, 39, 577–591. [Google Scholar] [CrossRef]
  59. Karpenka, L.; Rudienė, E.; Morkunas, M.; Volkov, A. The Influence of a Brand’s Visual Content on Consumer Trust in Social Media Community Groups. J. Theor. Appl. Electron. Commer. Res. 2021, 16, 2424–2441. [Google Scholar] [CrossRef]
  60. Peng, Y.; Lock, I.; Ali Salah, A. Automated Visual Analysis for the Study of Social Media Effects: Opportunities, Approaches, and Challenges. Commun. Methods Meas. 2024, 18, 163–185. [Google Scholar] [CrossRef]
  61. Engel, U.; Quan-Haase, A.; Liu, S.X.; Lyberg, L. (Eds.) Handbook of Computational Social Science, Volume 2: Data Science, Statistical Modelling, and Machine Learning Methods; Routledge: London, UK, 2021; ISBN 978-1-003-02524-5. [Google Scholar] [CrossRef]
  62. Wenz, A.; Keusch, F.; Bach, R.L. Measuring Smartphone Use: Survey Versus Digital Behavioral Data. Soc. Sci. Comput. Rev. 2025, 43, 1030–1049. [Google Scholar] [CrossRef]
  63. Cernat, A.; Keusch, F.; Bach, R.L.; Pankowska, P.K. Estimating Measurement Quality in Digital Trace Data and Surveys Using the MultiTrait MultiMethod Model. Soc. Sci. Comput. Rev. 2025, 43, 1013–1029. [Google Scholar] [CrossRef]
  64. Collins, A.J.; Koehler, M.; Lynch, C.J. Methods That Support the Validation of Agent-Based Models: An Overview and Discussion. J. Artif. Soc. Soc. Simul. 2024, 27, 11. [Google Scholar] [CrossRef]
  65. Grimm, V.; Berger, U.; Meyer, M.; Lorscheid, I. Theory for and from Agent-Based Modelling: Insights from a Virtual Special Issue and a Vision. Environ. Model. Softw. 2024, 178, 106088. [Google Scholar] [CrossRef]
  66. Dyer, J.S.; Cannon, P.; Farmer, J.D.; Schmon, S.M. Black-Box Bayesian Inference for Agent-Based Models. J. Econ. Dyn. Control 2024, 161, 104827. [Google Scholar] [CrossRef]
  67. Li, H.; Zhang, N. Computer Vision Models for Image Analysis in Advertising Research. J. Advert. 2024, 53, 771–790. [Google Scholar] [CrossRef]
  68. Jungherr, A.; Theocharis, Y. The Empiricist’s Challenge: Asking Meaningful Questions in Political Science in the Age of Big Data. J. Inf. Technol. Politics 2017, 14, 97–109. [Google Scholar] [CrossRef]
  69. Borsboom, D.; Mellenbergh, G.J.; van Heerden, J. The Concept of Validity. Psychol. Rev. 2004, 111, 1061–1071. [Google Scholar] [CrossRef]
  70. Jungherr, A. Normalizing Digital Trace Data. In Digital Discussions: How Big Data Informs Political Communication; Stroud, N.J., McGregor, S.C., Eds.; Routledge: New York, NY, USA, 2018; pp. 9–35. [Google Scholar] [CrossRef]
  71. McFarland, D.A.; Lewis, K.; Goldberg, A. Sociology in the Era of Big Data: The Ascent of Forensic Social Science. Am. Sociol. 2016, 47, 12–35. [Google Scholar] [CrossRef]
  72. Contractor, N. How Can Computational Social Science Motivate the Development of Theories, Data, and Methods to Advance Our Understanding of Communication and Organizational Dynamics? In The Oxford Handbook of Networked Communication; Welles, B.F., González-Bailón, S., Eds.; Oxford University Press: New York, NY, USA, 2020; pp. 113–129. [Google Scholar] [CrossRef]
  73. Burt, R.S. Structural Holes: The Social Structure of Competition; Harvard University Press: Cambridge, MA, USA, 1992; ISBN 978-0-674-84371-4. [Google Scholar] [CrossRef]
  74. Eagle, N.; Macy, M.; Claxton, R. Network Diversity and Economic Development. Science 2010, 328, 1029–1031. [Google Scholar] [CrossRef]
  75. Spiro, E.S. Online Communication by Emergency Responders during Crisis Events. In The Oxford Handbook of Networked Communication; Welles, B.F., González-Bailón, S., Eds.; Oxford University Press: New York, NY, USA, 2020; pp. 150–173. [Google Scholar] [CrossRef]
  76. Meng, F.; Xie, J.; Sun, J.; Xu, C.; Zeng, Y.; Wang, X.; Jia, T.; Huang, S.; Deng, Y.; Hu, Y. Spreading Dynamics of Information on Online Social Networks. Proc. Natl. Acad. Sci. USA 2025, 122, e2410227122. [Google Scholar] [CrossRef]
  77. Aral, S.; Nicolaides, C. Exercise Contagion in a Global Social Network. Nat. Commun. 2017, 8, 14753. [Google Scholar] [CrossRef]
  78. Festinger, L. A Theory of Social Comparison Processes. Hum. Relat. 1954, 7, 117–140. [Google Scholar] [CrossRef]
  79. Spaiser, V.; Chadefaux, T.; Donnay, K.; Russmann, F.; Helbing, D. Communication Power Struggles on Social Media: A Case Study of the 2011–12 Russian Protests. J. Inf. Technol. Politics 2017, 14, 132–153. [Google Scholar] [CrossRef]
  80. Schmälzle, R.; O’Donnell, M.B.; Garcia, J.O.; Cascio, C.N.; Bayer, J.; Bassett, D.S.; Vettel, J.M.; Falk, E.B. Brain Connectivity Dynamics during Social Interaction Reflect Social Network Structure. Proc. Natl. Acad. Sci. USA 2017, 114, 5153–5158. [Google Scholar] [CrossRef]
  81. González-Bailón, S. Decoding the Social World: Data Science and the Unintended Consequences of Communication; MIT Press: Cambridge, MA, USA, 2017; ISBN 978-0-262-03707-5. [Google Scholar] [CrossRef]
  82. Watts, D.J. Networks, Dynamics, and the Small-World Phenomenon. Am. J. Sociol. 1999, 105, 493–527. [Google Scholar] [CrossRef]
  83. Sun, H.; Chen, S.; Xie, J.; Hu, Y. Growth-Induced Percolation on Complex Networks. PNAS Nexus 2025, 4, pgaf192. [Google Scholar] [CrossRef]
  84. Shugars, S. Critical Computational Social Science. EPJ Data Sci. 2024, 13, 13. [Google Scholar] [CrossRef]
  85. Mattsson, C.E.S. Computational Social Science with Confidence. EPJ Data Sci. 2024, 13, 3. [Google Scholar] [CrossRef]
  86. Tsintsaris, D.; Tsompanoglou, M.; Ioannidis, E. Dynamics of Social Influence and Knowledge in Networks: Sociophysics Models and Applications in Social Trading, Behavioral Finance and Business. Mathematics 2024, 12, 1141. [Google Scholar] [CrossRef]
  87. Engel, U. Causal and Predictive Modeling in Computational Social Science. In Handbook of Computational Social Science, Volume 1: Theory, Case Studies and Ethics; Engel, U., Quan-Haase, A., Liu, S.X., Lyberg, L., Eds.; Routledge: London, UK, 2022; pp. 131–149. ISBN 978-1-003-02458-3. [Google Scholar] [CrossRef]
  88. Hofman, J.M.; Watts, D.J.; Athey, S.; Garip, F.; Griffiths, T.L.; Kleinberg, J.; Margetts, H.; Mullainathan, S.; Salganik, M.J.; Vazire, S.; et al. Integrating Explanation and Prediction in Computational Social Science. Nature 2021, 595, 181–188. [Google Scholar] [CrossRef] [PubMed]
  89. Liberini, F.; Redoano, M.; Russo, A.; Cuevas, A.; Cuevas, R. Politics in the Facebook Era: Evidence from the 2016 US Presidential Elections. Eur. J. Political Econ. 2025, 87, 102641. [Google Scholar] [CrossRef]
  90. Azzimonti, M.; Fernandes, M. Social Media Networks, Fake News, and Polarization. Eur. J. Political Econ. 2023, 76, 102256. [Google Scholar] [CrossRef]
Table 1. Milestones in the emergence and consolidation of CSS (2009–2025).
Year | Milestone | Why It Matters | Refs.
--- | --- | --- | ---
2009 | Science published ‘Computational Social Science’. | Marked the formal birth of computational social science as a field. | [1]
2014–2022 | Landmark volumes appeared in succession: Introduction to Computational Social Science (2014), Bit by Bit: Social Research in the Digital Age (2017), and the Handbook of Computational Social Science (2022). | Together these volumes document the field’s emergence, providing conceptual foundations, methodological guidance, and comprehensive overviews. | [4,5,6]
2015–2025 | The International Conference on Computational Social Science (IC2S2) was established in 2015 and has since convened eleven editions. | IC2S2 has become a premier venue bridging the social and computational sciences through large-scale data and computation. | [2]
2017–2025 | The Summer Institutes in Computational Social Science (SICSS) were established in 2017 and have since supported workshops at 53 locations worldwide. | SICSS has fostered a scholarly community in computational social science, training a large cohort of researchers and enabling collaborative work. | [3]
2020 | Science published ‘Computational social science: Obstacles and opportunities’. | Reviewed a decade of achievements in the field and identified remaining obstacles, proposed solutions, and emerging opportunities. | [14]
Table 2. Observations and critique by method.
Method | Where It Excels | Where It Fails | Our Assessment
--- | --- | --- | ---
LLM-assisted text coding | Rapid, low-cost first-pass labels and explanations; can rival typical crowd work. | May misportray minority identities; performance varies by domain; reproducibility hinges on model and prompt disclosure. | Augment (do not replace) expert coders; use ask-and-average where appropriate (see the sketch following this table) [28,32,33,34].
Experiments | Clear identification; growing evidence that AI aids can improve conversation quality and reduce harmful beliefs. | External validity; changing platform ecology; potential spillovers. | Prioritize theory-linked outcomes; report experimental artifacts; track long-run effects [41,42].
Survey–trace integration | Links attitudes to behavior; well suited to mechanism tests. | Consent and coverage biases; construct mismatch. | Integrate surveys with traces, foregrounding measurement diagnostics [62,63].
Agent-based modeling (ABM) | Explains macro patterns from micro rules; transparent counterfactuals; integrates with trace data via calibration. | Calibration and identification are hard; results can be fragile under parameter change. | Use when mechanisms matter; pair with empirical calibration and falsification tests (a minimal example also follows this table) [64,65,66].
Computer vision | Consistent visual features at scale; strong for advertising and imagery audits. | Dataset shift and construct validity; latent constructs require theory-driven codebooks; possible social and cultural biases. | Valuable when paired with transparent codebooks and replication packs [58,60,67].
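To make the ask-and-average recommendation in Table 2 concrete, the following minimal Python sketch repeats the same scaling question several times and averages the numeric replies, in the spirit of Le Mens and Gallego [33]. The query_llm wrapper, the 0–10 scale, and the number of calls are illustrative assumptions rather than details taken from the cited studies; any chat-completion client can be substituted.

```python
# Minimal sketch of "ask-and-average" LLM-assisted text coding: query the
# model several times for a numeric position on a defined scale, then average
# the replies to reduce single-call noise. `query_llm` is a hypothetical
# wrapper around whatever chat-completion API is available.

import re
import statistics

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    raise NotImplementedError

def position_text(text: str, n_calls: int = 5) -> float:
    """Ask for a left-right position on a 0-10 scale and average the answers."""
    prompt = (
        "On a scale from 0 (far left) to 10 (far right), where would you place "
        f"the author of the following statement? Reply with a single number.\n\n{text}"
    )
    scores = []
    for _ in range(n_calls):
        reply = query_llm(prompt)
        match = re.search(r"\d+(?:\.\d+)?", reply)  # extract the first number in the reply
        if match:
            scores.append(float(match.group()))
    if not scores:
        raise ValueError("No numeric score could be parsed from the replies.")
    return statistics.mean(scores)
```

Even with averaging, validation against expert-coded subsamples and full disclosure of model versions and prompts remain necessary for reproducibility [28,32,34].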
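For the ABM row, a compact Schelling-style simulation [52] illustrates the micro-to-macro logic the table refers to: a simple relocation rule at the agent level generates aggregate segregation. The grid size, tolerance threshold, and segregation measure below are illustrative choices for exposition, not a calibrated model from the cited literature.

```python
# Minimal Schelling-style agent-based model: agents relocate when too few of
# their neighbours share their group, and a macro-level segregation measure
# emerges from this micro-level rule. All parameters are illustrative.

import random

def run_schelling(size=20, empty_frac=0.1, similarity_threshold=0.3,
                  n_steps=50, seed=42):
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    rng.shuffle(cells)
    n_empty = int(empty_frac * len(cells))
    # None marks an empty cell; "A"/"B" mark the two groups.
    grid = {cell: (None if i < n_empty else rng.choice(("A", "B")))
            for i, cell in enumerate(cells)}

    def neighbors(r, c):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) != (0, 0) and (r + dr, c + dc) in grid:
                    yield grid[(r + dr, c + dc)]

    def unhappy(cell):
        group = grid[cell]
        if group is None:
            return False
        occupied = [g for g in neighbors(*cell) if g is not None]
        if not occupied:
            return False
        return sum(g == group for g in occupied) / len(occupied) < similarity_threshold

    for _ in range(n_steps):
        movers = [cell for cell in grid if unhappy(cell)]
        empties = [cell for cell, g in grid.items() if g is None]
        rng.shuffle(movers)
        for cell in movers:
            if not empties:
                break
            target = empties.pop(rng.randrange(len(empties)))
            grid[target], grid[cell] = grid[cell], None  # move agent to empty cell
            empties.append(cell)

    # Macro outcome: average share of same-group neighbours among occupied cells.
    shares = []
    for cell, group in grid.items():
        if group is None:
            continue
        occupied = [g for g in neighbors(*cell) if g is not None]
        if occupied:
            shares.append(sum(g == group for g in occupied) / len(occupied))
    return sum(shares) / len(shares)

print(f"Average same-group neighbour share: {run_schelling():.2f}")
```

In applied work, such a toy model would be coupled with empirical calibration against trace data and explicit validation and falsification tests, as emphasized in [64,65,66].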