Social-LLM: Modeling User Behavior at Scale Using Language Models and Social Network Data
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper proposes Social-LLM, a scalable social network representation model that combines user content cues with social network cues for inductive user detection tasks. The authors conduct a thorough evaluation of Social-LLM on seven real-world, large-scale social media datasets spanning a variety of topics and detection tasks, and also demonstrate the utility of Social-LLM embeddings for visualization.
However, the following issues need to be addressed.
(1) The description of the proposed Social-LLM in Section 3 lacks crucial implementation details, such as the batch size, number of epochs, learning rate, the implementation of round-robin training, and the early-stopping criteria.
(2) The paper should provide clear pseudocode and list the main training hyperparameters to materially improve reproducibility (an illustrative sketch is given after this list).
(3) The paper asserts that the time complexity of step 1 is O(|E|) and that step 2 is even faster at O(|V|), but it provides no experimental evidence for these theoretical claims. The paper should include a representative runtime and memory analysis for at least one large dataset and compare against a main baseline such as TIMME.
(4) How is the scalability of the proposed Social-LLM model demonstrated? The paper also lacks a summary and analysis of the most recent relevant work from 2024 and 2025, along with citations to this recent literature.
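For illustration, the sketch below shows the kind of round-robin training loop with early stopping whose details should be documented. All names, data stand-ins, and hyperparameter values here are hypothetical placeholders, not the authors' actual implementation.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Illustrative hyperparameters of the kind that should be reported (values are placeholders).
BATCH_SIZE = 256
MAX_EPOCHS = 50
LEARNING_RATE = 1e-3
PATIENCE = 5  # early stopping: epochs allowed without validation improvement

# Synthetic stand-ins for per-edge-type training data (e.g., retweet and mention edges):
# each sample is a pair of 768-d user text embeddings and a binary edge label.
edge_data = {
    name: TensorDataset(torch.randn(1000, 768), torch.randn(1000, 768),
                        torch.randint(0, 2, (1000,)).float())
    for name in ("retweet", "mention")
}
loaders = {name: DataLoader(ds, batch_size=BATCH_SIZE, shuffle=True)
           for name, ds in edge_data.items()}

model = nn.Sequential(nn.Linear(768 * 2, 128), nn.ReLU(), nn.Linear(128, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
criterion = nn.BCEWithLogitsLoss()

def validation_loss():
    return torch.rand(1).item()  # placeholder: evaluate on held-out edges in practice

best_val, stale_epochs = float("inf"), 0
for epoch in range(MAX_EPOCHS):
    # Round-robin over edge types: one pass per edge type per outer epoch
    # (the actual scheme may interleave at the batch level).
    for name, loader in loaders.items():
        for u, v, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(torch.cat([u, v], dim=1)).squeeze(1), y)
            loss.backward()
            optimizer.step()
    val = validation_loss()
    if val < best_val - 1e-4:
        best_val, stale_epochs = val, 0
    else:
        stale_epochs += 1
        if stale_epochs >= PATIENCE:
            break  # early stopping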
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript, titled “Social-LLM: Modeling User Behavior at Scale Using Language Models and Social Network Data”, proposes the Social-LLM framework to integrate large language models (LLMs) with social network cues for user detection tasks. The framework combines user content (e.g., profile descriptions, tweets) with network information (e.g., retweets, mentions), and is evaluated on seven real-world datasets across domains such as political partisanship, morality, account suspension, and toxicity. The authors demonstrate that Social-LLM is scalable, inductive, and effective, often outperforming baseline methods. The contribution is promising, yet the manuscript’s structure and theoretical framing require significant improvement before publication. My main comments are as follows:
- In Related Work, the review is too thin and does not sufficiently contextualize the integration of LLMs into social network modeling. While the section reviews prior approaches to user detection and graph representation learning, it does not explain why LLMs are necessary here, nor what unique advantages they bring compared to existing methods. What specific shortcomings of prior graph neural networks, hybrid models (e.g., TIMME, GEM), or content-only approaches are uniquely addressed by LLMs? Section 3.4 (Advantages and Disadvantages) provides a strong rationale but appears misplaced; much of this content should be incorporated into the Related Work and Introduction to better highlight the necessity of the Social-LLM framework.
- Methods requires clearer organization and logic. It is confusing why “cues” (which are features) are presented on the same level as the full “framework.” Conceptually, cues are components feeding into the framework, not equivalent subsections. The current structure makes the logical flow difficult to follow. I recommend reorganizing this section under a unified heading (e.g., 3. Social-LLM Framework) with content cues and network cues integrated as its two foundational elements. This would improve readability and coherence, as the present structure gives the impression of three unrelated pillars rather than a unified model.
- In Data, the descriptions of datasets are detailed, but they lack structural clarity. Each subsection mixes dataset description, preprocessing, and hints of evaluation usage. This leaves the reader uncertain about the exact role of each dataset in the study. I recommend restructuring this part to highlight the objectives explicitly. For each dataset, specify: (1) the research question/task (e.g., partisanship classification, morality regression), (2) the dataset source and preprocessing steps, and (3) the role of this dataset in validating Social-LLM. Without such focus, the section remains confusing.
- Evaluation describes the baselines and experimental setup, but in practice it functions as an explanation of how the experiments are performed on the datasets, which creates redundancy with Section 4 (Data). I suggest merging Section 4 (Data) and Section 5 (Evaluation) into a single section titled Experiments (or similar) if the datasets are tested with different evaluation setups (see Option B in the next comment).
- In Results, the use of “Experiment 1,” “Experiment 2,” etc. is confusing in the current structure because “Methods”, “Evaluation”, and “Data” are separated. If the experiments are meant to be the core of the empirical study, then they should be integrated more tightly with the methodological description. I recommend two alternative solutions: Option A: keep Evaluation within Methods, and then in Results describe each experiment systematically (including the dataset, task, and outcome); Option B: merge Data, Evaluation, and Results into a comprehensive Experiments section, where each experiment includes (a) dataset and task, (b) methods compared, and (c) results. Either option would reduce redundancy and make the results easier to interpret.
- Figure 2 is abrupt and insufficiently motivated. It introduces “bot scores” without clear linkage to the central narrative of the paper, leaving readers uncertain about where these scores come from, why bots are relevant to the Social-LLM framework, and what insight the figure adds. While the caption indicates that it describes user filtering in the IMMIGRATION-HATE dataset, the manuscript does not sufficiently justify why this dataset in particular requires a visual illustration of bot score distribution. Given the otherwise text-based descriptions of datasets in Section 4, Figure 2 appears unnecessary and disrupts the flow.
- The manuscript repeatedly claims that Social-LLM reduces computational complexity compared to GNN-based methods (see Section 3.4, Advantages and Disadvantages). However, this claim is not substantiated in the experimental sections. While the authors argue that their framework avoids full-graph training, the reliance on large LLMs raises doubts about overall efficiency, since fine-tuning or embedding extraction from LLMs can itself be computationally expensive. A simple runtime and memory benchmark, as sketched below, would help substantiate this claim.
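To substantiate or qualify this claim, a small wall-clock and peak-memory benchmark could be reported. The sketch below shows one possible harness; the encoder name, sample texts, and batch size are illustrative assumptions, not necessarily the paper's actual setup.

import time
import resource  # Unix-only; reports peak resident set size (RSS)

from sentence_transformers import SentenceTransformer

def benchmark(label, fn):
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    peak_mib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024  # ru_maxrss is KiB on Linux
    print(f"{label}: {elapsed:.1f} s, peak RSS ~{peak_mib:.0f} MiB")
    return result

# Stand-in for the per-user texts: 10,000 short profile descriptions as an example |V|.
texts = ["example user profile description"] * 10_000
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Time the language-model embedding-extraction pass over all user texts.
embeddings = benchmark("embedding extraction", lambda: encoder.encode(texts, batch_size=64))

# The same harness could wrap the paper's step 1 (O(|E|)) and step 2 (O(|V|)), as well as
# a baseline such as TIMME, so that runtime and memory can be reported side by side
# for at least one large dataset.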
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The goal of the paper is to develop a framework for modeling user behavior in social networks by integrating user-generated content, processed through LLMs, with localized social network interactions, leveraging the principle of homophily to overcome the computational limitations of traditional graph-based methods. The topic is highly relevant and original. The references are comprehensive and relevant to the fields of computational social science, NLP, and network science. The writing is clear, and the paper is well-structured. The authors show that Social-LLM outperforms or matches state-of-the-art baselines across multiple tasks and datasets.
Minor comments
1. Table 2 reports average performance scores but omits measures of variance (e.g., standard deviation) across random splits. This makes it difficult to assess the statistical stability and reliability of the results (see the brief sketch after these comments).
2. Figure 3 is somewhat hard to read; additional explanation in the caption or main text would help.
3. In Table 1, it would be helpful to add a brief footnote or a column indicating the source of the ground-truth labels for each dataset.
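Regarding comment 1, the reporting could take the simple form sketched below; the scores shown are placeholders, not the paper's results.

import numpy as np

# Placeholder macro-F1 scores over five random train/test splits (not actual results).
scores = np.array([0.81, 0.79, 0.83, 0.80, 0.82])
print(f"{scores.mean():.3f} +/- {scores.std(ddof=1):.3f}")  # mean and sample standard deviation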
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
This revision has addressed my concerns.

