LLMs for Social Network Analysis: Mapping Relationships from Unstructured Survey Response
Abstract
1. Introduction
2. Background
2.1. Uncovering Social Ties via LLMs
2.2. Multilayer Network
3. Materials and Methods
3.1. Proposed Methodology
3.2. Multilayer Network Construction Using LLMs
- (a) Identity couplings. For each person v that appears in both person layers, we add an interlayer edge connecting its two node instances. No identity coupling is created into layers where the entity does not exist (i.e., we do not create "virtual twins").
- (b) Membership couplings. For each person v affiliated with laboratory L and (if used) chair H, we add person-laboratory and person-chair interlayer edges; multiple affiliations yield one coupling per affiliation.
- (c) Structural couplings. When a laboratory L and a chair H share at least one member, we add a laboratory-chair interlayer edge with weight equal to the number of shared members.
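The three coupling rules can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's implementation: the layer labels ("RSL", "CN", "LAB", "CHAIR"), the input dictionaries (one affiliation per person, for brevity), and the default coupling weights are assumptions.

```python
# Illustrative sketch of interlayer coupling rules (a)-(c); layer labels,
# input structures, and default weights are assumptions.

def build_couplings(rsl_people, cn_people, lab_of, chair_of,
                    identity_w=1.0, membership_w=1.0):
    edges = []
    # (a) Identity couplings: only for persons present in BOTH person
    # layers -- no "virtual twins" are created.
    for v in rsl_people & cn_people:
        edges.append((("RSL", v), ("CN", v), identity_w))
    # (b) Membership couplings: one edge per affiliation.
    for v, lab in lab_of.items():
        edges.append((("CN", v), ("LAB", lab), membership_w))
    for v, chair in chair_of.items():
        edges.append((("CN", v), ("CHAIR", chair), membership_w))
    # (c) Structural couplings: lab-chair weight = number of shared members.
    shared = {}
    for v in lab_of.keys() & chair_of.keys():
        pair = (lab_of[v], chair_of[v])
        shared[pair] = shared.get(pair, 0) + 1
    for (lab, chair), w in shared.items():
        edges.append((("LAB", lab), ("CHAIR", chair), w))
    return edges
```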
3.3. Multilayer Network Analysis
4. Case Study
4.1. Survey Data Extraction
4.2. Multilayer Network of University of Rijeka–FIDIT
5. Discussion
5.1. Feasibility and Validation of the SURVEY2MLN Methodology
5.2. Advantages of the SURVEY2MLN Methodology
5.3. Limitations and Future Directions
- Human-annotated ground-truth validation of both entity extraction and tie inference, with at least two independent annotators, inter-annotator agreement (e.g., Cohen's kappa), and standard IR metrics (precision, recall, F1).
- Benchmarking against conventional NLP/SNA baselines, such as NER + co-occurrence/rule-based tie extraction, to quantify the added value of LLM-based extraction.
- Cross-model and prompt-sensitivity validation, i.e., repeating the extraction with alternative LLMs and systematically varied prompt/parameter settings (temperature, top-p) to test robustness.
- External/construct validation via follow-up checks with participants or domain experts (e.g., short confirmatory survey/interviews) to assess whether extracted ties reflect perceived real relations.
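The agreement and IR metrics named in the first bullet have short closed forms and need no external libraries. A minimal sketch; the function names and the list/set input formats are illustrative assumptions:

```python
# Minimal reference implementations of the validation metrics above.
# Function names and input formats are illustrative assumptions.

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' label lists of equal length."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n           # observed agreement
    labels = set(a) | set(b)
    pe = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)  # chance
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)

def prf1(gold, pred):
    """Precision, recall, F1 over sets of extracted ties."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```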
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Full Term |
|---|---|
| NLP | Natural Language Processing |
| LLM | Large Language Model |
| SNA | Social Network Analysis |
| MLN | Multilayer Network |
Appendix A. Prompt Examples
Appendix A.1. Prompt 1–Research Similarity Network (RSL)
- Prompt 1: System and User Message
- # ===== SYSTEM MESSAGE =====
- You are a precise research assistant. Your task is to process open-ended survey answers row by row and build a weighted, undirected Research Similarity Network (RSL) using simple overlap counts.
- GENERAL PRINCIPLES
- Be conservative: extract only what is explicitly stated in each participant's responses (no hallucinations).
- Detect ambiguity (e.g., unclear names/terms) and flag it.
- Unify terminology by mapping variants and related terms into the same semantic fields/disciplines. Examples: "NLP", "text mining", "sentiment analysis" -> "natural language processing"; "deep learning", "neural networks" -> "deep learning". Keep tools and languages canonical (e.g., "python", "r", "pytorch").
- Merge research areas, methods, and tools into ONE combined canonical set of "skills/attributes".
- Order of mentions is irrelevant.
- Output must be strictly machine-readable CSV.
- INPUT FORMAT (one JSON object per participant)
- {
- "id": "PROF1",
- "q1_primary_area": "...",
- "q2_goals_methods": "...",
- "q3_tools": "..."
- }
- ROW-WISE PROCESSING
- For each participant:
- (1) Extract canonicalized skills/attributes.
- (2) Preserve original terms (deduplicated).
- (3) Emit QC flags JSON: {"ambiguous_terms": bool, ...}.
- CROSS-ROW EDGE CONSTRUCTION
- For each unordered pair (i, j), compute shared items.
- Edge weight = |shared_items|.
- If weight == 0, omit the edge.
- OUTPUT ARTIFACTS (CSV ONLY)
- (1) normalized_profiles.csv. Columns: id, canonical_skills, original_terms, qc_flags
- (2) rsl_edges.csv. Columns: source, target, shared_items, weight
- ORDERING & FORMAT RULES
- Sort normalized_profiles by id ascending.
- Sort rsl_edges by (source, target) ascending.
- No prose, only CSV output.
- # ===== USER MESSAGE =====
- You will receive the survey as JSON Lines under key "rows". Please perform all steps and return the two CSVs exactly as specified (no prose, CSVs only).
- INPUT EXAMPLE:
- {
- "rows": [
- {
- "id": "P1",
- "q1_primary_area": "Text mining, sentiment analysis",
- "q2_goals_methods": "Machine learning, NLP",
- "q3_tools": "Python, R"
- },
- {
- "id": "P2",
- "q1_primary_area": "Natural language processing, deep learning",
- "q2_goals_methods": "Neural networks, sentiment analysis",
- "q3_tools": "Python, TensorFlow"
- }
- ]
- }
- Prompt Explanation
- The presented prompt defines the rules by which survey responses are automatically processed using a large language model. Its objective is to extract and standardize research areas, methods, and tools from free-text answers and to merge them into a unified set of skills/attributes for each participant. The prompt prescribes counting direct matches when terms are written identically (e.g., "NLP" and "NLP"); if no such direct matches exist, the model attempts to identify synonyms and related expressions within the same semantic or scientific field (e.g., "NLP", "text mining", "sentiment analysis" → natural language processing). In addition, the prompt requires the flagging of ambiguous, unclear, or noisy expressions (e.g., overly broad terms, typographical errors, or nonsensical strings), thereby ensuring transparency and enabling manual inspection by the researcher. Based on these canonicalized participant profiles, an undirected weighted Research Similarity Network (RSL) is constructed, where edges are formed when participants share common attributes, and edge weights correspond to the number of such overlaps. The output is standardized in the form of CSV tables (normalized_profiles.csv and rsl_edges.csv), suitable for subsequent social network analysis.
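Once profiles are canonicalized, the cross-row edge rule described above (weight = number of shared attributes, zero-weight edges omitted) is fully deterministic and can be reproduced outside the LLM. A minimal sketch; the `rsl_edges` helper name and the dict-of-sets input format are assumptions:

```python
# Deterministic re-implementation of the prompt's cross-row edge rule.
from itertools import combinations

def rsl_edges(profiles):
    # profiles: {participant_id: set of canonical skills/attributes}
    edges = []
    for (i, si), (j, sj) in combinations(sorted(profiles.items()), 2):
        shared = sorted(si & sj)
        if shared:                       # omit edges with weight == 0
            edges.append((i, j, "|".join(shared), len(shared)))
    return edges
```

Running it on the P1/P2 example reproduces the shared-items string and weight shown in the edge table.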
- Example for 5 participants. Input data (survey responses):
- {
- "rows": [
- {
- "id": "P1",
- "q1_primary_area": "Text mining, sentiment analysis",
- "q2_goals_methods": "Machine learning, NLP",
- "q3_tools": "Python, R"
- },
- {
- "id": "P2",
- "q1_primary_area": "My primary focus is natural language processing and I am particularly interested in deep learning methods.",
- "q2_goals_methods": "In my research I use neural networks to perform sentiment analysis on large collections of texts. I also do data analysis when appropriate.",
- "q3_tools": "Most often I work with Python, and for modeling I rely on TensorFlow."
- },
- {
- "id": "P3",
- "q1_primary_area": "Education, learning analytics",
- "q2_goals_methods": "Social network analysis (SNA) and distance learning asdasd",
- "q3_tools": "Gephi, Python"
- },
- {
- "id": "P4",
- "q1_primary_area": "Biotechnology, medicine",
- "q2_goals_methods": "Data mining, statistics",
- "q3_tools": "R, SPSS"
- },
- {
- "id": "P5",
- "q1_primary_area": "Computer networks and network security",
- "q2_goals_methods": "We perform traffic analysis and intrusion detection (signature-based) and evaluate SDN performance.",
- "q3_tools": "Wireshark, Mininet, GNS3"
- }
- ]
- }
| Id | Canonical_Skills | Original_Terms | Qc_Flags |
|---|---|---|---|
| P1 | natural language processing; machine learning; sentiment analysis; python; r | text mining, sentiment analysis, machine learning, NLP, Python, R | {} |
| P2 | natural language processing; deep learning; sentiment analysis; python; tensorflow; data analysis | natural language processing, deep learning, neural networks, sentiment analysis, data analysis, Python, TensorFlow | {"ambiguous_terms": true, "notes": "term 'data analysis' too broad"} |
| P3 | education; learning analytics; social network analysis; distance learning; python; gephi | education, learning analytics, Social network analysis, distance learning, asdasd, Python, Gephi | {"typos_or_noise": true, "notes": "term 'distnace learnign' corrected to 'distance learning'; 'asdasd' marked as noise"} |
| P4 | biotechnology; medicine; data mining; statistics; r; spss | biotechnology, medicine, data mining, statistics, R, SPSS | {} |
| P5 | computer networks; network security; traffic analysis; intrusion detection; sdn; wireshark; mininet; gns3 | computer networks, network security, traffic analysis, intrusion detection, SDN, Wireshark, Mininet, GNS3 | {} |
| Source | Target | Shared_Items | Weight |
|---|---|---|---|
| P1 | P2 | natural language processing|sentiment analysis|python | 3 |
| P1 | P3 | python | 1 |
| P1 | P4 | r | 1 |
| P2 | P3 | python | 1 |
Appendix A.2. Prompt 2–Communication Network (CN)
- Prompt 2: System and User Message
- # ===== SYSTEM MESSAGE =====
- You are a precise research assistant. Your task is to process open-ended survey answers row by row and build a directed, weighted Communication Network (CN) based on reported contacts.
- GENERAL PRINCIPLES
- Input data contains free-text answers where each participant names up to 5 persons they communicate with inside their organization.
- The order of names matters: the first person named = highest communication frequency.
- Assign edge weights strictly according to order: 1st person = 5, 2nd = 4, 3rd = 3, 4th = 2, 5th = 1.
- If fewer than 5 names are listed, weights are still assigned in descending order starting from 5.
- Graph is directed: from respondent_id (source) to named_person_id (target).
- Each pair (respondent, named_person) corresponds to one directed edge with its weight.
- Names may appear in different forms (typos, nicknames, case variations). Attempt to canonicalize to a consistent identifier if possible, and flag uncertain matches.
- Output must be strictly machine-readable CSV.
- INPUT FORMAT (row per participant)
- {
- "id": "P1",
- "q_contacts": "Alice, Bob, Charlie, David, Emma"
- }
- ROW-WISE PROCESSING
- (1) Split the response into a list of up to 5 names, in given order.
- (2) Normalize names (remove whitespace, standardize case, resolve obvious typos).
- (3) Assign integer weights: [5,4,3,2,1] mapped onto the listed names.
- (4) For each named person, emit a directed edge: respondent_id -> named_person, weight.
- (5) Preserve original spelling of names as metadata.
- (6) Emit QC flags if: more than 5 names given (truncate after 5, flag "overflow"); unclear name or noise detected (flag "ambiguous_name").
- OUTPUT ARTIFACT
- (1) comm_edges.csv. Columns: source, target, weight, original_name, qc_flags
- ORDERING & FORMAT RULES
- Sort edges by source ascending, then weight descending.
- Do not output explanations, markdown, or prose. CSV only.
- # ===== USER MESSAGE =====
- You will receive the survey as JSON Lines under key "rows". Please perform all steps and return the comm_edges.csv file exactly as specified.
- INPUT EXAMPLE:
- {
- "rows": [
- {"id": "P1", "q_contacts": "Alice, Bob, Charlie, David, Emma"},
- {"id": "P2", "q_contacts": "John, Alice, Peter"}
- ]
- }
- Prompt Explanation
- The presented prompt operationalizes the construction of a directed and weighted communication network from free-text survey responses. Each participant lists up to five individuals within their organization with whom they communicate most frequently. The order of listed names determines the edge weights, ranging from 5 (most frequent communication) to 1 (least frequent); when fewer than five names are listed, weights are still assigned in descending order starting from 5. Directed edges are generated from the respondent to each named person, with additional quality-control flags used to identify ambiguous or noisy entries (e.g., typographical errors, unclear names, or more than five responses). The resulting edge list (comm_edges.csv) provides a standardized machine-readable representation of interpersonal communication patterns, suitable for subsequent social network analysis.
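The order-of-mention weighting (first name = 5, fifth = 1, truncation after five with an overflow flag) can be checked in a few lines. A sketch under the assumption that contacts arrive as a comma-separated string, as in the prompt's input format; the function name is illustrative:

```python
# Sketch of the order-of-mention weighting used by the CN prompt.

def comm_edges(respondent_id, contacts_field, max_names=5):
    names = [n.strip() for n in contacts_field.split(",") if n.strip()]
    qc = {"overflow": len(names) > max_names}      # flag >5 names, truncate
    edges = [(respondent_id, name, max_names - rank)   # 1st -> 5, 2nd -> 4, ...
             for rank, name in enumerate(names[:max_names])]
    return edges, qc
```

Applied to participant P2 from the example, it yields the same three weighted edges as the output table (John = 5, Alice = 4, Peter = 3).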
- Example for 5 participants
- Input data (survey responses)
- {
- "rows": [
- {
- "id": "P1",
- "q_contacts": "Alice, Bob, Charlie, Diana, Emma"
- },
- {
- "id": "P2",
- "q_contacts": "John, Alice, Peter"
- },
- {
- "id": "P3",
- "q_contacts": "Charlie, Alice, Bob, Frank"
- },
- {
- "id": "P4",
- "q_contacts": "Emma, Alice, Bob, Charlie, John"
- },
- {
- "id": "P5",
- "q_contacts": "Anna, Robert"
- }
- ]
- }
| Source | Target | Weight |
|---|---|---|
| P1 | Alice | 5 |
| P1 | Bob | 4 |
| P1 | Charlie | 3 |
| P1 | Diana | 2 |
| P1 | Emma | 1 |
| P2 | John | 5 |
| P2 | Alice | 4 |
| P2 | Peter | 3 |
| P3 | Charlie | 5 |
| P3 | Alice | 4 |
| P3 | Bob | 3 |
| P3 | Frank | 2 |
| P4 | Emma | 5 |
| P4 | Alice | 4 |
| P4 | Bob | 3 |
| P4 | Charlie | 2 |
| P4 | John | 1 |
| P5 | Anna | 5 |
| P5 | Robert | 4 |
Appendix B. Survey Questions
- What is the primary area of your research work? Please indicate if your research involves analyzing data from specific domains (e.g., education, biotechnology, medicine). We kindly ask you to also describe the main objectives of such analyses.
- Which scientific/professional methods (e.g., machine learning methods, social network analysis methods) do you use in your work? Please list all methods in detail and rank them according to frequency of use, from the most frequent to the least frequent.
- Which tools/systems/programming languages do you use in your work? Please provide a detailed list and rank them according to frequency of use, from the most frequent to the least frequent.
- On which topics or projects would you like to work in the future? These can be topics that differ from your current area of research or that could complement it.
- Who are up to five people within your organization with whom you communicate most frequently, whether professionally or informally? Please list them in order of communication frequency, starting with the person you interact with the most and ending with the one with whom you have less frequent contact.
References


| Phase | Step | Description |
|---|---|---|
| 1 | 1.1. Data Collection | Collect open-ended survey responses covering research interests, tools, collaboration, and communication. |
| | 1.2. Data Preprocessing | Clean and standardize textual responses; organize data into a structured tabular format for processing. |
| 2 | 2.1. Prompt Design | Design prompts tailored to extract relevant information for each network layer (e.g., similarity, communication). |
| | 2.2. Prompt Refinement | Test and iteratively adjust prompts to improve accuracy, handle variation in language, and reduce ambiguity. |
| 3 | 3.1. Relation Extraction | Use LLMs to extract entity pairs and relational data (e.g., shared keywords, named individuals). |
| | 3.2. Edge List Generation | Translate LLM outputs into structured edge lists for each type of relation, including direction and weight. |
| | 3.3. Layer Construction | Construct individual network layers from edge lists, each representing a specific social or organizational dimension. |
| 4 | 4.1. Entity Linking | Identify and align identical nodes (individuals or units) across layers. |
| | 4.2. Cross-Layer Mapping | Create interlayer links between individuals, organizational units, and groups based on shared membership. |
| | 4.3. Multilayer Network Integration | Integrate all layers and interlayer links into a coherent multilayer network structure. |
| 5 | 5.1. Global-Level Analysis | Compute metrics such as density, diameter, path length, and clustering for each layer. |
| | 5.2. Meso-Level Analysis | Detect communities using algorithms like Louvain and evaluate modularity. |
| | 5.3. Local-Level Analysis | Identify key actors using centrality measures (degree, betweenness, closeness). |
| | 5.4. Cross-Layer Interpretation | Compare structures across layers to identify consistent or divergent patterns. |
| 6 | 6.1. Automated Evaluation | When annotated ground-truth data are available, evaluate extraction accuracy using standard metrics such as precision, recall, and F1-score. |
| | 6.2. Manual Review | Select a sample of LLM-extracted links for manual inspection to identify errors such as inconsistent name resolution or false ties. |
| | 6.3. Feedback Loop | Use insights from automated and/or manual evaluation to refine prompts, preprocessing, or data formatting. |
| | 6.4. Final Validation | Ensure consistency, reliability, and analytical usefulness of the multilayer network before interpretation or application. |
| Level | Measure/Technique | Description |
|---|---|---|
| Global | Average degree | Mean number of connections per node in a layer. |
| | Average weighted degree | Mean interaction intensity per node (node strength), accounting for edge weights. |
| | Network diameter | Maximum shortest path length between any two nodes (on the largest weakly connected component for directed layers). |
| | Average path length | Mean shortest path between pairs of nodes (on the largest weakly connected component). |
| | Network density | Proportion of actual connections to all possible connections. Directed: m/(n(n-1)) for m edges and n nodes. |
| | Clustering coefficient | Tendency of nodes to form tightly connected groups (average weighted local clustering on the undirected projection). |
| | Assortativity | Similarity-based connectivity; degree-degree Pearson correlation on the undirected projection. |
| | Reciprocity | Extent to which directed ties are mutual. |
| Meso | Community detection | Identification of clusters of densely connected nodes (greedy modularity on the undirected, weighted projection). |
| | Modularity | Evaluation of community structure (weighted Newman-Girvan); values above a conventional threshold indicate significant modularity. |
| Local | Degree centrality | Number of direct ties of a node. (our dataset: strongest in-degree = PROF1 = 9; strongest out-degree = AS1 = 5) |
| | In-degree centrality | Number of times an actor is mentioned by others (important in survey-based layers). (our dataset: PROF1 = 9) |
| | Betweenness centrality | Extent to which a node lies on the shortest paths between others. (our dataset: AS1 = 0.1983) |
| | Closeness centrality | How close a node is to all others in terms of path length. (our dataset: PROF6 = 0.5087) |
| Cross-layer | Cross-layer interpretation | Analysis of structural alignment and divergence across layers; includes comparison of node positions, identification of actors bridging multiple relationship types, and analysis of interlayer links (e.g., individuals to departments). |
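Two of the global measures in the table, directed density and reciprocity, have short closed forms that can serve as a sanity check on tool output. A minimal pure-Python sketch; the function names are illustrative, and edges are assumed to be (source, target) pairs of a simple directed graph without self-loops:

```python
# Closed-form checks for two global measures from the table above.

def directed_density(n, edges):
    """|E| / (n * (n - 1)): realized share of possible directed edges."""
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0

def reciprocity(edges):
    """Share of directed ties whose reverse tie also exists."""
    es = set(edges)
    mutual = sum(1 for (u, v) in es if (v, u) in es)
    return mutual / len(es) if es else 0.0
```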
| Prompt Example | Component | Content |
|---|---|---|
| 1. Keyword-Based Similarity Extraction | Example Description | Extract keywords from research profiles, compare pairs of researchers, and count shared keywords to generate weighted undirected edges. |
| | Input | Researcher: PROF1. Profile: Machine learning, natural language processing, Python, large language models. Researcher: PROF2. Profile: Data science, neural networks, NLP, Python, generative models |
| | Output | PROF1–PROF2: 2 shared keywords (NLP, Python). Structured as: source, target, weight |
| 2. Named Entity Standardization and Edge Weighting | Example Description | Extract and standardize names mentioned in communication-related survey responses; assign edge weights based on order of mention. |
| | Input | Researcher: AS3. Mentions: "I mostly talk with prof. Ivan, then sometimes with Kristina and Dragan." |
| | Output | AS3–PROF6: 5; AS3–AS4: 4; AS3–AS2: 3 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Meštrović, A.; Beliga, S.; Pitoski, D. LLMs for Social Network Analysis: Mapping Relationships from Unstructured Survey Response. Appl. Sci. 2026, 16, 163. https://doi.org/10.3390/app16010163

