Article

LLMs for Social Network Analysis: Mapping Relationships from Unstructured Survey Response †

1 Faculty for Informatics and Digital Technologies, University of Rijeka, 51000 Rijeka, Croatia
2 Center for Artificial Intelligence and Cybersecurity, University of Rijeka, 51000 Rijeka, Croatia
3 Peoplet Ltd., 52000 Lindar, Croatia
* Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in Meštrović, A.; Beliga, S.; Pitoski, D. Peoplet: Exploring Organizational Structures through Social Network Analysis. In Proceedings of the 48th ICT and Electronics Convention MIPRO, Opatija, Croatia, 2–6 June 2025.
Appl. Sci. 2026, 16(1), 163; https://doi.org/10.3390/app16010163
Submission received: 4 August 2025 / Revised: 7 December 2025 / Accepted: 16 December 2025 / Published: 23 December 2025
(This article belongs to the Special Issue Research Progress in Complex Networks and Graph Data Analysis)

Abstract

This paper explores the emerging potential of large language models (LLMs) and generative AI for social network analysis (SNA) based on open-ended survey data as a source. We introduce a novel methodology, Survey-to-Multilayer Network (SURVEY2MLN), which systematically transforms qualitative survey responses into structured multilayer social networks. The proposed approach integrates prompt engineering with LLM-based text interpretation to extract entities and infer relationships, formalizing them as distinct network layers representing research similarity, communication, and organizational affiliation. The SURVEY2MLN methodology is defined through six phases, including data preprocessing, prompt-based extraction, network construction, integration, analysis, and validation. We demonstrate its application through a real-world case study within an academic department, where prompt engineering was used to extract and model relational data from narrative responses. The resulting multilayer network reveals both explicit and latent social structures that are not accessible through conventional survey techniques. Our results show that LLMs can serve as effective tools for deriving sociograms from free-form text and highlight the potential of AI-driven methods to advance SNA into new, text-rich domains of inquiry.

1. Introduction

Social network analysis (SNA) is the process of investigating social structures through networks and graph theory [1]. In an SNA representation, individuals (or other actors) are depicted as nodes, and their relationships (friendships, communications, collaborations, etc.) are represented as links (also referred to as edges or ties). By mapping out who is connected to whom, SNA reveals patterns that help explain information flow, community formation, influence, and other social phenomena. SNA has become a widely used analytical framework across disciplines, from sociology [2] and medicine [3] to economics [4] and information science [5], reflecting the importance of relational data.
The emergence of foundation models and generative AI has initiated a paradigm shift in information science [6]. A foundation model refers to a large-scale deep learning model trained on broad data and adaptable to many tasks [7]. These models, especially large language models (LLMs) like BERT [8], GPT-3 [9] or GPT-4.5, exhibit excellent capabilities in natural language understanding and generation. Such generative AI systems can fluently interpret and produce human-like text, enabling applications far beyond traditional NLP. Researchers are now able to apply LLMs in diverse domains, including graph analysis and network science [10].
At the same time, social networks derived from survey responses are often complex, as they capture multiple types of relationships such as collaboration, communication, and organizational affiliation. Representing these diverse relational dimensions within a single-layer network risks oversimplification. Instead, multilayer network analysis offers a more expressive framework by modeling different types of ties as distinct but interconnected layers, allowing researchers to explore how formal structures and informal interactions intersect and shape social dynamics.
This raises the question: Is the integration of LLM-based extraction and multilayer network analysis a feasible approach for deriving and studying social networks from survey data?
Although survey data are commonly used in social science research, the potential of open-ended survey responses as a data source for social network analysis remains relatively underexplored. Such responses often contain valuable but implicit relational information. For example, when participants describe their work habits, collaborations, or communication patterns, they may reference other individuals in ways that reflect underlying social connections (“…I often consult Alice and Bob about project issues…”). These narrative elements provide insights into informal interactions, influence, and shared expertise that are highly relevant for SNA. However, because the information is embedded in unstructured text rather than predefined formats, extracting and formalizing social ties from open-ended responses presents a methodological challenge and has historically required manual interpretation.
Recent advances in LLMs suggest a compelling opportunity to bridge this gap. Generative AI models can serve as powerful “data translators,” converting unstructured text into structured network data [11]. Historically, extracting social networks from text required complex natural language processing pipelines. For example, prior approaches used named-entity recognition to find actors and co-occurrence or syntactic rules to infer links between them [12,13]. Such methods demanded significant expertise and often struggled with nuances (e.g., alias names or context-specific relations). In contrast, modern LLMs come with a rich, built-in understanding of language and world knowledge. They can perform zero-shot inference, recognizing entities and relationships without task-specific training [14]. Indeed, an LLM can read a collection of open-ended responses and directly identify who interacts with whom, even if the text is complex. This capability has been demonstrated in related tasks: for instance, ref. [15] showed that a state-of-the-art LLM (Anthropic’s Claude) could categorize open-text survey answers with near-human accuracy. The success of LLMs in extracting structured insights (like topics or categories) from qualitative data hints at their potential for extracting networks as well. Moreover, researchers have begun to use LLMs to generate or simulate social networks. Chang et al. [14] prompted GPT-4 to create synthetic social tie structures and found the AI could produce realistic network patterns (matching real networks on properties like clustering and degree distribution) when guided appropriately. These early efforts illustrate that LLMs not only understand language, but can also encode and reproduce relational structures described in language.
Building on these observations, this paper explores how LLMs and generative AI can enable a new paradigm for social network analysis. Specifically, we propose a novel methodology, Survey-to-Multilayer Network (SURVEY2MLN), for extracting multilayer networks from open-ended survey data using LLMs. The proposed approach provides a structured framework for extracting, transforming, and analyzing relational data embedded in survey answers. The methodology systematically integrates prompt engineering with LLM-based text interpretation to identify relevant entities and social ties, and to translate them into formal network structures. As the underlying structural model, we propose the use of a multilayer network, which enables the formal representation of multiple types of relationships between individuals within a unified analytical framework. The methodology then proceeds through several phases, including data preprocessing, prompt-based extraction of intra- and interlayer relationships, construction of individual network layers, and integration into a coherent multilayer graph.
Subsequently, we present a case study in which open-ended survey feedback collected from a university department (the University of Rijeka, Faculty of Informatics and Digital Technologies) was analyzed with GPT-4.5 to extract perceived collaboration ties among individuals. This proof-of-concept demonstrates that generative AI can uncover meaningful relationship patterns from natural language input, effectively automating a process that would otherwise require manual qualitative coding or direct relational questions in a survey. The resulting AI-generated network can be analyzed with standard SNA techniques (such as community detection or centrality measures) just like a conventional network, but it is built from rich narrative context rather than simple name lists. This work extends our previous research on LLM-based network construction by advancing a more systematic and generalizable methodology for extracting multilayer social networks from unstructured textual data [16].
Building on these foundations, the main contribution of this study lies in the design and formalization of the SURVEY2MLN methodology as a comprehensive framework for translating unstructured survey responses into multilayer social networks using LLMs. The following sections provide a detailed explanation of the methodology itself, as well as its practical implementation and evaluation through an empirical case study.

2. Background

2.1. Uncovering Social Ties via LLMs

Researchers have long recognized the potential of unstructured text as a source of relational insight. Earlier efforts in the 2000s and 2010s leveraged natural language processing techniques to extract social connections from documents ranging from news articles and dialogue transcripts to literary fiction [17]. For instance, Deleris et al. (2018) [17] demonstrated a system that ingests free-text narratives (e.g., case notes, speech transcripts) and automatically constructs a multigraph representation of the interpersonal relationships described. Even historical narratives have been mined to reveal hidden social ties; Yose et al. [18] analyzed a medieval text to uncover alliances and hostilities among actors during the 1014 Battle of Clontarf. These earlier approaches typically relied on bespoke information extraction pipelines or statistical text-mining methods (such as topic modeling) to derive networks from unstructured data. They showed that meaningful social networks can be inferred from text, but often with considerable effort in developing domain-specific rules or models.
The landscape of text analysis has shifted dramatically with the advent of LLMs and generative AI. Modern LLMs like GPT-3 and GPT-4 exhibit emergent abilities—unexpected competencies that arise only at sufficient model scale—which vastly expand the range of tasks they can perform [19]. Built on transformer architectures and trained on massive corpora, these models achieve near-human prowess in language understanding and generation. Crucially, LLMs can often be applied to new tasks with minimal task-specific training, simply by prompting them in plain language. This general-purpose capacity has positioned LLMs (e.g., GPT-4.5) as powerful tools across numerous applications, from coding assistance to scientific discovery. It also suggests immense potential for SNA: an LLM can comprehend free-form text and identify the relational information embedded within it, performing a kind of automated qualitative analysis. In theory, an advanced language model might read open-ended survey responses or other narratives and determine “who knows whom,” “who collaborates with whom,” or other social ties described in text—tasks that earlier algorithms struggled to accomplish without extensive pre-programmed rules.
Recent research has begun harnessing LLMs to automate knowledge graph construction, leveraging their language understanding to perform tasks like entity recognition and relation extraction that transform free text into graph-ready data [20]. For example, an LLM-based system can read a passage and identify the key entities (people, organizations, etc.) and the relationships between them, outputting semantic triples suitable for a knowledge graph [21]. Translating this capability to SNA implies that a sufficiently powerful LLM could serve as an “automated sociogram builder”—scanning narrative responses and outputting a network of who is connected to whom (and possibly the nature of those connections). In particular, this approach of using cutting-edge LLMs for direct sociogram construction is largely unprecedented; to date, very few studies have used state-of-the-art language models to map social relationships from text. Previous text-to-network analyses tended to use more conventional NLP methods (for instance, fitting topic models to actor-authored texts to infer communities of interest [22], or rule-based relationship extraction [17]) rather than generative AI.
The integration of LLMs introduces a new paradigm whereby qualitative, free-form descriptions can be converted into quantitative network data with minimal human coding, potentially overcoming limitations of traditional survey-based SNA and opening entirely new avenues for data collection. By bridging advanced language understanding with graph analysis, researchers can begin to map complex social relationships that were once hidden in unstructured text, paving the way for novel investigations into communities, organizational dynamics, and other complex networks [23].
Although a few recent studies have begun to explore the potential of LLMs in the context of SNA, this remains a novel and fast-emerging research area. Most existing work focuses on proof-of-concept applications or domain-specific experiments, while a comprehensive understanding of how LLMs can be systematically integrated into SNA workflows is still lacking. The ability of these models to extract complex relational information from unstructured text opens up significant opportunities for methodological innovation, particularly in transforming qualitative data sources into analyzable network structures. As the capabilities of generative AI continue to evolve, there is considerable room for future research to refine, evaluate, and expand the use of LLMs in mapping social dynamics across diverse contexts.

2.2. Multilayer Network

Social relationships captured through survey responses often exhibit high complexity, encompassing diverse forms of connections such as collaboration, shared interests, knowledge exchange, and complementary skills. These relationships are rarely uniform and can span different social contexts and functional roles. Moreover, to adequately represent real-world social systems, it is often necessary to account for additional structural dimensions, such as affiliation with specific groups, departments, or informal communities. Given this complexity, a multilayer social network model provides a more expressive and nuanced framework for analysis. By representing different types of relationships as distinct layers within a unified network structure, researchers can explore how various social dimensions intersect, reinforce, or contradict one another, offering deeper insights into patterns of interaction, influence, and cohesion within the studied population.
Various definitions of multilayer networks have been proposed in the literature [24,25,26], often depending on the context and the specific type of data being modeled. While some definitions assume a common set of nodes across layers, others allow for more flexible configurations with different node sets and inter-layer links. In recent years, the structure of multilayer networks has been further formalized and generalized to better capture complex, real-world phenomena, and has been applied across a wide range of domains including social systems, transportation, biology, and information science [27,28,29,30,31].
In this paper, we adopt the definition of multilayer networks as presented in [25], which provides a clear and flexible framework for modeling multiple types of social relations across distinct but interconnected layers. According to this definition, a multilayer network is defined as a pair
M = (G, C),
where
G = {Gα : α ∈ {1, …, m}}
is a family of networks (graphs) Gα = (Vα, Eα), called the network layers of M, and
C = {Eαβ ⊆ Vα × Vβ : α, β ∈ {1, …, m}, α ≠ β}
is the set of interconnections between nodes of different layers Gα and Gβ with α ≠ β.
As illustrated in Figure 1, the schematic example shows an abstract multilayer network with four layers. The nodes are connected by two types of links: solid lines denote intralayer connections within the same layer, while dashed lines represent interlayer connections across different layers. This abstract illustration serves to demonstrate the general principle of multilayer network modeling without reference to any specific application domain.
To further exemplify the flexibility of the multilayer network formalism, consider a company where survey responses capture different aspects of relationships among employees. In this case, a multilayer network ML can be constructed with three layers representing interpersonal ties and one structural layer representing organizational units.
Formally, we define:
G = {G₁, G₂, G₃, G₄}
where
G₁ = (V₁, E₁): Project Collaboration Layer (employees collaborating on projects)
G₂ = (V₂, E₂): Shared Skills Layer (employees with overlapping skill sets)
G₃ = (V₃, E₃): Friendship Layer (self-reported social ties among employees)
G₄ = (V₄, E₄): Department Layer (organizational units within the company)
In the first three layers (G₁, G₂, G₃), the nodes represent employees and the edges are derived from survey responses about collaboration, shared skills, or friendship. The fourth layer (G₄) contains nodes that correspond to departments, where edges between departments represent business communication established through organizational processes.
Interlayer connections C link each employee across the first three layers, as well as between employees and their corresponding department in G₄. Additionally, departments can be connected to one another through inter-departmental business processes, reflecting the interplay between formal organizational structure and informal or skill-based employee relations. This example illustrates how multilayer networks can model both interpersonal dynamics and structural organization within a company.
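The company example above can be encoded with a standard graph library. The sketch below uses networkx, with layers held in a dictionary and interlayer couplings in a separate graph over (node, layer) pairs; all names and ties are illustrative, not data from the study:

```python
import networkx as nx

# Hypothetical four-layer company example; node labels are invented.
layers = {
    "G1_collaboration": nx.Graph(),  # employees collaborating on projects
    "G2_shared_skills": nx.Graph(),  # employees with overlapping skill sets
    "G3_friendship":    nx.Graph(),  # self-reported social ties
    "G4_departments":   nx.Graph(),  # organizational units
}
layers["G1_collaboration"].add_edge("Ana", "Bob")
layers["G2_shared_skills"].add_edge("Ana", "Bob")
layers["G3_friendship"].add_edge("Bob", "Cara")
layers["G4_departments"].add_edge("R&D", "Sales")

# Interlayer couplings C: identity links across layers, plus a
# membership link from an employee to their department.
C = nx.Graph()
for person in ("Ana", "Bob"):
    C.add_edge((person, "G1_collaboration"), (person, "G2_shared_skills"))
C.add_edge(("Ana", "G1_collaboration"), ("R&D", "G4_departments"))

print(C.number_of_edges())  # 3
```

Keeping C separate from the layer graphs mirrors the formal M = (G, C) decomposition and keeps intralayer and interlayer analysis independent.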
The analysis of multilayer social networks involves examining both intralayer (within-layer) and interlayer (between-layer) connectivity, allowing for a more comprehensive understanding of the complexity and dynamics of social systems. In the initial phase of analysis, standard metrics from traditional social network analysis are applied across layers. Global network measures offer insight into the overall structure and connectivity of the system under various conditions; community detection algorithms help identify groups of individuals who interact closely within or across specific contexts; and centrality measures highlight key actors who hold influential positions within individual layers or serve as bridges across multiple types of relationships.
Subsequent stages of multilayer analysis involve a more granular examination of individual layers and their interrelations. Intralayer analysis enables the study of community structure within specific types of relations (e.g., collaboration, communication, shared interests), revealing how cohesive or fragmented different social dimensions are. Moreover, the comparison of structures across layers can uncover inconsistencies or mismatches between different types of interactions—such as situations where formal groupings do not align with informal patterns of influence or cooperation. These insights can inform interpretations of social cohesion, role differentiation, and potential interventions aimed at improving coordination, inclusion, or information flow within the studied network.

3. Materials and Methods

3.1. Proposed Methodology

In this section we describe the novel methodology SURVEY2MLN, which extracts structured social network representations from unstructured, open-ended survey responses using LLMs and prompt engineering. The proposed approach integrates techniques from prompt engineering and LLMs with multilayer network modeling to construct a coherent and analyzable social graph. Below, we outline the key steps of the methodology, which are applicable to a variety of social contexts involving free-form textual data.
The first phase involves data collection and preprocessing. Respondents are asked to provide open-ended answers regarding their research interests, expertise, methodologies, tools, and collaboration patterns. These responses are organized into a structured dataset, where each row corresponds to an individual participant, and each column contains text relevant to a specific dimension of interest (e.g., research profile, communication partners). Basic cleaning is applied to remove extraneous characters, standardize formatting, and ensure consistent encoding.
In the second phase, prompt engineering is applied to guide the LLM in extracting relational data from the unstructured input. For each network layer to be constructed, a specific prompt is designed to instruct the model to (i) identify relevant entities (such as individuals or concepts), (ii) detect relationships or overlaps, and (iii) output results in a machine-readable format (e.g., edge list or adjacency matrix). For instance, in the research similarity layer, prompts are used to extract keywords and calculate intersections between researchers’ profiles. In the communication layer, prompts extract names of frequently mentioned colleagues and assign ranked weights to the connections. Each prompt is tested and refined iteratively to ensure consistency, minimize ambiguity, and handle linguistic variation.
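The prompt-based extraction step for the communication layer might be sketched as follows. The prompt wording, the JSON output schema, and the rank-to-weight mapping are all illustrative assumptions, not the prompts actually used in the study:

```python
import json

# Hypothetical prompt template for the communication layer.
PROMPT = (
    "From the survey answer below, list the colleagues the respondent "
    "mentions, ranked by how often they appear. Return JSON: "
    '[{"target": "<name>", "rank": <int>}].\n\n'
    "Answer: {answer}"
)

def parse_edges(respondent, llm_output):
    """Turn the model's JSON reply into weighted directed edges.

    Rank 1 (most frequently mentioned) receives the highest weight;
    the linear rank-to-weight mapping is an assumption.
    """
    mentions = json.loads(llm_output)
    n = len(mentions)
    return [(respondent, m["target"], (n - m["rank"] + 1) / n)
            for m in mentions]

# Example with a mocked model reply:
reply = '[{"target": "Alice", "rank": 1}, {"target": "Bob", "rank": 2}]'
print(parse_edges("Carol", reply))
# [('Carol', 'Alice', 1.0), ('Carol', 'Bob', 0.5)]
```

Requesting machine-readable JSON rather than free text is what makes the iterative prompt refinement measurable: malformed or inconsistent replies surface immediately as parsing failures.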
The third phase entails network layer construction. Based on the structured output provided by the LLM, a series of edge lists is generated—each representing a distinct type of social or organizational relationship. These edge lists are used to construct individual network layers, which may differ in terms of directionality (directed or undirected), weighting, and node composition. Layers can represent similarity-based ties, direct communication, group co-membership, or other dimensions derived from the input data.
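A minimal sketch of turning an LLM-produced edge list into one network layer, using networkx; the rule of accumulating weights when the same tie is extracted more than once is an assumption:

```python
import networkx as nx

def build_layer(edge_list, directed=False):
    """Construct one network layer from an extracted edge list.

    edge_list: iterable of (source, target, weight) triples.
    """
    G = nx.DiGraph() if directed else nx.Graph()
    for src, dst, w in edge_list:
        if G.has_edge(src, dst):
            # Accumulate weight for repeated extractions of the same tie
            G[src][dst]["weight"] += w
        else:
            G.add_edge(src, dst, weight=w)
    return G

# Directed communication layer from hypothetical extracted edges:
comm = build_layer([("Carol", "Alice", 1.0), ("Carol", "Bob", 0.5)],
                   directed=True)
print(comm.out_degree("Carol"))  # 2
```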
In the fourth phase, interlayer connections are defined and integrated. These include links between identical entities across different layers (e.g., the same individual appearing in both the research and communication networks), connections between individuals and organizational units (e.g., labs or chairs), and relationships between groups (e.g., overlapping membership between labs and chairs). This results in a coherent multilayer network structure that captures both intra- and interlayer dynamics.
The fifth phase involves the analysis of the constructed multilayer network using standard techniques from social network analysis. This includes the application of global, meso-, and local-level metrics such as degree centrality, clustering, community detection, and structural cohesion, both within and across layers. By examining each layer individually and in relation to others, researchers can uncover structural patterns, key actors, and emergent communities that would otherwise remain hidden in unstructured text.
The final phase of the methodology focuses on validation and refinement of the extraction process. Ideally, this step should be conducted using annotated ground-truth datasets, which enable systematic evaluation through standard metrics such as precision, recall, and F1-score. These measures provide a quantitative assessment of how accurately the LLM identifies entities and relationships, while also offering insight into potential biases or systematic errors in the extraction procedure. In situations where annotated datasets are not available, evaluation can be carried out through manual inspection of a selected subset of extracted relationships. This approach allows researchers to check the consistency of the outputs against the original survey responses and to identify common sources of error, such as inconsistent name resolution or false inference of ties. Findings from either automated or manual evaluation can then be used to refine prompt design and preprocessing steps, ensuring that the final network representation is both reliable and analytically meaningful.
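When an annotated ground truth is available, precision, recall, and F1 over extracted ties might be computed as in the sketch below; treating edges as unordered pairs is an assumption that would be dropped for directed layers:

```python
def edge_prf(predicted, gold):
    """Precision, recall, and F1 of extracted ties against a ground truth.

    Edges are compared as unordered pairs; use ordered tuples instead
    for directed layers.
    """
    pred = {frozenset(e) for e in predicted}
    true = {frozenset(e) for e in gold}
    tp = len(pred & true)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = edge_prf([("A", "B"), ("A", "C")], [("A", "B"), ("B", "C")])
print(p, r, f)  # 0.5 0.5 0.5
```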
To formalize the proposed SURVEY2MLN methodology, we define a series of sequential steps that guide the transformation of open-ended survey responses into a structured multilayer social network. These steps encompass data collection, prompt-driven information extraction using a large language model, and the construction and validation of multiple interconnected network layers. A concise overview of all methodological phases is presented in Table 1.
The advantage of the SURVEY2MLN methodology lies in its ability to transform rich, qualitative survey input into structured multilayer graphs without relying on rigid, rule-based NLP pipelines. By leveraging the flexibility and generalization capabilities of LLMs, the proposed approach enables scalable and context-sensitive extraction of relational data, applicable in a wide range of research, organizational, and social settings.

3.2. Multilayer Network Construction Using LLMs

The SURVEY2MLN methodology prescribes a systematic process for constructing multilayer social networks from open-ended survey responses using LLMs. This process begins with the collection of free-text answers designed to capture various dimensions of social interaction, including shared research interests, collaborative relationships, communication frequency, and organizational affiliations.
Once collected, the unstructured textual data are transformed into structured network components through prompt-based information extraction. Tailored prompts guide the LLM to identify relevant entities (e.g., individuals, research topics, tools) and infer relationships among them. These prompts are designed to handle specific network layers—for instance, to detect similarity-based ties or to extract direct references to communication partners. The LLM is tasked with generating machine-readable outputs in the form of edge lists, including metadata such as directionality and weights of connections.
For example, in constructing a Research Similarity Layer, respondents’ descriptions of their research domains and methodologies are analyzed to extract key terms. The LLM compares term sets across all participants, identifies overlapping elements, and generates weighted undirected edges representing the strength of conceptual similarity. In another layer focused on communication, textual mentions of colleagues are interpreted to identify directed ties, where the order or frequency of mention determines edge weight. In both cases, the use of LLMs supports the resolution of ambiguous references, such as nicknames or informal mentions, and enables consistent entity labeling.
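One plausible realization of the Research Similarity Layer computation, assuming Jaccard overlap of normalized term sets (the study's exact weighting scheme and synonym map are not reproduced here; term sets are assumed to come from the LLM's keyword extraction):

```python
# Illustrative synonym normalization map, e.g. unifying "NLP" and
# "natural language processing" as mentioned in the methodology.
SYNONYMS = {"natural language processing": "nlp"}

def normalize(terms):
    return {SYNONYMS.get(t.lower(), t.lower()) for t in terms}

def similarity_edges(profiles, threshold=0.0):
    """Weighted undirected edges from pairwise Jaccard overlap."""
    names = list(profiles)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ta, tb = normalize(profiles[a]), normalize(profiles[b])
            w = len(ta & tb) / len(ta | tb)
            if w > threshold:
                edges.append((a, b, w))
    return edges

profiles = {
    "Ana": ["NLP", "graphs"],
    "Bob": ["natural language processing", "surveys"],
}
print(similarity_edges(profiles))  # one edge, weight 1/3
```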
The methodology emphasizes iterative prompt refinement and output standardization to ensure reliability and consistency. Prompts are tested and adjusted to handle language variability, capture synonyms (e.g., “NLP” and “natural language processing”), and avoid redundancy. The structured outputs are then used to build individual network layers, each capturing a distinct relational aspect.
Finally, interlayer connections are established by linking corresponding nodes across layers (e.g., the same person appearing in multiple contexts), as well as by defining relationships between individuals and organizational units. These multiple layers are integrated into a formal multilayer network structure, which supports rich, multidimensional analysis of social systems.
Formally, we can define this as follows. Let ML = (G, C) denote the multilayer network with layers G₁ (Research Layer), G₂ (Communication Layer), G₃ (Laboratory Layer; LBL), and G₄ (Chair Layer; CHL, if applicable). We instantiate C with:
(a) Identity couplings (G₁ ↔ G₂). For each person v that appears in both G₁ and G₂, we add an interlayer edge (v(1), v(2)) with weight w = 1. No identity coupling is created into layers where the entity does not exist (i.e., we do not create “virtual twins”).
(b) Membership couplings (G₂ ↔ G₃/G₄). For each person v affiliated with laboratory L and (if used) chair H, we add (v(2), L(3)) and (v(2), H(4)) with weight w = 1. Multiple affiliations yield one coupling per affiliation.
(c) Structural couplings (G₃ ↔ G₄). When a laboratory L and a chair H share at least one member, we add (L(3), H(4)) with weight equal to the number of shared members.
These rules imply that a node in G₁ is linked to its identity in G₂ only if the person appears in both layers; in contrast, links to G₃ arise through membership. Thus, a person can be connected across G₁ and G₂ but have no coupling to G₃ if no laboratory membership is recorded. Avoiding virtual twins prevents spurious cross-layer paths that could bias the reported measures.
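The coupling rules (a) to (c) can be sketched in code as follows. The networkx representation with (node, layer-index) pairs is an assumption, and for brevity each person is given at most one laboratory and one chair:

```python
from itertools import product
import networkx as nx

def build_couplings(persons_g1, persons_g2, lab_of, chair_of):
    """Instantiate the interlayer edge set C per rules (a)-(c).

    lab_of / chair_of: dicts mapping a person to their laboratory /
    chair (illustrative; multiple affiliations would map to lists).
    """
    C = nx.Graph()
    # (a) Identity couplings G1 <-> G2, only when the person exists
    # in both layers (no "virtual twins").
    for v in set(persons_g1) & set(persons_g2):
        C.add_edge((v, 1), (v, 2), weight=1)
    # (b) Membership couplings G2 <-> G3/G4.
    for v in persons_g2:
        if v in lab_of:
            C.add_edge((v, 2), (lab_of[v], 3), weight=1)
        if v in chair_of:
            C.add_edge((v, 2), (chair_of[v], 4), weight=1)
    # (c) Structural couplings G3 <-> G4, weighted by shared members.
    for lab, chair in product(set(lab_of.values()),
                              set(chair_of.values())):
        shared = sum(1 for v in persons_g2
                     if lab_of.get(v) == lab and chair_of.get(v) == chair)
        if shared:
            C.add_edge((lab, 3), (chair, 4), weight=shared)
    return C

# Ana appears in both G1 and G2; Bob only in G2 with no affiliations.
C = build_couplings(["Ana"], ["Ana", "Bob"],
                    {"Ana": "LabX"}, {"Ana": "ChairY"})
print(C.number_of_edges())  # 4
```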
This prompt-based LLM approach offers a flexible and scalable alternative to rule-based NLP methods, enabling the extraction of meaningful network data from qualitative sources. It facilitates the conversion of complex narrative input into analyzable network formats and supports applications across diverse research domains.

3.3. Multilayer Network Analysis

The SURVEY2MLN methodology includes a structured analytical phase that prescribes the application of standard social network analysis techniques, adapted for multilayer network structures. The analytical process is organized into three levels—global, meso, and local—while also addressing cross-layer dynamics to ensure a comprehensive understanding of the network.
At the global level, the methodology involves the computation of established network metrics for each individual layer. These include average degree, average weighted degree, network diameter, average path length, network density, clustering coefficient, assortativity, and reciprocity. Such metrics serve to quantify overall cohesion, connectivity, and the structural properties of each network layer. For example, average degree and weighted degree reflect the general level and intensity of interaction, while network density measures the extent of interconnectedness. The clustering coefficient captures the tendency of nodes to form tightly interconnected groups, and path-based metrics reveal the efficiency of information flow. Assortativity measures similarity-based connectivity, and reciprocity assesses mutual recognition in directed relations.
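A sketch of the global-level computation for one layer, using networkx with its built-in karate-club graph as a stand-in layer (the real layers come from the survey data):

```python
import networkx as nx

def global_metrics(G):
    """Global-level metrics for one undirected layer."""
    n = G.number_of_nodes()
    metrics = {
        "avg_degree": sum(d for _, d in G.degree()) / n,
        "density": nx.density(G),
        "clustering": nx.average_clustering(G),
        "assortativity": nx.degree_assortativity_coefficient(G),
    }
    # Path-based metrics are only defined on connected graphs.
    if nx.is_connected(G):
        metrics["diameter"] = nx.diameter(G)
        metrics["avg_path_length"] = nx.average_shortest_path_length(G)
    return metrics

G = nx.karate_club_graph()  # stand-in for one network layer
print(round(global_metrics(G)["density"], 3))  # 0.139
```

For weighted layers, weighted degree would additionally sum the `weight` attribute, and reciprocity applies only to the directed communication layer.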
On the meso-level, the methodology recommends community detection within each network layer to uncover clusters of densely connected nodes. This is typically achieved using modularity-based algorithms such as Louvain, which partitions the network into groups by maximizing modularity values. Values above 0.3 indicate a significant modular structure, revealing functional or interest-based communities such as collaboration clusters or topic-aligned groups.
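The meso-level step might look like this; networkx (version 3.x) ships a Louvain implementation, and the karate-club graph again stands in for a real layer:

```python
import networkx as nx

G = nx.karate_club_graph()  # stand-in for one network layer

# Louvain partition and its modularity; a fixed seed makes the
# (randomized) algorithm reproducible.
communities = nx.community.louvain_communities(G, seed=42)
Q = nx.community.modularity(G, communities)

# Modularity above 0.3 is taken as evidence of significant
# modular structure.
print(len(communities), Q > 0.3)
```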
The local-level analysis focuses on identifying key actors within each layer through centrality measures. The methodology includes the calculation of degree, betweenness, and closeness centrality, accounting for variations in network type (weighted/unweighted, directed/undirected). In layers with survey-imposed constraints—such as a fixed number of outgoing ties—the focus is placed on in-degree centrality to detect individuals who are frequently mentioned by others, indicating social or professional prominence.
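The local-level measures can be sketched as follows; the emphasis on weighted in-degree for nomination-capped layers follows the text, while the helper names are our own:

```python
import networkx as nx

def local_centralities(G: nx.DiGraph) -> dict:
    """Degree, betweenness, and closeness centrality for one directed layer."""
    return {
        "in_degree": nx.in_degree_centrality(G),
        "betweenness": nx.betweenness_centrality(G, weight="weight"),
        "closeness": nx.closeness_centrality(G),
    }

def most_nominated(G: nx.DiGraph, k: int = 3) -> list:
    """Top-k actors by weighted in-degree (node strength), i.e., most frequently mentioned."""
    strength = dict(G.in_degree(weight="weight"))
    return sorted(strength, key=strength.get, reverse=True)[:k]
```

One caveat: networkx interprets the betweenness weight as a distance, so for strength-like survey weights one would typically invert them (distance = 1/weight) before calling it.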
Beyond intralayer properties, the methodology emphasizes the importance of analyzing interlayer connections. By comparing structures and positions across layers, it becomes possible to detect alignment or divergence between formal roles and informal relationships. This multilayer perspective enables the identification of actors who, for instance, belong to one organizational unit formally but are embedded in different informal communication or collaboration networks. The integrated multilayer analysis thus provides richer insights into the social fabric and structural alignment of complex systems such as academic institutions or professional organizations.
Table 2 systematizes and clearly presents the main measures that can be applied in the analysis of multilayer networks within the SURVEY2MLN methodology. The measures are grouped according to the level of analysis (global, meso, and local), with an additional category addressing cross-layer interpretation. This organization provides a concise overview of the analytical tools available for examining both structural properties of individual layers and the interactions between them.

4. Case Study

4.1. Survey Data Extraction

In this section we describe the dataset on which the proposed methodology was applied. The data were collected through an anonymous survey among researchers at the Faculty of Informatics and Digital Technologies (FIDIT), University of Rijeka. The survey instrument consisted of five open-ended questions designed to capture both explicit and implicit social relationships, as well as aspects of research similarity and organizational affiliation. Out of these, four questions (Q1–Q3 and Q5) were used directly in the multilayer network construction, while one question (Q4) focused on prospective research directions and was not included in the present analysis. The full wording of the survey questions is provided in Appendix B. The network dataset (anonymised) is publicly available at https://doi.org/10.6084/m9.figshare.29772305 (accessed on 1 August 2025). Using the weighted edge adjacency list from this dataset, we compute and report key network metrics in Table 2. For reproducibility and further exploration, we also provide a ready-to-run Gephi (Gephi Desktop v0.10.1) project with the dataset, enabling readers to compute additional metrics directly from the supplied file.
A total of 24 researchers, including professors and assistants, participated in the survey. Their responses were anonymized, and each participant was assigned a label indicating their position (e.g., PROF1–PROF16, AS1–AS8). In addition to survey respondents, several individuals who were named in answers to the communication-related question were also added to the dataset (e.g., PROF17, AO1–AO2) to preserve the integrity of reported ties. This ensured that the resulting network reflected not only direct participants but also actors considered important by them.
The constructed directed network indicates a moderately sparse structure with average in/out degree k_in = k_out = 4.44 and average weighted in/out degree (node strength) s_in = s_out = 13.33, suggesting that ties, when present, often carry multiple or stronger interactions. The giant component exhibits short paths (diameter = 4, average path length l = 2.21), consistent with efficient reachability. Clustering is modest (C = 0.204), implying some, but not pervasive, triadic closure. Degree assortativity is approximately neutral (r = 0.018), indicating little preference for degree-similar attachment. Reciprocity is high (ρ = 0.70), consistent with mutual acknowledgments in survey-based communication. Community structure is pronounced (modularity Q = 0.425) with |C| = 5 groups, supporting the presence of coherent meso-scale clusters that align with role or affiliation patterns discussed elsewhere in the manuscript.
Across the four analyzed questions, we collected 72 free-text responses (24 participants × 3 research-related questions) and 24 ordered contact lists (one per participant, with up to five names each). Response lengths varied from short one-line answers (5–10 words) to more elaborate descriptions spanning several sentences. For greater transparency, we also measured response lengths in both words and GPT-4-compatible tokens, reporting descriptive statistics in Section 4 and a per-question breakdown in Appendix B.
Based on the survey responses, a multilayer social network was constructed using prompt engineering and large language models. In this approach, open-ended textual answers were transformed into structured network data by designing prompts that guided the LLM to extract relevant entities and relationships. Specifically, in this study we employed the GPT-4.5 model accessed via the OpenAI API [32] to analyze similarities and overlaps in participants’ responses. To ensure objectivity and reproducibility, the temperature parameter was set to 0.0, while other parameters were kept at their default values (top_p = 1.0, frequency_penalty = 0.0, presence_penalty = 0.0). The max_tokens parameter was set to 2048, providing sufficient capacity for the generation of comprehensive comparisons. The analysis was conducted iteratively, with each pair of participants’ responses compared separately in a loop until all possible pairwise combinations had been exhausted. This systematic procedure allowed the consistent detection of similarities and differences across all participants.
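The pairwise loop can be sketched as follows. The prompt text and helper names here are illustrative placeholders (the study's actual prompts are reproduced in Appendix A), and the LLM request is injected as a callable so the loop itself does not depend on API access:

```python
from itertools import combinations

# Hypothetical prompt template for illustration only; see Appendix A for the real prompts.
PAIR_PROMPT = (
    "Compare the two research profiles below and list the overlapping "
    "canonical research areas, methods, and tools.\n\n"
    "Profile {a}:\n{ta}\n\nProfile {b}:\n{tb}"
)

def compare_all_pairs(profiles: dict, ask_llm):
    """Collect one LLM comparison per unordered pair of participants.

    `ask_llm` is a callable wrapping the chat-completion request
    (e.g., an OpenAI client configured with temperature=0.0,
    top_p=1.0, max_tokens=2048, as in the study).
    """
    results = {}
    for a, b in combinations(sorted(profiles), 2):
        prompt = PAIR_PROMPT.format(a=a, b=b, ta=profiles[a], tb=profiles[b])
        results[(a, b)] = ask_llm(prompt)
    return results
```

With 24 respondents this exhausts C(24, 2) = 276 pairwise comparisons, matching the iterative procedure described above.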
For the construction of the Research Similarity Layer, researchers were asked to describe their research areas, methodologies, and tools (Q1–Q3). These answers were recorded in a structured dataset, with each row representing a researcher and their corresponding free-text research profile. To compare these profiles and identify similarities, we designed prompts instructing the LLM to extract canonicalized keywords from each profile and calculate the number of overlapping terms between each pair of researchers. Rather than manually coding textual responses, the model produced a pairwise comparison matrix that captured the number of shared research elements. The output of this process was a weighted undirected edge list, where each edge represents the number of shared research concepts between two researchers.
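Once canonicalized keyword sets are available per researcher, the weighted undirected edge list follows from simple set intersection, which is exactly the counting rule the prompt prescribes (edge weight = number of shared items, zero-weight pairs omitted):

```python
from itertools import combinations

def overlap_edges(skills: dict[str, set[str]]):
    """Weighted undirected edge list from canonical skill sets.

    Weight = number of shared canonical terms; pairs with no overlap
    are omitted, as specified in the extraction prompt.
    """
    edges = []
    for a, b in combinations(sorted(skills), 2):
        w = len(skills[a] & skills[b])
        if w > 0:
            edges.append((a, b, w))
    return edges
```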
Examples of prompt-based extraction tasks used in the methodology are shown in Table 3.
The advantage of using an LLM in this context is its capacity to resolve implicit or vague references, as well as its contextual understanding of language variation. For instance, if a participant mentioned “I often consult with Kristina regarding research methodology,” the model was able to associate this statement with a specific individual in the dataset and generate the appropriate directed edge.
In both cases, the use of prompt-based LLM processing enabled the transformation of qualitative, narrative survey data into quantitative network representations. This method was especially valuable in bypassing the need for traditional rule-based natural language processing pipelines, which often require handcrafted tokenizers, keyword lists, and extensive preprocessing.
To validate the consistency of the extracted edges, a sample of outputs was manually reviewed by the authors. We compared the model-generated edge lists with manual interpretations of survey responses and found a high level of agreement in both keyword matching and named entity recognition. In future work, this manual check will be complemented by further validation steps, including a human-coded ground-truth evaluation (precision/recall/F1 with inter-annotator agreement), comparison to traditional NLP baselines, and cross-LLM replication to quantify the robustness of the extracted edge lists.
Overall, this approach demonstrates how prompt engineering combined with generative AI can serve as a powerful tool for creating structured network data from unstructured textual input. By tailoring the prompts to specific data extraction tasks, we were able to efficiently build multilayer representations that reflect both the explicit and latent social structures within the surveyed community.

4.2. Multilayer Network of University of Rijeka–FIDIT

In this section we describe the Multilayer Network of FIDIT constructed from the previously described survey. The network consists of four main layers. The first layer, referred to as the Research Similarity Layer (RSL), captures relationships based on shared research areas, methods, and tools. This layer was constructed as a weighted and undirected network, where edges between researchers were created when overlap was detected in their research profiles. To extract and compare relevant information from the open-ended textual responses, we used a large language model (GPT-4.5) through prompt-based instructions that enabled the transformation of text into structured edge lists. The edge weights reflect the number of overlapping elements in research areas, methods, or tools.
The second layer, the Communication Layer (CL), was constructed as a directed and weighted network based on survey responses in which researchers listed up to five colleagues with whom they communicate most frequently, either professionally or informally. The edges in this layer represent explicitly stated communication ties, and their weights are assigned according to the order in which colleagues were listed by respondents, reflecting the perceived strength or frequency of interaction.
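The conversion from ordered contact lists to directed, weighted edges is straightforward. The paper specifies only that weights follow list order, so the exact mapping below (weight = 6 - rank, giving the first-listed colleague a weight of 5) is an illustrative assumption:

```python
def contact_edges(lists: dict[str, list[str]]):
    """Directed weighted edges from ordered contact lists (Communication Layer).

    Weighting scheme is illustrative: weight = 6 - rank, so the
    first-listed colleague gets 5 and the fifth gets 1. The paper
    states only that weights reflect list order.
    """
    edges = []
    for src, contacts in lists.items():
        for rank, dst in enumerate(contacts[:5], start=1):  # at most five nominations
            edges.append((src, dst, 6 - rank))
    return edges
```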
In addition to these two layers derived from survey data, two structural layers were included to represent formal organizational affiliations. The third layer captures laboratory membership. It includes links between laboratories established through joint research projects or co-authored publications of their members over the past three years. The fourth layer represents chair affiliation, where professors are linked to one or more academic chairs. Connections between chairs are formed when they share one or more professors as members, with edge weights representing the number of overlapping individuals.
The multilayer network also includes interlayer connections. These include links between the same individuals across different layers, connections between individuals and their respective laboratories or chairs, and connections between laboratories and chairs that share affiliated members. This multilayer representation enables the exploration of both social and formal structural aspects of the institution and their mutual interactions.
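One common way to encode such a structure in code is to give each actor a replica per layer (a node-layer tuple) and add explicit interlayer coupling edges between replicas of the same actor. This is a minimal sketch using plain networkx, not the representation used to produce the figures:

```python
import networkx as nx

def build_multilayer(layers: dict[str, list[tuple]]) -> nx.Graph:
    """Assemble intralayer edges plus interlayer couplings.

    `layers` maps a layer name (e.g., "RSL", "CL") to a list of
    (u, v, weight) edges within that layer.
    """
    M = nx.Graph()
    for layer, edges in layers.items():
        for u, v, w in edges:
            M.add_edge((u, layer), (v, layer), weight=w, kind="intra")
    # Interlayer coupling: link replicas of the same actor across layers.
    actors = {}
    for node in list(M.nodes):
        actors.setdefault(node[0], []).append(node)
    for replicas in actors.values():
        for i in range(len(replicas) - 1):
            M.add_edge(replicas[i], replicas[i + 1], kind="inter")
    return M
```

Affiliation links (individual to laboratory or chair) can be added the same way, as interlayer edges between a person replica and an organizational-unit node.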
The resulting multilayer network (MLN) constructed from the FIDIT survey data is visualized in Figure 2. This network illustrates the layered structure of different relationship types—including research similarity, communication, and organizational affiliation—represented as interconnected layers within a unified graph model.

5. Discussion

5.1. Feasibility and Validation of the SURVEY2MLN Methodology

The results of this study demonstrate the feasibility and potential of using LLMs in combination with prompt engineering for extracting structured relational data from open-ended survey responses. The proposed SURVEY2MLN methodology formalizes this process and enables the construction of multilayer social networks that reflect both formal and informal relationships within an organization. An initial evaluation of the methodology was carried out through the presented case study, where the extracted ties were manually reviewed against the original survey responses. Although limited in scope, this manual validation confirmed that the methodology can produce reliable and meaningful network structures, thereby providing support for its feasibility and acceptability as a proof-of-concept approach.
The constructed multilayer network for the FIDIT case study illustrates the applicability of this methodology in academic environments. The results show that LLMs can accurately extract both explicit and implicit ties, standardize naming inconsistencies, and generate high-quality network representations. Manual validation confirmed the reliability of the extracted relationships, while the applied metrics and community detection techniques provided meaningful insights into institutional structure and collaboration patterns. While a small-scale case study allowed for detailed control and contextual interpretation, future applications on larger datasets will be needed to confirm generalizability. Nevertheless, this demonstration shows how the methodology can be directly applied in organizational contexts to uncover hidden collaboration patterns, identify key actors, and provide actionable insights for team or department management.
Taken together, these case-derived insights motivate a complementary, quantitative check of structural validity. To mitigate the inherent limits of small-sample manual review and to provide reproducible evidence, we augment the analysis with annotation-free, algorithmic validation. This extension does not replace qualitative assessment; rather, it tests whether the observed patterns persist under formal nulls, perturbations, and simple predictive tasks.
Relying solely on manual inspection is vulnerable to well-known biases (confirmation, availability) and is impractical to scale; moreover, with a small sample the variance of human judgments can dominate signal. We therefore add annotation-free, algorithmic checks whose aim is not to assert perfection but to establish—using model-free statistics—whether the observed structure is (i) stronger than degree-driven chance, (ii) stable under reasonable perturbations, and (iii) predictive of unseen ties. Concretely, we use a degree-preserving permutation null (directed double-edge swaps; a configuration-model surrogate [33]) to test whether triadic closure (clustering), degree–degree correlation (assortativity), and mutuality (reciprocity) exceed what the in/out-degree sequence alone would induce. In our data, clustering = 0.204, assortativity = 0.018, and reciprocity = 0.70 all yield empirical p = 1.000 (500 permutations), implying that these values are not unusually high once degree is held fixed; practically, this means much of the observed transitivity, degree mixing, and mutuality can be attributed to the degree sequence rather than higher-order organization.
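A degree-preserving permutation test of this kind can be sketched in a few lines. For simplicity, this version operates on an undirected graph with double-edge swaps, whereas the analysis above used the analogous directed swaps; the swap counts are illustrative choices:

```python
import random
import networkx as nx

def permutation_pvalue(G: nx.Graph, stat, n_perm: int = 100, seed: int = 0) -> float:
    """One-sided empirical p-value of stat(G) under a degree-preserving null.

    Each surrogate is produced by degree-preserving double-edge swaps,
    i.e., a configuration-model-like randomization that fixes the
    degree sequence while destroying higher-order structure.
    """
    observed = stat(G)
    rng = random.Random(seed)
    count = 0
    for _ in range(n_perm):
        H = G.copy()
        nx.double_edge_swap(H, nswap=4 * H.number_of_edges(),
                            max_tries=100 * H.number_of_edges(),
                            seed=rng.randrange(2**32))
        if stat(H) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # add-one smoothing avoids p = 0
```

A p-value near 1.0, as obtained for clustering, assortativity, and reciprocity in our data, means the observed value is not larger than what the degree sequence alone produces.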
To probe meso-scale organization directly, we assess robustness by pruning low-weight ties (thresholds at weight {1, median, Q3}) and tracking community structure via weighted modularity Q (greedy optimization [34,35,36]) and partition stability using the Adjusted Rand Index (ARI; [37,38]). As weaker edges are removed, modularity increases (Q: 0.411 → 0.500 → 0.573, with 5, 4, and 5 communities), indicating clearer cluster separation; partitions remain reasonably stable (ARI = 0.725, then 0.796), while the identities of top-betweenness “bridge” nodes shift moderately (Jaccard = 0.333, then 0.250), which is expected when de-emphasizing weak ties. Finally, we examine predictability with a 10% link-prediction holdout on the undirected projection, using standard topology indices—Common Neighbors, Jaccard, and Adamic–Adar [39]—and evaluating with AUROC/AUPR [40,41,42]. Jaccard attains AUROC/AUPR = 0.765/0.812 (Common Neighbors = 0.642/0.643, Adamic–Adar = 0.663/0.725), showing that simple structural cues recover a substantial fraction of hidden edges and are consistent with the community signal. Taken together, these results (i) clarify where the degree distribution accounts for much of the global statistics, (ii) show that meso-scale communities strengthen as noise is reduced and remain stable in aggregate, and (iii) demonstrate recoverable structure on held-out links—offering quantitative, reproducible validity evidence without introducing additional human-annotation bias. Note: Table 2 reports weighted modularity on the undirected projection for the full edge set (Q = 0.4248), whereas the robustness analysis recomputes modularity under weight-threshold pruning; the baseline at threshold 1 is Q = 0.411, increasing to 0.500 and 0.573 at the median and Q3 thresholds, respectively.
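The link-prediction check can be sketched compactly. This version holds out a fraction of edges, scores them against an equal number of sampled non-edges with the Jaccard index, and computes AUROC from its rank-based definition (the holdout fraction and seed are illustrative):

```python
import random
import networkx as nx

def jaccard_auroc(G: nx.Graph, holdout_frac: float = 0.1, seed: int = 7) -> float:
    """AUROC of the Jaccard index on a random link-prediction holdout."""
    rng = random.Random(seed)
    edges = list(G.edges())
    held = rng.sample(edges, max(1, int(holdout_frac * len(edges))))
    train = G.copy()
    train.remove_edges_from(held)
    # Negative examples: an equal number of pairs that are non-edges in G.
    neg = rng.sample(list(nx.non_edges(G)), len(held))
    score = {(u, v): s for u, v, s in nx.jaccard_coefficient(train, held + neg)}
    pos_scores = [score[p] for p in held]
    neg_scores = [score[p] for p in neg]
    # AUROC = P(random positive scores above random negative); ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

AUPR, and the Common Neighbors and Adamic–Adar indices, follow the same pattern with the scoring function and evaluation curve swapped out.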

5.2. Advantages of the SURVEY2MLN Methodology

By guiding the LLM to extract and interpret relational cues from free-form responses and by combining it with prompt engineering, the methodology enables the rapid identification of different types of relationships from large volumes of survey data. This task would otherwise require extensive manual coding and qualitative analysis if conducted without automation. The approach bridges the gap between qualitative input and quantitative network representation, reducing the need for manual coding or rigid extraction rules. As a result, SURVEY2MLN offers a scalable and flexible solution for social network analysis, extending the possibilities of traditional approaches by extracting richer and more diverse relational information from open-ended responses.
The strength of the methodology lies in its capacity to construct multiple network layers based on different aspects of social and organizational interaction—such as research similarity, communication, and formal affiliations—and to integrate them into a unified multilayer structure. The analytical phase further enriches the insight by applying established SNA metrics on each layer, allowing for multilevel analysis of cohesion, influence, and community structure. The cross-layer comparison highlights the alignment or divergence between formal structures and perceived social relationships, uncovering latent patterns that would remain hidden in single-layer representations.
Importantly, the multilayer framework is flexible and can be extended to include additional types of relationships, such as shared memberships in project teams, geographical proximity, or cross-institutional collaborations. It also opens the possibility of studying temporal dynamics by applying the methodology to longitudinal survey data, thereby enabling the analysis of how collaboration patterns evolve over time. This makes SURVEY2MLN not only a tool for descriptive mapping but also a framework for answering broader research questions about the interaction of formal and informal social structures.

5.3. Limitations and Future Directions

As this paper proposes a conceptual model of the SURVEY2MLN methodology and demonstrates it through a proof-of-concept case study, the present work inevitably has several limitations that should be acknowledged. First, the validation procedure relied primarily on manual review of a subset of extracted ties. While this offered a useful feasibility check, such review is susceptible to bias and rater variance and is impractical to scale. We therefore treat it as an initial consistency screen rather than definitive validation. For larger deployments, a multi-annotator, adjudicated ground truth will be necessary as a complementary instrument—implemented with a clear coding protocol, training and calibration, inter-annotator agreement targets (e.g., Cohen’s κ), and power-aware sampling—so that standard information-retrieval metrics (precision, recall, F1) and calibration analyses can be computed. In the present manuscript, given the small sample and our focus on methodology, we instead prioritized annotation-free, algorithmic checks to provide reproducible evidence of structural validity.
In terms of validation, several planned methods are still pending and therefore represent a limitation of the current study. Specifically, future work will include the following:
  • Human-annotated ground-truth validation of both entity extraction and tie inference, with at least two independent annotators, inter-annotator agreement (e.g., Cohen’s κ ), and standard IR metrics (precision, recall, F1).
  • Benchmarking against conventional NLP/SNA baselines, such as NER + co-occurrence/rule-based tie extraction, to quantify the added value of LLM-based extraction.
  • Cross-model and prompt-sensitivity validation, i.e., repeating the extraction with alternative LLMs and systematically varied prompt/parameter settings (temperature, top-p) to test robustness.
  • External/construct validation via follow-up checks with participants or domain experts (e.g., short confirmatory survey/interviews) to assess whether extracted ties reflect perceived real relations.
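For the planned ground-truth evaluation, the agreement and retrieval metrics listed above reduce to a few lines of arithmetic. A sketch for binary tie judgments (two annotators) and set-valued edge predictions:

```python
def cohen_kappa(a: list[int], b: list[int]) -> float:
    """Cohen's kappa for two annotators' binary tie judgments (assumes pe < 1)."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    pa1, pb1 = sum(a) / n, sum(b) / n                   # per-annotator positive rates
    pe = pa1 * pb1 + (1 - pa1) * (1 - pb1)              # chance agreement
    return (po - pe) / (1 - pe)

def precision_recall_f1(pred: set, gold: set):
    """Standard IR metrics over predicted vs. ground-truth edge sets."""
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```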
Second, the methodology was demonstrated on a relatively small dataset of 27 individuals (24 survey respondents plus three colleagues named in their responses) from a single academic department. This limited scope was intentionally chosen in order to enable a detailed manual review and validation of the extracted relationships, as the authors had full contextual knowledge of the setting. While the restricted sample size constrains the generalizability of the findings, the proposed SURVEY2MLN methodology itself is inherently scalable. The LLM-based extraction and network construction process can be applied to substantially larger datasets without major technical barriers, as the data processing phase is highly automated and efficient even for very large collections of survey responses. The primary challenge in extending the approach is not methodological but practical, namely collecting sufficiently large and representative survey data.
Third, the current implementation has not yet been benchmarked against conventional baselines such as traditional named entity recognition combined with co-occurrence analysis, which would provide additional perspective on the robustness of the proposed approach. Likewise, while the SURVEY2MLN framework is conceptually model-agnostic, the case study presented here was implemented with a single model (GPT-4.5) under fixed parameter settings. The influence of alternative LLMs, prompt designs, and parameter choices remains unexplored. In addition, inherent biases of LLMs may affect which entities and ties are emphasized or overlooked, and ambiguity in entity disambiguation can lead to inconsistencies when names or references are not clearly stated.
Finally, there are broader ethical and privacy-related considerations. Even in anonymized form, reconstructing social ties involves sensitive personal information that must be handled with care. While our case study was conducted within a single department and safeguarded through anonymization and aggregation, any wider deployment of the methodology will require strict data protection protocols and informed consent procedures.
Looking ahead, these limitations point to several concrete directions for future research. The development of annotated, multi-annotator ground-truth datasets (with adjudication protocols and power-aware sampling) will enable the calculation of precision, recall, and F1-scores and allow benchmarking against baselines. Extending algorithmic validation to include predictive and cross-domain transfer tests will clarify the added value of the LLM-based approach. Expanding the study to larger and more diverse populations will test the scalability of the framework, while comparative experiments with different LLMs and parameter settings will help determine the most effective configurations for relational extraction. Further methodological refinement will focus on prompt engineering strategies and thresholds for tie definition to ensure robustness of results. Future work will also address the practical challenge of collecting sufficiently large survey datasets. In this regard, we plan to extend the study to the entire University of Rijeka, which would provide insights into interdisciplinary collaboration patterns. Beyond academia, we also see strong potential for adapting the methodology to other organizational domains, thereby further testing its robustness and applicability across diverse contexts. Finally, we will formalize privacy and ethical safeguards to ensure responsible application in broader organizational and societal contexts.
Overall, by acknowledging these limitations and outlining a clear agenda for addressing them—including the creation of high-quality annotated datasets—we position SURVEY2MLN as a flexible and extensible conceptual framework. The present case study illustrates its feasibility, while future research will expand its validation and applicability across domains.

6. Conclusions

This paper introduced SURVEY2MLN, a novel methodology for extracting multilayer social networks from open-ended survey responses using large language models (LLMs) and prompt engineering. The proposed approach enables the transformation of qualitative, narrative data into structured network representations without relying on rigid rule-based NLP pipelines or predefined survey structures.
By integrating LLM-based text interpretation with multilayer network modeling, SURVEY2MLN captures diverse relational dimensions—such as research similarity, communication patterns, and formal organizational affiliations—and unifies them into a coherent multilayer framework. The methodology includes a systematic sequence of phases, from data collection and prompt design to network construction, analysis, and validation.
The case study conducted within an academic department demonstrated the feasibility of this approach, showing that generative AI can accurately extract meaningful social relationships and support standard network analysis. The multilayer structure further enabled the exploration of interdependencies between formal and informal ties, revealing insights into institutional dynamics that would remain inaccessible through single-layer models or structured surveys alone.
The main contribution of this work lies in the development and validation of a generalizable, LLM-driven framework for social network extraction from unstructured textual data. As such, SURVEY2MLN opens new directions for research at the intersection of AI and social sciences, offering a flexible tool for studying complex social systems in various organizational and disciplinary contexts.
Future work will focus on scaling the methodology to larger datasets, automating validation procedures, and expanding the range of relational dimensions that can be modeled using advanced language models.

Author Contributions

Conceptualization, A.M. and D.P.; methodology, A.M., D.P. and S.B.; software, A.M. and D.P.; validation, A.M. and S.B.; resources, A.M. and D.P.; data curation, D.P. and S.B.; writing—original draft preparation, A.M.; writing—review and editing, A.M.; visualization, A.M. and S.B.; supervision, A.M.; project administration, D.P.; funding acquisition, D.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the European Union via the Croatian Ministry of Education, as part of the project Peoplet—application for the visualization of interactions (NPOO.C3.2.R2-I1.04.0018)—Funded by the European Union–NextGeneration EU, and by the project AI Methods for Analyzing Media Texts and Exploring Information Dissemination–AInfomedia (uniri-iz-25-44). The views and opinions expressed are those of the author and do not necessarily reflect the official views of the European Union or the European Commission. Neither the European Union nor the European Commission can be held responsible for them.

Institutional Review Board Statement

Ethical review and approval were waived for this study because the research consisted of an anonymous, non-interventional survey that collected no personal data or identifiers, consistent with the Act on the Implementation of the GDPR (Official Gazette of the Republic of Croatia 42/18, Art. 1(1), Art. 3(1)) and GDPR Article 4(1)/Recital 26. Participation was voluntary and anonymous; no incentives were provided; data were analyzed in aggregate and are de-identified in the public release.

Informed Consent Statement

This study was conducted in accordance with the Act on the Implementation of the General Data Protection Regulation (Zakon o provedbi Opće uredbe o zaštiti podataka, Official Gazette of the Republic of Croatia 42/18), Section I. General Provisions—Article 1(1) and Article 3(1), which ensure the application of Regulation (EU) 2016/679 (GDPR) in the Republic of Croatia. The online questionnaire was fully anonymous and collected no personal data or direct/indirect identifiers (e.g., names, email/IP addresses, or demographics sufficient to re-identify). In line with GDPR Article 4(1) and Recital 26, data that cannot be linked to an identified or identifiable natural person do not constitute personal data; accordingly, no “processing of personal data” occurred and prior ethical approval was not required. Participation was voluntary; adult respondents were informed of the study purpose, that submission was anonymous, that no personal data would be collected, and that they could skip any question or discontinue at any time without consequence. No incentives were offered. The study involved minimal risk and no intervention. Data were stored on secure institutional infrastructure and analyzed only in aggregate, and the public dataset/Gephi project shared at the DOI is de-identified and contains no personal data.

Data Availability Statement

Data are available at the following link: https://doi.org/10.6084/m9.figshare.29772305 (accessed on 1 August 2025).

Conflicts of Interest

Authors Ana Meštrović and Dino Pitoski were employed by the company Peoplet Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
NLP  Natural Language Processing
LLM  Large Language Model
SNA  Social Network Analysis
MLN  Multilayer Network

Appendix A. Prompt Examples

Appendix A.1. Prompt 1–Research Similarity Network (RSL)

Prompt 1: System and User Message

# ===== SYSTEM MESSAGE =====
You are a precise research assistant. Your task is to
process open-ended survey answers row by row and build
a weighted, undirected Research Similarity Network (RSL)
using simple overlap counts.

GENERAL PRINCIPLES
- Be conservative: extract only what is explicitly
  stated in each participant's responses (no hallucinations).
- Detect ambiguity (e.g., unclear names/terms) and flag it.
- Unify terminology by mapping variants and related
  terms into the same semantic fields/disciplines.
  Examples:
  - "NLP", "text mining", "sentiment analysis"
    -> "natural language processing"
  - "deep learning", "neural networks"
    -> "deep learning"
  - Keep tools and languages canonical
    (e.g., "python", "r", "pytorch").
- Merge research areas, methods, and tools into ONE
  combined canonical set of "skills/attributes".
- Order of mentions is irrelevant.
- Output must be strictly machine-readable CSV.

INPUT FORMAT (one JSON object per participant)
{
  "id": "PROF1",
  "q1_primary_area": "...",
  "q2_goals_methods": "...",
  "q3_tools": "..."
}

ROW-WISE PROCESSING
For each participant:
(1) Extract canonicalized skills/attributes.
(2) Preserve original terms (deduplicated).
(3) Emit QC flags JSON: {"ambiguous_terms": bool, ...}.

CROSS-ROW EDGE CONSTRUCTION
- For each unordered pair (i, j), compute shared items.
- Edge weight = |shared_items|.
- If weight == 0, omit the edge.

OUTPUT ARTIFACTS (CSV ONLY)
(1) normalized_profiles.csv
    Columns: id, canonical_skills, original_terms, qc_flags
(2) rsl_edges.csv
    Columns: source, target, shared_items, weight

ORDERING & FORMAT RULES
- Sort normalized_profiles by id ascending.
- Sort rsl_edges by (source, target) ascending.
- No prose, only CSV output.

# ===== USER MESSAGE =====
You will receive the survey as JSON Lines under key "rows".
Please perform all steps and return the two CSVs exactly
as specified (no prose, CSVs only).

INPUT EXAMPLE:
{
  "rows": [
    {
      "id": "P1",
      "q1_primary_area": "Text mining, sentiment analysis",
      "q2_goals_methods": "Machine learning, NLP",
      "q3_tools": "Python, R"
    },
    {
      "id": "P2",
      "q1_primary_area": "Natural language processing, deep learning",
      "q2_goals_methods": "Neural networks, sentiment analysis",
      "q3_tools": "Python, TensorFlow"
    }
  ]
}
Prompt Explanation. The presented prompt defines the rules by which survey responses are automatically processed using a large language model. Its objective is to extract and standardize research areas, methods, and tools from free-text answers and to merge them into a unified set of skills/attributes for each participant. The prompt prescribes counting direct matches when terms are written identically (e.g., “NLP” and “NLP”); if no such direct matches exist, the model attempts to identify synonyms and related expressions within the same semantic or scientific field (e.g., “NLP,” “text mining,” “sentiment analysis” → natural language processing). In addition, the prompt requires the flagging of ambiguous, unclear, or noisy expressions (e.g., overly broad terms, typographical errors, or nonsensical strings), thereby ensuring transparency and enabling manual inspection by the researcher. Based on these canonicalized participant profiles, an undirected weighted Research Similarity Network (RSL) is constructed, where edges are formed when participants share common attributes, and edge weights correspond to the number of such overlaps. The output is standardized in the form of CSV tables (normalized_profiles.csv and rsl_edges.csv), suitable for subsequent social network analysis.
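In practice, the system and user messages above are sent to the model through a chat-style API (the paper reports using OpenAI's GPT-4.5 API). The sketch below only assembles such a request body; the helper name, the model identifier, and the exact payload layout are our assumptions, and the network call itself is omitted:

```python
import json

def build_request(system_msg, user_msg, rows, model="gpt-4.5-preview"):
    """Hypothetical helper: package a Prompt 1 system/user message pair and
    the survey rows into a chat-completion request body. Model name and
    payload layout are assumptions, not taken from the paper."""
    payload = json.dumps({"rows": rows}, ensure_ascii=False)
    return {
        "model": model,
        "temperature": 0,  # favor deterministic, reproducible extraction
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg + "\n\nINPUT:\n" + payload},
        ],
    }

req = build_request(
    system_msg="You are a precise research assistant. ...",  # Prompt 1 system message
    user_msg='You will receive the survey as JSON Lines under key "rows".',
    rows=[{"id": "P1", "q1_primary_area": "Text mining, sentiment analysis"}],
)
```

Setting a low temperature is a natural choice here, since the task is extraction rather than generation and the prompt demands strictly machine-readable CSV.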
Example for 5 participants

Input data (survey responses):

{
  "rows": [
    {
      "id": "P1",
      "q1_primary_area": "Text mining, sentiment analysis",
      "q2_goals_methods": "Machine learning, NLP",
      "q3_tools": "Python, R"
    },
    {
      "id": "P2",
      "q1_primary_area": "My primary focus is natural language processing and I am particularly interested in deep learning methods.",
      "q2_goals_methods": "In my research I use neural networks to perform sentiment analysis on large collections of texts. I also do data analysis when appropriate.",
      "q3_tools": "Most often I work with Python, and for modeling I rely on TensorFlow."
    },
    {
      "id": "P3",
      "q1_primary_area": "Education, learning analytics",
      "q2_goals_methods": "Social network analysis (SNA) and distance learning asdasd",
      "q3_tools": "Gephi, Python"
    },
    {
      "id": "P4",
      "q1_primary_area": "Biotechnology, medicine",
      "q2_goals_methods": "Data mining, statistics",
      "q3_tools": "R, SPSS"
    },
    {
      "id": "P5",
      "q1_primary_area": "Computer networks and network security",
      "q2_goals_methods": "We perform traffic analysis and intrusion detection (signature-based) and evaluate SDN performance.",
      "q3_tools": "Wireshark, Mininet, GNS3"
    }
  ]
}
Table A1. Example output from Prompt 1: normalized_profiles.csv.
Id | Canonical_Skills | Original_Terms | Qc_Flags
P1 | natural language processing; machine learning; sentiment analysis; python; r | text mining, sentiment analysis, machine learning, NLP, Python, R | {}
P2 | natural language processing; deep learning; sentiment analysis; python; tensorflow; data analysis | natural language processing, deep learning, neural networks, sentiment analysis, data analysis, Python, TensorFlow | {"ambiguous_terms": true, "notes": "term 'data analysis' too broad"}
P3 | education; learning analytics; social network analysis; distance learning; python; gephi | education, learning analytics, Social network analysis, distance learning, asdasd, Python, Gephi | {"typos_or_noise": true, "notes": "term 'distnace learnign' corrected to 'distance learning'; 'asdasd' marked as noise"}
P4 | biotechnology; medicine; data mining; statistics; r; spss | biotechnology, medicine, data mining, statistics, R, SPSS | {}
P5 | computer networks; network security; traffic analysis; intrusion detection; sdn; wireshark; mininet; gns3 | computer networks, network security, traffic analysis, intrusion detection, SDN, Wireshark, Mininet, GNS3 | {}
Table A2. Example output from Prompt 1: rsl_edges.csv (only edges with weight > 0).
Source | Target | Shared_Items | Weight
P1 | P2 | natural language processing; sentiment analysis; python | 3
P1 | P3 | python | 1
P1 | P4 | r | 1
P2 | P3 | python | 1
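The cross-row edge construction specified in Prompt 1 is purely mechanical once the profiles are canonicalized, so it can be recomputed outside the LLM as a check on the model's CSV output. A minimal Python sketch (function and variable names are ours, not from the paper) that reproduces the edges of Table A2 from the canonical skill sets of Table A1:

```python
from itertools import combinations

# Canonicalized skill sets, transcribed from Table A1.
profiles = {
    "P1": {"natural language processing", "machine learning",
           "sentiment analysis", "python", "r"},
    "P2": {"natural language processing", "deep learning",
           "sentiment analysis", "python", "tensorflow", "data analysis"},
    "P3": {"education", "learning analytics", "social network analysis",
           "distance learning", "python", "gephi"},
    "P4": {"biotechnology", "medicine", "data mining", "statistics",
           "r", "spss"},
    "P5": {"computer networks", "network security", "traffic analysis",
           "intrusion detection", "sdn", "wireshark", "mininet", "gns3"},
}

def rsl_edges(profiles):
    """Weighted undirected edges: weight = number of shared canonical items;
    zero-weight pairs are omitted, pairs sorted by (source, target)."""
    edges = []
    for i, j in combinations(sorted(profiles), 2):
        shared = sorted(profiles[i] & profiles[j])
        if shared:
            edges.append((i, j, "|".join(shared), len(shared)))
    return edges

edges = rsl_edges(profiles)
```

Since the prompt declares the order of mentions irrelevant, the item order inside shared_items may differ from Table A2; what must agree are the node pairs and the weights. Any divergence between this recomputation and the LLM's rsl_edges.csv points to a canonicalization error worth flagging in Phase 6.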

Appendix A.2. Prompt 2–Communication Network (CN)

Prompt 2: System and User Message

# ===== SYSTEM MESSAGE =====
You are a precise research assistant. Your task is to
process open-ended survey answers row by row and build
a directed, weighted Communication Network (CN) based
on reported contacts.

GENERAL PRINCIPLES
- Input data contains free-text answers where each participant names up to 5 persons they communicate with inside their organization.
- The order of names matters: the first person named = highest communication frequency.
- Assign edge weights strictly according to order: 1st person = 5, 2nd = 4, 3rd = 3, 4th = 2, 5th = 1.
- If fewer than 5 names are listed, weights are still assigned in descending order starting from 5.
- Graph is directed: from respondent_id (source) to named_person_id (target).
- Each pair (respondent, named_person) corresponds to one directed edge with its weight.
- Names may appear in different forms (typos, nicknames, case variations). Attempt to canonicalize to a consistent identifier if possible, and flag uncertain matches.
- Output must be strictly machine-readable CSV.

INPUT FORMAT (row per participant)
{
  "id": "P1",
  "q_contacts": "Alice, Bob, Charlie, David, Emma"
}

ROW-WISE PROCESSING
For each participant:
(1) Split the response into a list of up to 5 names, in the given order.
(2) Normalize names (remove whitespace, standardize case, resolve obvious typos).
(3) Assign integer weights: [5,4,3,2,1] mapped onto the listed names.
(4) For each named person, emit a directed edge: respondent_id -> named_person, weight.
(5) Preserve original spelling of names as metadata.
(6) Emit QC flags if:
    - more than 5 names given (truncate after 5, flag "overflow"),
    - unclear name or noise detected (flag "ambiguous_name").

OUTPUT ARTIFACT
(1) comm_edges.csv
    Columns: source, target, weight, original_name, qc_flags

ORDERING & FORMAT RULES
- Sort edges by source ascending, then weight descending.
- Do not output explanations, markdown, or prose. CSV only.

# ===== USER MESSAGE =====
You will receive the survey as JSON Lines under key "rows".
Please perform all steps and return the comm_edges.csv file
exactly as specified.

INPUT EXAMPLE:
{
  "rows": [
    {"id": "P1", "q_contacts": "Alice, Bob, Charlie, David, Emma"},
    {"id": "P2", "q_contacts": "John, Alice, Peter"}
  ]
}
Prompt Explanation. The presented prompt operationalizes the construction of a directed and weighted communication network from free-text survey responses. Each participant lists up to five individuals within their organization with whom they communicate most frequently. The order of listed names determines the edge weights, ranging from 5 (most frequent communication) to 1 (least frequent), while fewer than five responses are scaled accordingly. Directed edges are generated from the respondent to each named person, with additional quality-control flags used to identify ambiguous or noisy entries (e.g., typographical errors, unclear names, or more than five responses). The resulting edge list (comm_edges.csv) provides a standardized machine-readable representation of interpersonal communication patterns, suitable for subsequent social network analysis.
Example for 5 participants

Input data (survey responses):

{
  "rows": [
    {
      "id": "P1",
      "q_contacts": "Alice, Bob, Charlie, Diana, Emma"
    },
    {
      "id": "P2",
      "q_contacts": "John, Alice, Peter"
    },
    {
      "id": "P3",
      "q_contacts": "Charlie, Alice, Bob, Frank"
    },
    {
      "id": "P4",
      "q_contacts": "Emma, Alice, Bob, Charlie, John"
    },
    {
      "id": "P5",
      "q_contacts": "Anna, Robert"
    }
  ]
}
Table A3. Example output from Prompt 2: comm_edges.csv.
Source | Target | Weight
P1 | Alice | 5
P1 | Bob | 4
P1 | Charlie | 3
P1 | Diana | 2
P1 | Emma | 1
P2 | John | 5
P2 | Alice | 4
P2 | Peter | 3
P3 | Charlie | 5
P3 | Alice | 4
P3 | Bob | 3
P3 | Frank | 2
P4 | Emma | 5
P4 | Alice | 4
P4 | Bob | 3
P4 | Charlie | 2
P4 | John | 1
P5 | Anna | 5
P5 | Robert | 4
The example illustrates the construction of a directed, weighted communication network based on survey responses. Each listed contact is converted into a directed edge from the respondent to the named individual, with weights assigned according to order of mention (5 for the first, down to 1 for the fifth). The resulting edge list provides a standardized representation of communication frequency, suitable for subsequent social network analysis.
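The order-based weighting rule is likewise easy to verify independently. A minimal Python sketch (helper name is ours) that applies the [5, 4, 3, 2, 1] mapping and the overflow truncation rule from Prompt 2 to two of the example rows:

```python
def comm_edges(rows):
    """Directed, weighted edges: the first-named contact gets weight 5,
    the second 4, and so on; responses are truncated after 5 names."""
    edges = []
    for row in rows:
        names = [n.strip() for n in row["q_contacts"].split(",") if n.strip()]
        names = names[:5]  # QC rule: truncate after 5 (flagging omitted here)
        for weight, name in zip(range(5, 0, -1), names):
            edges.append((row["id"], name, weight))
    return edges

rows = [
    {"id": "P2", "q_contacts": "John, Alice, Peter"},
    {"id": "P5", "q_contacts": "Anna, Robert"},
]
edges = comm_edges(rows)
```

Note that name canonicalization (typos, nicknames, case variants) is exactly the part this mechanical sketch cannot do; that step is what the LLM contributes, and it is where the QC flags and manual review of Phase 6 matter most.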

Appendix B. Survey Questions

The dataset analyzed in this study was collected through an open-ended survey conducted among researchers at the Faculty of Informatics and Digital Technologies (FIDIT), University of Rijeka. The survey consisted of five questions. Four of them (Q1–Q3 and Q5) were used in the construction of multilayer networks within the SURVEY2MLN methodology, while Q4 provides complementary insights into future research directions. The exact wording of the survey questions (translated into English) is given below:
Q1. What is the primary area of your research work? Please indicate whether your research involves analyzing data from specific domains (e.g., education, biotechnology, medicine). We kindly ask you to also describe the main objectives of such analyses.
Q2. Which scientific/professional methods (e.g., machine learning methods, social network analysis methods) do you use in your work? Please list all methods in detail and rank them according to frequency of use, from the most frequent to the least frequent.
Q3. Which tools/systems/programming languages do you use in your work? Please provide a detailed list and rank them according to frequency of use, from the most frequent to the least frequent.
Q4. On which topics or projects would you like to work in the future? These can be topics that differ from your current area of research or that could complement it.
Q5. Who are up to five people within your organization with whom you communicate most frequently, whether professionally or informally? Please list them in order of communication frequency, starting with the person you interact with the most and ending with the one with whom you have the least frequent contact.

Figure 1. Illustration of a multilayer network with four layers. Solid lines denote intralayer connections; dashed lines denote interlayer connections.
Figure 2. Multilayer network structure constructed from survey data. Solid lines indicate intra-layer connections, while dashed lines denote inter-layer connections across different layers. In the RSL1 layer, node colors correspond to distinct communities. Circular nodes represent scientists in the first two layers, squares denote Chairs, and triangles represent Laboratories.
Table 1. Detailed steps of the SURVEY2MLN methodology for generating multilayer social networks from open-ended survey responses.
Phase | Step | Description
1 | 1.1. Data Collection | Collect open-ended survey responses covering research interests, tools, collaboration, and communication.
1 | 1.2. Data Preprocessing | Clean and standardize textual responses; organize data into a structured tabular format for processing.
2 | 2.1. Prompt Design | Design prompts tailored to extract relevant information for each network layer (e.g., similarity, communication).
2 | 2.2. Prompt Refinement | Test and iteratively adjust prompts to improve accuracy, handle variation in language, and reduce ambiguity.
3 | 3.1. Relation Extraction | Use LLMs to extract entity pairs and relational data (e.g., shared keywords, named individuals).
3 | 3.2. Edge List Generation | Translate LLM outputs into structured edge lists for each type of relation, including direction and weight.
3 | 3.3. Layer Construction | Construct individual network layers from edge lists, each representing a specific social or organizational dimension.
4 | 4.1. Entity Linking | Identify and align identical nodes (individuals or units) across layers.
4 | 4.2. Cross-Layer Mapping | Create interlayer links between individuals, organizational units, and groups based on shared membership.
4 | 4.3. Multilayer Network Integration | Integrate all layers and interlayer links into a coherent multilayer network structure.
5 | 5.1. Global-Level Analysis | Compute metrics such as density, diameter, path length, and clustering for each layer.
5 | 5.2. Meso-Level Analysis | Detect communities using algorithms like Louvain and evaluate modularity.
5 | 5.3. Local-Level Analysis | Identify key actors using centrality measures (degree, betweenness, closeness).
5 | 5.4. Cross-Layer Interpretation | Compare structures across layers to identify consistent or divergent patterns.
6 | 6.1. Automated Evaluation | When annotated ground-truth data are available, evaluate extraction accuracy using standard metrics such as precision, recall, and F1-score.
6 | 6.2. Manual Review | Select a sample of LLM-extracted links for manual inspection to identify errors such as inconsistent name resolution or false ties.
6 | 6.3. Feedback Loop | Use insights from automated and/or manual evaluation to refine prompts, preprocessing, or data formatting.
6 | 6.4. Final Validation | Ensure consistency, reliability, and analytical usefulness of the multilayer network before interpretation or application.
Table 2. Network measures used in the SURVEY2MLN methodology.
Level | Measure/Technique | Description
Global | Average degree | Mean number of connections per node in a layer. (Our dataset, G_2: ⟨k_in⟩ = ⟨k_out⟩ = 4.4444.)
Global | Average weighted degree | Mean interaction intensity per node (node strength), accounting for edge weights. (G_2: ⟨s_in⟩ = ⟨s_out⟩ = 13.3333.)
Global | Network diameter | Maximum shortest path length between any two nodes (on the largest weakly connected component for directed layers). (G_2: 4.000000.)
Global | Average path length | Mean shortest path between pairs of nodes (computed on the giant component G_gc). (G_2: 2.210826.)
Global | Network density | Proportion of actual connections to all possible connections; for a directed graph, δ = m / (n(n - 1)). (G_2: 0.170940.)
Global | Clustering coefficient | Tendency of nodes to form tightly connected groups (average weighted local clustering on the undirected projection). (G_2: 0.203523.)
Global | Assortativity | Similarity-based connectivity; degree–degree Pearson correlation on the undirected projection. (G_2: 0.017892.)
Global | Reciprocity | Extent to which directed ties are mutual. (G_2: 0.700000.)
Meso | Community detection | Identification of clusters of densely connected nodes (greedy modularity on the undirected, weighted projection). (G_2: |C| = 5 communities.)
Meso | Modularity | Evaluation of community structure (weighted Newman–Girvan); values above 0.3 indicate significant modularity. (G_2: Q = 0.424772.)
Local | Degree centrality | Number of direct ties of a node. (G_2: strongest in-degree: PROF1 = 9; strongest out-degree: AS1 = 5.)
Local | In-degree centrality | Number of times an actor is mentioned by others (important in survey-based layers). (G_2: PROF1 = 9.)
Local | Betweenness centrality | Extent to which a node lies on the shortest paths between others. (G_2: AS1 = 0.1983.)
Local | Closeness centrality | How close a node is to all others in terms of path length. (G_2: PROF6 = 0.5087.)
Cross-layer | Cross-layer interpretation | Analysis of structural alignment and divergence across layers; includes comparison of node positions, identification of actors bridging multiple relationship types, and analysis of interlayer links (e.g., individuals to departments).
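Two of the global measures listed above are simple enough to recompute by hand, which gives a quick sanity check on any SNA toolkit's output. The following Python sketch (a toy directed edge list of our own, not the paper's dataset) implements directed density, δ = m / (n(n - 1)), and reciprocity as the share of directed edges whose reverse edge is also present:

```python
def density(edges, n):
    """Directed density: m / (n * (n - 1)) for m edges over n nodes."""
    return len(edges) / (n * (n - 1))

def reciprocity(edges):
    """Fraction of directed edges (u, v) for which (v, u) also exists."""
    edge_set = set(edges)
    mutual = sum(1 for (u, v) in edges if (v, u) in edge_set)
    return mutual / len(edges)

# Hypothetical toy layer: 4 nodes, 5 directed edges.
toy = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "D"), ("D", "C")]
d = density(toy, n=4)   # 5 of 12 possible directed edges
r = reciprocity(toy)    # 4 of 5 edges are reciprocated
```

On the communication layer, a reciprocity of 0.70 (as reported for G_2) means that most "I talk to X" nominations are returned by X, a structural cue that the order-based weighting captures genuine two-way ties rather than one-sided perceptions.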
Table 3. Examples of prompt-based extraction tasks for transforming open-ended survey responses into network data.
Prompt Example | Component | Content
1. Keyword-Based Similarity Extraction | Example Description | Extract keywords from research profiles, compare pairs of researchers, and count shared keywords to generate weighted undirected edges.
1. Keyword-Based Similarity Extraction | Input | Researcher: PROF1; Profile: Machine learning, natural language processing, Python, large language models. Researcher: PROF2; Profile: Data science, neural networks, NLP, Python, generative models.
1. Keyword-Based Similarity Extraction | Output | PROF1–PROF2: 2 shared keywords (NLP, Python); structured as: source, target, weight.
2. Named Entity Standardization and Edge Weighting | Example Description | Extract and standardize names mentioned in communication-related survey responses; assign edge weights based on order of mention.
2. Named Entity Standardization and Edge Weighting | Input | Researcher: AS3; Mentions: “I mostly talk with prof. Ivan, then sometimes with Kristina and Dragan.”
2. Named Entity Standardization and Edge Weighting | Output | AS3–PROF6: 5; AS3–AS4: 4; AS3–AS2: 3.
