Article

Analyzing Vulnerability Through Narratives: A Prompt-Based NLP Framework for Information Extraction and Insight Generation

by Aswathi Padmavilochanan 1,2,*, Veena Gangadharan 3, Tarek Rashed 4 and Amritha Natarajan 2

1 Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Clappana, Kollam 690525, India
2 Center for Women’s Empowerment and Gender Equality, Amrita Vishwa Vidyapeetham, Amritapuri, Clappana, Kollam 690525, India
3 Department of Computer Science and Applications, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Clappana, Kollam 690525, India
4 Geospatial Innovation Program, Center for Environment & Society, Washington College, Chestertown, MD 21620, USA
* Author to whom correspondence should be addressed.
Big Data Cogn. Comput. 2026, 10(1), 6; https://doi.org/10.3390/bdcc10010006
Submission received: 13 September 2025 / Revised: 6 November 2025 / Accepted: 12 November 2025 / Published: 24 December 2025

Abstract

This interdisciplinary pilot study examines the use of Natural Language Processing (NLP) techniques, specifically Large Language Models (LLMs) with Prompt Engineering (PE), to analyze economic vulnerability from qualitative self-narratives. Seventy narratives from twenty-five women in the Palk Bay coastal region of Rameswaram, India, were analyzed using a schema adapted from a contextual empowerment framework. The study operationalizes theoretical constructs into structured Information Extraction (IE) templates, enabling systematic identification of multiple vulnerability aspects, contributing factors, and experiential expressions. Prompt templates were iteratively refined and validated through dual-annotator review, achieving an F1-score of 0.78 on a held-out subset. Extracted elements were examined through downstream analysis, including pattern grouping and graph-based visualization, to reveal co-occurrence structures and recurring vulnerability configurations across narratives. The findings demonstrate that LLMs, when aligned with domain-specific conceptual models and supported by human-in-the-loop validation, can enable interpretable and replicable analysis of self-narratives. While findings are bounded by the pilot scale and community-specific context, the approach supports translation of narrative evidence into community-level program design and targeted grassroots outreach, with planned expansion to multi-site, multilingual datasets for broader applicability.

1. Introduction

Understanding and addressing social vulnerability has long been a concern across disciplines such as development studies, public policy, and social research [1]. While much of the existing literature offers conceptual and theoretical insights into vulnerability [2,3], there remains a significant gap in methodological tools that can translate these insights into actionable, data-driven analysis [4,5]—particularly when dealing with rich, qualitative sources such as personal narratives. Research in this area continues to rely predominantly on quantitative approaches such as machine learning and GIS [6], and despite advances in Natural Language Processing (NLP), the analysis of unstructured or semi-structured data remains limited in scope. Specifically, holistic frameworks capable of generating structured insights from interviews or self-narratives—while preserving the complexity of individual experiences—are scarce. This gap is further widened by the absence of integrated methodological frameworks that effectively combine theoretical grounding with practical analytical workflows [7,8]. As a result, qualitative analysis remains largely inaccessible to those without specialized technical expertise, limiting its broader adoption among social researchers and practitioners.
Although recent NLP techniques have demonstrated their utility in tasks such as classification, prediction, or aggregate trend analysis in the social domain, they rarely contribute toward constructing grounded, interpretable profiles that reflect the lived experiences of individuals—a form of human-centric interpretability [9]. Emerging perspectivist approaches in NLP, which seek to prioritize subjective, contextualized understanding, are still in their infancy and have primarily been applied in areas such as mental health detection and social media analysis [10,11]. Even in these domains, the output often lacks holistic, perspective-based summarization that is meaningfully integrated with social science theory.
Moreover, extracting structured information from narrative data remains a methodological challenge due to the inherently complex, multilayered nature of human expression [12]. As qualitative data grows in volume and variety, conventional information extraction (IE) methods struggle to meet the demands of scalability, interpretability, and automation [13]. These methods are also often too complex or resource-intensive for non-technical users—such as social researchers, community organizations, and impact analysts—thereby limiting their practical utility. Beyond industry applications [14], there is a growing need to develop structured IE systems that are not only interpretable but also reusable—facilitating downstream tasks such as querying, indexing, and policy evaluation [15,16].
By combining theoretical foundations from vulnerability studies with practical NLP techniques, this approach establishes structured workflows for the automatic extraction and annotation of vulnerability elements from narrative samples. While still in its pilot phase, the study offers researchers a user-friendly tool for systematic narrative analysis that preserves interpretive nuance while aligning with the growing trend of LLM-PE applications across applied domains [17], underscoring the timeliness and relevance of this work.
To operationalize this study’s objectives, we adopt the working hypothesis that structured prompting of LLMs, when guided by domain-specific conceptual models and supported by human-in-the-loop validation, can enable interpretable, replicable, and scalable extraction of multidimensional vulnerability markers from qualitative self-narratives. This hypothesis rests on the assumption that computational techniques can meaningfully augment the interpretive depth of qualitative analysis while preserving contextual fidelity—particularly when grounded in established social science frameworks. Accordingly, the modular framework developed in this study evaluates this hypothesis across key stages, from domain modeling to information extraction and empirical validation.
The major contributions of our work are as follows:
  • Domain-Specific Structuring of Vulnerability Information: Defines key information elements and their relationships within the context of gendered vulnerability, providing a structured model for socio-economic narrative analysis.
  • Mapping Social Science Theoretical Constructs to NLP Techniques: Systematically maps traditional theories of narrative analysis to AI-based IE strategies.
  • Adaptation of Prompt Engineering for Accessible Vulnerability Extraction: Demonstrates how prompt engineering can be adapted for extracting multidimensional vulnerability markers from real-world self-narratives, enabling usability by non-specialists.
  • Human-in-the-Loop Validation Framework: The proposed analytical workflow incorporates human oversight not only at the final validation stage but also at intermediate stages, including prompt refinement, intermediate IE assessments, and quality control checks. Manual annotation and expert feedback are integrated throughout the process to ensure the correctness, interpretability, and practical applicability of the machine-extracted information.
  • Graph-Based Visualization of Extracted Vulnerability Patterns: Organizes extracted information elements through unsupervised categorization and graph-based visualization, enabling structured exploration of recurring vulnerability patterns.
This study is positioned as a pilot feasibility investigation that explores the potential of Large Language Models (LLMs) as a methodological solution for vulnerability assessment. Specifically, we operationalize LLMs within a structured IE pipeline followed by downstream analytical interpretation. In this stage, we focus on the dimension of Economic Vulnerability, which has been the most consistently articulated domain in the collected narratives. As a result, certain vulnerability dimensions (e.g., Environmental Quality, Safety and Security) appear less frequently or are absent in the dataset, reflecting the narrative distribution rather than a methodological exclusion. The conceptual definitions and the categorization schema used in this study are grounded in a comprehensive literature review and are further established in Section 3.1 and Section 3.1.1.

2. Literature Review

The goal of this study is to understand the experiences, perspectives, and reasoning that participants construct through their accounts during interviews. Accordingly, the interview analysis methodology adopted here aligns with, and can be situated within, the broader tradition of narrative analysis. In line with this orientation, the literature review was organized around three key strands: (A) Theoretical aspects of narrative inquiry, (B) Computational approaches to narrative analysis, and (C) Frameworks for analyzing individual vulnerability. It begins by addressing the theoretical foundations of narrative analysis, which highlight the need for a computational approach. The review then progresses to defining the elements involved in assessing vulnerabilities through descriptive narratives. It explores how previous research has linked this specific need with the application of AI and NLP, concluding with various methods researchers have used to apply NLP in analyzing and assessing vulnerabilities in self-narratives.

2.1. Theoretical Aspects of Narrative Inquiry

The literature review began by examining the theoretical foundations of narrative analysis to understand how narrative structures can inform computational approaches to qualitative data. Classical narrative theory focuses on internal narrative organization and the interaction of narrative elements, such as narrator perspective and event sequencing [18,19]. Post-classical approaches extend this view by emphasizing how meaning emerges through the interaction between narrative, audience, and context, shifting attention from fixed internal structures to the dynamic, situated construction of sense-making. This shift is particularly relevant for computational narrative analysis, where the goal is not only to identify elements within a narrative but also to interpret how individuals position and express their experiences within specific social, cultural, and relational contexts.
Across the narrative analysis literature, common analytical practices involve identifying core narrative components—agents, events, temporality, spatial setting, and perspective—and understanding how these interact to form coherent stories [18]. Rather than treating these elements as isolated units, studies emphasize examining their relational patterns, such as character interdependencies, causal or sequential event progressions, and the influence of situational context on narrative meaning [19]. Nasheeda et al. further highlight narrative construction as a process grounded in lived experience, where chronological sequence and experiential framing shape how individuals represent challenges, decisions, and emotional states [20,21]. This orientation aligns with broader work in health psychology and related domains, where narrative analysis serves as a way to understand how individuals articulate meaning around personal and social experiences [22].
Parallel to narrative theory, research in interdisciplinary data management underscores the need for structured approaches that preserve interpretive nuance while enabling systematic analysis. Prior work discusses the importance of adaptable data management frameworks for handling diverse qualitative materials, particularly when integrating social and computational perspectives [23]. Recent studies further emphasize that structured data practices strengthen transparency, monitoring, and decision-making within organizational and NGO contexts [24,25]. These discussions reinforce the value of analytic approaches that maintain contextual richness while supporting reproducibility.
Recent surveys in narrative extraction demonstrate increasing sophistication in computational techniques for handling multi-layered narrative content, while also acknowledging the continued challenge of capturing subjective and context-dependent meanings [26]. Building on this, our study applies large language models with prompt engineering to identify vulnerability-relevant narrative cues in self-reported accounts. This aligns with the broader movement from structural to context-attentive narrative inquiry, where the focus lies not only in identifying textual components but also in understanding how individuals express and frame vulnerability through narrative form. The emphasis on agents, events, temporality, and experiential context in prior narrative research directly informs our computational approach, which treats self-narratives as structured articulations of perceived challenges rather than unstructured text.

2.2. Computational Approaches to Narrative Analysis

Advances in NLP have expanded the possibilities for analyzing personal narratives, particularly in domains where vulnerability is expressed through subjective and contextual language. Traditional qualitative approaches often face limitations in scaling or systematically capturing the layered reasoning present in such narratives, whereas recent developments in large language models (LLMs) offer improved capacity for interpreting complex linguistic cues and narratorial perspective. LLMs have shown the potential to complement or, in some contexts, approximate human interpretation across fields including medicine, psychology, and social science, where understanding subtle experiential expression is central to the analysis. Nedungadi et al. analyze LLM applications, including ChatGPT, in biomedical domains, with attention to ethical and reliability considerations [27]. Models designed to generate self-reflective rationales have demonstrated the ability to surface nuanced interpretive judgments in narrative settings, indicating their relevance to tasks involving personal meaning construction [28]. These advancements build on foundational representational methods such as word embeddings [29], which remain integral to how narrative semantics are structured and contextualized computationally.
Framework-based qualitative research methods have also been used to assess the fidelity of LLM-generated interpretations by comparing model-derived themes with those identified in semi-structured interviews [30]. Such work suggests that while performance varies across demographic and contextual settings, LLMs can reliably assist in thematic interpretation when supported by clear task conditioning. The introduction of foundation models such as GPT-3 and GPT-4 has further broadened narrative analysis, enabling models to handle highly varied narrative forms and less structured expressions of experience [31].
Within this trajectory, prompt engineering has emerged as a crucial mechanism for aligning model outputs with analytical objectives. Techniques such as structured prompting, Base Prompting, and Chain-of-Thought (COT) have been shown to significantly improve the extraction of narrative structure from diverse textual sources [32,33]. For example, work using the Llama-2-7B-Instruct model demonstrates effective adherence to natural language instructions for identifying narrative elements in news contexts [34,35]. Similarly, Hung-Ting Su et al. show that LLMs can interpret narrative tropes in movie synopses with high consistency when guided through targeted prompting strategies [36]. Collectively, these studies indicate that LLMs, when prompted with explicit structural cues, can outperform supervised baselines in narrative interpretation tasks, marking a notable methodological advancement in automated narrative analysis.

2.2.1. Adoption of LLMs in Social Science Studies

LLMs increasingly support large-scale and context-sensitive narrative analysis in computational social science [37]. Their integration introduces methodological opportunities alongside interpretive considerations, as generative AI reshapes inference practices [38]. Prompt engineering has enabled LLMs to produce contextually coherent interpretations in domains such as health communication [39], while frameworks like APT-Pipe demonstrate practical prompt-based annotation workflows [40]. Advances in multimodal prompting, including MM-ReAct, further show that LLMs can be guided toward structured reasoning [41]. Domain adaptation remains essential, as model performance improves when aligned to context-specific semantic characteristics [42]. In this study, PE is used to align automated information extraction with social-theoretical constructs to surface subtle vulnerability cues in self-narratives.

2.2.2. Studies in Mental Health Domain

Narrative-focused NLP has been widely applied in mental and physical health contexts [43]. A recent systematic review by Malgaroli et al. documents the growing use of transformer-based models to analyze patient and provider narratives, while noting ongoing challenges relating to linguistic generalizability, reproducibility, and interpretability [10]. Complementary work by Bartal et al. demonstrates how childbirth narratives can indicate CB-PTSD through linguistic and semantic pattern analysis using sentence transformers and neural architectures [44].
Together, these studies show that NLP and LLMs can automate and augment narrative interpretation in domains where subjective experience is central, sometimes outperforming manual annotation in tasks involving nuanced narrative cues and multimodal information [45,46]. Building on this foundation, the present work extends LLM-based narrative analysis to economic vulnerability, evaluating how guided model reasoning can generate consistent and contextually grounded insights into lived experience.

2.3. Frameworks for Analyzing Individual Vulnerability

The analysis of individual vulnerability requires frameworks that are both theoretically robust and adaptable across contexts and populations. In this study, while the pilot focuses on women, the broader aim is to employ a framework that can generalize across demographic and thematic settings.
Gressel et al.’s framework for Women’s Empowerment (WE) provides an essential foundation for this work [4]. The model conceptualizes empowerment not as a static trait but as a dynamic, context-dependent process shaped by structural, relational, and experiential conditions. Importantly, it reframes vulnerability as the set of contextual constraints that limit one’s ability to act with autonomy and agency. This contextual orientation is crucial for narrative-based vulnerability analysis, where expressions of difficulty, constraint, or resilience must be understood relative to lived circumstances rather than solely as individual attributes. The framework therefore foregrounds the interplay between enabling and constraining conditions, allowing vulnerability to be interpreted not merely as the absence of empowerment, but as a situated state reflecting social, economic, cultural, and affective contexts.
Because the model is conceptual, our study adapts and operationalizes it for computational analysis. This involves translating WE domains into extractable narrative elements that reflect how individuals describe their circumstances, challenges, and capacities. The operationalization process, detailed in the methodology, specifies a structured schema that includes vulnerability aspects, indicators, contributing factors, and expressive markers. This maintains alignment with the theoretical foundation while enabling systematic extraction in a computational pipeline. The adaptability of the model is central here: it provides structure without prescribing content, allowing the framework to be extended beyond women-focused contexts in future applications.
Despite advances in NLP and the increased use of LLMs in social, clinical, and narrative research, the literature indicates a gap in methods that combine theoretical grounding with reproducible computational procedures for extracting layered vulnerability markers. Existing studies often examine single dimensions of vulnerability or rely on unstructured qualitative interpretation, limiting scalability and comparability. Similarly, while recent work highlights the promise of LLMs in mental health screening and narrative reasoning, there is limited research demonstrating how these models can be systematically guided to identify multiple, interrelated vulnerability elements while preserving interpretive depth. This study addresses that methodological gap by integrating a theory-driven framework (Gressel et al.) with a prompt-engineered, schema-guided information extraction pipeline, enabling the structured interpretation of self-narratives for socio-economic vulnerability analysis.

3. Methodology

The modular framework presented in this study outlines a structured approach to narrative-based IE and analysis using LLM and PE techniques, as illustrated in Figure 1. Designed to support the interpretation of interview narratives, the framework comprises four interrelated modules: Domain Analysis and Problem Mapping, LLM-PE Framework Development, IE and Analysis, and Validation of Extracted Information. Each module contributes to systematically translating qualitative accounts into structured insights, enabling the identification of perceived economic vulnerabilities and their underlying factors.

3.1. Module I: Domain Analysis and Problem Mapping

This initial module in the PE framework adopts the existing conceptual model named ‘AWESOME’ [4] to define and understand the key issues, questions, and contexts relevant to the study’s focus on perceived vulnerabilities among individuals. Here, we identified operational gaps in this theoretical framework, mapped the associated information entities as ‘information elements’, and then explored possible associations between them based on the underlying theoretical model. This module is grounded in the hypothesis that vulnerability, when decomposed into structured conceptual elements—such as categories, aspects, indicators, factors, and expressions—can be reliably mapped from narrative text through computational means. We assume that this decomposition, informed by the adapted AWESOME framework, provides a stable scaffold for both prompt design and semantic analysis, enabling operational alignment between domain theory and NLP outputs. This phase set the foundation for the entire framework by defining the scope of the inquiry, pinpointing specific vulnerabilities to explore, and determining the boundaries of what would be analyzed. Further details on terminologies, notations, underlying knowledge, and expected outcomes of the vulnerability extraction process are elaborated in the subsections.

3.1.1. Domain Knowledge and Representation

To structure the IE task, five core elements were defined: Vulnerability Aspects (VAs), Vulnerability Factors (VFs), Vulnerability Indicators (VIs), Vulnerability Markers (VMs), and Vulnerability Expressions (VEs). Their operational definitions are summarized in Table 1, along with the notations used throughout the paper. While concepts including WE, WV, VC, VI, and VF were adopted from the original framework, VA, VM, and VE were devised by the authors to operationalize the theoretical framework for actual vulnerability assessment.

3.1.2. Dataset Gathered

This study draws on narrative data collected from Olaikuda, a coastal village located in the Rameswaram region of Tamil Nadu, India, as part of an in-depth interview study on life satisfaction. The community is predominantly dependent on marine-based livelihoods, including seaweed cultivation, small-scale fishing, and associated processing activities. The study focused on women engaged in these coastal livelihood practices, given their central roles in household economic sustainability and their exposure to fluctuations in resource availability, market conditions, and climatic variability.
Participants were recruited through community-based networks and local field facilitators familiar with village livelihood structures. A purposive sampling strategy was used to ensure representation among women actively engaged in marine-dependent work. Narratives were collected through in-depth semi-structured interviews, during which participants described livelihood experiences, stressors, and perceived vulnerabilities in their own words. The current analysis is based on 70 narrative extracts drawn from 25 participants and reflects the pilot phase of a larger ongoing data collection effort. Data expansion is in progress, with 62 interviews completed to date and over 400 additional narratives extracted for subsequent annotation and validation.
Figure 2 illustrates the outcome of a manual analysis and annotation process applied to a sample narrative input. The interview narrative illustrates the nuanced, multidimensional nature of individuals’ vulnerability, underscoring the importance of a comprehensive analytical framework. Recent studies have highlighted the significant improvements that PE brings to the contextual appropriateness and translation quality of large language model (LLM) outputs [47,48]. Building on this, our study adopted the PE approach to translate regional Indian languages into English using GPT-4. To avoid systematic biases, bilingual reviewers corrected translation quality issues across the complete dataset.

3.1.3. Annotation Guidelines and Protocol

The annotation process focused on extracting five core information elements from the narrative data. Operational definitions and category boundaries were developed in consultation with domain experts and documented in an internal guideline, which included example-driven heuristics to support consistency in interpretation. Given the pilot nature of this study, annotations were initially performed by a single annotator and subsequently reviewed by a second domain-informed researcher for consistency and conceptual alignment. AI assistance was explicitly avoided during manual annotation to preserve human reasoning fidelity. To assess schema reliability, a subset of 25 narrative extracts was independently annotated by two annotators, resulting in a Cohen’s κ of 0.78, indicating substantial agreement. The full annotation of the 70 narratives used in this pilot was completed by a trained social science researcher, with conceptual review by a second domain expert to maintain alignment with the vulnerability schema and reduce individual interpretation bias. This iterative validation process ensured fidelity to domain theory and supported systematic documentation of ambiguous cases for future refinement of the schema. Multi-annotator extension to the full dataset is currently underway as part of the ongoing data expansion effort.
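To illustrate how the reported agreement statistic can be computed, the following minimal sketch applies scikit-learn’s cohen_kappa_score to two hypothetical parallel label sequences; the labels shown are illustrative placeholders, not drawn from the study data.

```python
# Minimal sketch of the inter-annotator agreement check, assuming two
# parallel label sequences over the 25-extract subset; the labels below
# are illustrative, not taken from the study data.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["VA", "VF", "VI", "VA", "VE", "VF"]
annotator_b = ["VA", "VF", "VA", "VA", "VE", "VF"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # the study reports 0.78 on n = 25
```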

3.2. Module II: Information Extraction (IE) Framework Development

After understanding the problem space, the next phase develops an IE pipeline using LLM and PE techniques to extract the information elements defined previously. The hypothesis underpinning this module is that PE, particularly when iteratively refined through prompting strategies and structural formatting, can guide LLMs to generate information-rich outputs aligned with domain-specific schemas. We further posit that integrating multimodal context (e.g., visual knowledge graphs) can enhance the model’s ability to discern relationships and maintain output consistency across prompts.
Prompt refinement is an iterative process of improving the quality of prompts used to interact with AI models, particularly LLMs. The goal is to elicit the most accurate, relevant, and insightful responses from the model through rephrasing, constraint addition, example insertion, and structure tuning. The focus was on improving clarity, specificity, and conciseness [49]. To identify an optimal prompt template for extracting information elements from the input self-narratives, we iteratively tested various prompting techniques based on the analytical requirements of IE.

3.2.1. Model Configuration and Context Management

The in-depth interviews consisted of open-ended questions exploring multiple aspects of life satisfaction. Each response block ranged between 100 and 250 words, with only a few outliers extending to approximately 350 words. These narratives were combined with schema definitions, task instructions, and few-shot exemplars to construct structured prompts for schema-guided extraction.
All composite prompts were processed using the GPT-4 (32K context) configuration and remained well within its context limit. The average total prompt length across runs was approximately 6000–7000 tokens, distributed as follows:
  • Base instructions and schema context: ∼3000–3500 tokens
  • Few-shot exemplars (2–3): ∼1000–1500 tokens
  • Constraints and formatting rules: ∼400–600 tokens
  • Knowledge graph cues (in specific refinement iterations): ∼500–1000 tokens
  • Narrative input: ∼150–350 tokens
To maintain interpretive consistency and reduce random variation in responses, the following hyperparameters were applied during all runs: the model was operated with a temperature of 0.25 and a top-p value of 0.9, which together balanced determinism and lexical diversity; the maximum token length was set to 1500 to accommodate multi-aspect JSON outputs; and the presence penalty remained at 0.0 to preserve adherence to the schema and prevent unnecessary deviations. This structured configuration ensured that narratives were processed within a controlled, semantically coherent context, enabling reliable schema-aligned reasoning without exceeding computational limits.
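For concreteness, the sketch below shows how this configuration might be passed to the GPT-4 API via the OpenAI Python client; the client setup and the prompt variables are illustrative assumptions, not the authors’ implementation.

```python
# Minimal sketch of the extraction call with the hyperparameters reported
# above; the client usage and prompt contents are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

system_prompt = "You extract vulnerability elements as schema-aligned JSON."  # placeholder
composite_prompt = "<instructions + schema + exemplars + constraints + narrative>"  # placeholder

response = client.chat.completions.create(
    model="gpt-4-32k",       # the 32K-context configuration used in the study
    temperature=0.25,        # low randomness for interpretive consistency
    top_p=0.9,               # retains limited lexical diversity
    max_tokens=1500,         # accommodates multi-aspect JSON outputs
    presence_penalty=0.0,    # avoids pressure to deviate from the schema
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": composite_prompt},
    ],
)
extracted_json = response.choices[0].message.content
```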

3.2.2. Prompt Refinement and Context Augmentation

Prompt refinement was conducted as an iterative experimental process to progressively align LLM outputs with the domain schema. Refinement strategies included constraint prompting, few-shot examples, JSON-structured output formatting, and role-based conditioning. Representative templates are provided in the Supplementary Material, illustrating how structural constraints were layered to improve consistency and reduce false positives.
The prompt refinement process combined error-driven adjustments with targeted exploratory variations, including alternative constraint framing and augmentation-based prompting. Performance did not increase monotonically across iterations; instead, each version helped clarify trade-offs between precision, recall, and discovery. Thus, progression was guided by both observed extraction behavior and hypothesis-driven modifications to improve schema alignment and interpretability (Supplementary Document, Table S1). All prompt iterations were evaluated on the same set of narratives to ensure that performance differences were attributable to prompt variation rather than dataset changes.
Across seven iterations (from P-Basic to P-AUG-FS-CNST-COT), we systematically adjusted prompting strategies through output-based evaluation. Each iteration was guided by structured annotation requirements and assessed using discovery and precision metrics. The final prompt structure consisted of (i) a role-conditioning statement, (ii) schema-aligned output specifications, (iii) context information, (iv) constraint statements, and (v) two few-shot exemplars demonstrating expected reasoning behavior. This structure was held constant across narratives, with only the input text replaced.
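The minimal sketch below illustrates how this five-part composite prompt could be assembled programmatically; the section texts and the helper name are illustrative placeholders rather than the study’s exact template.

```python
# Minimal sketch of the five-part composite prompt described above; the
# section texts and function name are illustrative placeholders.
def build_prompt(narrative: str, schema_spec: str, context: str,
                 constraints: str, exemplars: list[str]) -> str:
    """Assemble: (i) role conditioning, (ii) schema-aligned output
    specification, (iii) context, (iv) constraints, (v) few-shot
    exemplars; only the narrative varies across runs."""
    role = ("You are a social-research assistant extracting vulnerability "
            "elements (VA, VF, VI, VM, VE) from self-narratives.")
    shots = "\n\n".join(exemplars)
    return (f"{role}\n\n"
            f"OUTPUT SPECIFICATION (JSON):\n{schema_spec}\n\n"
            f"CONTEXT:\n{context}\n\n"
            f"CONSTRAINTS:\n{constraints}\n\n"
            f"EXAMPLES:\n{shots}\n\n"
            f"NARRATIVE:\n{narrative}")
```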
Drawing inspiration from context augmentation techniques in machine learning, and acknowledging the recent exploration by Xu et al. [50] into the use of image inputs for prompt-based evaluation, we adapted and extended this concept for prompt engineering. Specifically, we proposed the use of the visual knowledge graph in the form of a semantic network as augmented context within the prompt. The objective was to evaluate the model’s ability to identify instances of these depicted relationships within the input text, thereby assessing the effectiveness of multimodal context augmentation for this task.

3.2.3. Evaluation Metrics

Evaluation of IE was performed at two levels: first, validation of the prompt refinement process, and second, validation of the extracted information.
  • Prompt Refinement Validation:
To evaluate the performance of each prompt revision, we employed standard IE metrics computed against the manually validated reference annotations:
  • True Positives (TP): Correctly extracted information elements that matched the reference annotations.
  • False Positives (FP): Extracted elements not supported by the reference annotations.
  • Novel Insights (NI): Additional relevant information not captured in the manual annotations but considered contextually valid.
  • Discovery Rate (DR): The proportion of novel insights relative to all identified elements, calculated as DR = NI / (TP + NI) × 100%. This metric provides an estimate of how effectively a prompt uncovers new, contextually meaningful insights beyond the predefined schema (a minimal computation sketch appears at the end of this subsection).
To assess the quality and completeness of the extracted information without requiring full gold annotation of the entire corpus, we conducted a bounded evaluation using the IAA-validated subset (n = 25 narratives), which served as an expert-verified reference set. For this subset, we derived standard diagnostic indicators (precision, recall, and F1-score) to characterize extraction accuracy and completeness. These measures are reported only for the bounded subset, as the full corpus does not yet have exhaustive gold annotation.
  • Information Extraction Validation:
Ground truth for the IE validation was derived from the gold dataset. To evaluate the classification ability of the conversational LLMs, their predicted labels were compared against manually annotated labels. Performance metrics such as precision, recall, F1-score, and accuracy were computed to assess the level of agreement.
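To make the metric definitions concrete, the following sketch computes the Discovery Rate alongside the standard diagnostic indicators; the example counts are taken from the best-performing configuration reported in Section 4.2.

```python
# Minimal sketch of the evaluation metrics defined above; counts are
# per-prompt-version tallies against the validated reference annotations.
def discovery_rate(tp: int, ni: int) -> float:
    """DR = NI / (TP + NI) * 100: share of valid novel insights among
    all correctly identified elements."""
    return 100.0 * ni / (tp + ni) if (tp + ni) else 0.0

def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1(tp: int, fp: int, fn: int) -> float:
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Using counts reported for the best-performing prompt in Section 4.2:
print(f"P-AUG-FS-CNST precision: {precision(265, 58):.2f}")  # about 0.82
```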

3.3. Module III: Interpretation and Insight Derivation Pipeline

After completing prompt refinement and validation, we developed a modular workflow to support scalable vulnerability analysis with human-in-the-loop oversight. As shown in Figure 3, the pipeline is structured into three stages: (i) system design, which includes prompt engineering and LLM-based IE (blue), (ii) human validation and formatting (grey), and (iii) data analysis, including clustering and visualization (green and yellow). This separation enhances transparency, debuggability, and reusability across datasets. The workflow enables structured extraction of vulnerability elements (VA, VF, VI, VE) and their representation through linguistic and graph-based methods.

3.3.1. Categorization of Vulnerability Aspects and Contributing Factors

According to the information elements and their relationships (as represented in Figure S9 in the Supplementary Document), VA is associated with sub-dimensions of the AWESOME framework. However, these sub-dimensions were not pre-specified in the framework. Therefore, grouping was performed using an unsupervised approach, in which the LLM was allowed to cluster the aspects based on semantic and contextual similarity. After manual formatting and verification of the extracted information elements, the results documented in an Excel sheet were provided as input to the prompt listed below.
Prompt for categorization of VA: “Given the IE results as the shared Excel sheet, from the ‘aspects_list’ column, identify the semantically similar aspects and group them along with their occurrences counted as frequency.”
Unlike the unsupervised method used to categorize VA, which allowed for emergent groupings based on semantic similarity when predefined categories were unavailable, the analysis of VFs followed a different approach. To examine how influences from various life domains contribute to specific vulnerabilities, we utilized the six predefined dimensions of the AWESOME framework as classification categories for these factors. The VF components identified in participant narratives were organized according to these dimensions, which represent key domains through which individuals perceive and articulate the sources of vulnerability. This approach enables the capture of interconnected influences, allowing a more structured examination of both structural and individual contributors to vulnerability.
As described in Section 3.1.1, VFs reflect the underlying conditions, circumstances, or drivers that increase an individual’s or group’s susceptibility to experiencing disadvantage or reduced well-being. This research does not treat these factors as causal determinants but rather as perceived influencers associated with specific VAs. The categorized results are presented and discussed in Section 4.3.1.

3.3.2. Attribution of Influences to Specific Vulnerability Aspects

In this phase, we analyzed the derived information elements to uncover patterns of vulnerability expressed through their relational structure. The analytical findings reported in Section 4.3.3 are based on a graph-based in-degree and out-degree analysis, which reveals the relative strength and directionality of influences across different dimensions of vulnerability. The procedure comprised three stages:
  • Preparation of Edge and Node Tables (Python-based): Structured edge and node tables were generated using Python 3.13.7 to represent relationships (edges) and entities (nodes) within the vulnerability schema. These tables formed the foundational data structure for subsequent graph construction.
  • Conversion to GML Format (Prompt Engineering-Based): A prompt-engineering method was employed to transform the tabulated data into GML (Graph Modeling Language) format, enabling representation suitable for graph-based visualization and analysis.
  • Graph Visualization (Gephi): The GML files were imported into Gephi to generate network visualizations of the derived relationships. This facilitated the observation and interpretation of directional influence patterns between vulnerability aspects and contributing factors.
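As a worked illustration of stages (1) and (2), the sketch below builds a directed VF-to-VA edge table with pandas, writes it to GML with networkx, and computes node in-degrees; the column names and example labels are illustrative, and the study itself used a prompt-engineering step (not networkx) for the GML conversion.

```python
# Minimal sketch of stages (1) and (2); column names and example labels
# are illustrative, and the study used prompt engineering (not networkx)
# for the GML conversion step.
import pandas as pd
import networkx as nx

# Stage 1: edge table of directed VF -> VA influence links
edges = pd.DataFrame({
    "source": ["Lack of stable jobs", "Job insecurity"],
    "target": ["Income Opportunities and Employment Challenges"] * 2,
})

G = nx.DiGraph()
G.add_edges_from(edges.itertuples(index=False, name=None))

# Stage 2: write GML for import into Gephi (stage 3)
nx.write_gml(G, "vulnerability_graph.gml")

# In-degree per VA node counts its perceived influences (cf. Section 4.3.3)
print(sorted(G.in_degree(), key=lambda kv: kv[1], reverse=True))
```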

3.3.3. Extraction of Linguistic Markers Expressing Vulnerability

This stage involved analyzing the textual narratives to identify linguistic indicators corresponding to VA, along with narrator-specific expressions represented as VM. Frequency analysis of unigrams, bigrams, and longer n-grams was employed to surface prevalent terms and phrases associated with vulnerability expressions. This approach enabled the extraction of linguistic patterns that reflect both commonly shared indicators and individualized experiences of vulnerability.
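A minimal sketch of this n-gram frequency analysis, assuming scikit-learn’s CountVectorizer and two illustrative stand-in narratives, is shown below.

```python
# Minimal sketch of the unigram-to-4-gram frequency analysis; the two
# narratives are illustrative stand-ins for the interview extracts.
from sklearn.feature_extraction.text import CountVectorizer

narratives = [
    "We are struggling to pay the private school term fees for the children.",
    "The loan interest keeps growing and hospital treatment costs add up.",
]

vectorizer = CountVectorizer(ngram_range=(1, 4))
counts = vectorizer.fit_transform(narratives)

freqs = counts.sum(axis=0).A1  # total frequency of each n-gram
top = sorted(zip(vectorizer.get_feature_names_out(), freqs),
             key=lambda kv: kv[1], reverse=True)[:10]
for ngram, freq in top:
    print(f"{freq:>3}  {ngram}")
```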

4. Results and Discussion

4.1. Bounded IE Quality Evaluation

A bounded evaluation was conducted on the IAA-validated subset (25 narratives) to assess extraction quality on a stable reference sample. The model showed high precision and moderate recall, with a precision of 0.77, recall of 0.72, and F1 = 0.74 (Table 2). This indicates that the extracted information is generally accurate, though some contextually implied indicators may be missed. We do not claim full recall estimation, given the pilot-scale corpus and the evolving annotation schema; instead, we report bounded evaluations aligned with qualitative research standards.

4.2. IE Models and Performance Comparison

Table 3 presents the aggregated results of the prompt refinement strategies. The comparison highlights clear differences in extraction precision, discovery rate, and error characteristics across configurations. Accuracy is not reported for intermediate revisions because the task remains open-ended: the complete set of relevant vulnerability elements is not known a priori; therefore, False Negatives (FNs) cannot be exhaustively determined. FNs and bounded accuracy are computed only for the final optimized prompt configuration, where a fully re-annotated subset establishes a complete ground truth.
Maintaining the extraction task as open-bounded during real-time or scaled implementation is essential, as it enables the model to surface novel or previously unrecognized vulnerability elements. Constraining the system to a fixed, exhaustively enumerated label set would convert the problem into closed-world classification and suppress the discovery of emergent patterns. Thus, the pipeline preserves openness for analytical insight and interpretive depth, while bounded evaluation is applied selectively to assess reliability where the ground truth can be fully enumerated.
Best-performing strategies. The P-AUG-FS-CNST configuration achieved the strongest performance overall, with the highest TP (265) and lowest FP (58), ranking first. The P-Out-CNST-FS variant followed (TP = 248, FP = 67), ranking second and indicating that constraint-guided output structuring stabilized extraction behavior.
Exploratory strategies. The P-Out configuration surfaced the most NI (92) and achieved the highest DR (35.77%), but with reduced precision (TP = 152, FP = 118). P-COT-FS showed similar behavior, with NI = 76 and DR = 27.06%, accompanied by increased FP (109). These results indicate that output-oriented and chain-of-thought prompting increased exploratory breadth but reduced stability.
Intermediate strategies. P-AUG-FS-CNST-COT combined augmentation with chain-of-thought reasoning, yielding moderate but less stable performance (TP = 209, FP = 101, NI = 76, DR = 24.79%). Two instances of visual ambiguity were observed, where line structures and labels in the visual graph were misinterpreted. P-Out-FS performed weakest (TP = 198, NI = 50, DR = 19.94%), ranking last.
Overall trends. The totals (TP = 1279; FP = 554; NI = 445) produced an average DR of 25.70%. The results distinguish two strategy families: (i) precision-focused strategies (P-AUG-FS-CNST, P-Out-CNST-FS) and (ii) exploratory strategies (P-Out, P-COT-FS). Hybrid configurations introduced additional variability where structural and textual cues were misaligned.

4.3. Interpretative Results and Vulnerability Insight Discussion

After completing prompt refinement, validation, and template finalization, the analysis was scaled to the full set of seventy narratives. Visualization served as a tool for exploratory interpretation and verification of the extracted elements. Matplotlib 3.10.8 was used to generate trend and frequency plots, and the Orange data mining toolkit supported unsupervised clustering and visual pattern inspection. For graph-based representations aligned with the AWESOME framework, extracted entities were structured in CSV and GML formats, enabling domain-aware layouts in Gephi. While the framework does not include a dedicated visualization interface, these tools facilitated transparent examination of relational patterns and supported manual validation where needed.

4.3.1. Identified VA and VF: Categorization Results

The distributional patterns of VA, VF, and VI formed the basis for examining how these elements interacted across narratives. The analysis assessed whether the extracted outputs reflected coherent category structures under two conditions: (i) supervised mapping aligned with manual annotations, and (ii) unsupervised clustering (t-SNE/tmap) without reference to labels. Together, these perspectives indicate how reliably the pipeline distinguished between core categories and how much latent structure could be inferred directly from narrative data.
  • VA and Categories: Unsupervised approach
The t-SNE visualization in Figure 4a shows a scatter plot in which each bubble represents a VA instance, color-coded by theme. Spatial proximity reflects semantic similarity: bubbles positioned closer together represent aspects that were interpreted similarly by the model. Within each theme, the clusters appeared compact, indicating that the extracted VA were semantically consistent. Instances positioned at the periphery represent less frequent or context-specific vulnerabilities.
The density of each theme corresponds to its frequency across narratives. For example, housing and living security (yellow) appeared rarely and therefore formed a sparse cluster, whereas income opportunities and employment challenges formed a denser cluster, appearing nine times across the dataset. These observations suggest that the model captured both recurring and less common vulnerability themes in ways that align with meaningful semantic distinctions (a minimal embedding-and-projection sketch illustrating this grouping follows this list).
  • Categories of VF: Supervised approach
The VFs were classified using the six dimensions of the AWESOME framework, which represent domains through which potential influences on vulnerability are expressed. The factors identified from the narratives were mapped to these dimensions to capture overlapping structural and individual conditions. The goal is not to infer causality but to reflect perceived influences associated with specific VAs.
The clustered representation in Figure 4b shows proximate points where VFs co-occurred frequently across narratives. For example, “Lack of stable jobs” and “Dependency on family for financial support” clustered under economic challenges, reflecting instability and reliance-based coping. “Health-related financial pressure” and “High healthcare costs” grouped under health and well-being, indicating the financial strain associated with medical needs.
The proximity between “Economic stagnation” and “Dependency on loans for financial management” suggests a link between limited income diversification and debt-driven coping strategies. Similarly, “Job insecurity” and “Lack of job opportunities” clustered under employment challenges, indicating employment stability as a prominent community concern. Environmental and seasonal factors, such as “Unpredictable sea conditions” and “Reduced fish availability during off-season”, formed a distinct cluster, reflecting dependence on natural resource patterns for livelihood.
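The sketch below illustrates the unsupervised VA grouping referenced above: extracted aspects are embedded and projected into two dimensions with t-SNE. The encoder choice and example aspects are illustrative assumptions, not the study’s exact configuration (the study used the Orange toolkit and t-SNE/tmap visualizations).

```python
# Minimal sketch of the unsupervised VA grouping: aspects are embedded and
# projected with t-SNE; the encoder and example aspects are assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

aspects = [
    "income instability during the off-season",
    "lack of stable jobs",
    "high healthcare costs",
    "housing and living insecurity",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical encoder choice
embeddings = encoder.encode(aspects)

# perplexity must be smaller than the number of samples
coords = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), label in zip(coords, aspects):
    plt.annotate(label, (x, y), fontsize=8)
plt.title("t-SNE projection of extracted vulnerability aspects")
plt.show()
```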

4.3.2. Linguistic Markers of Economic Vulnerability

Word-cloud analysis was used to capture prominent linguistic markers within the narratives. The composite visualization showed frequent terms such as “loan”, “work”, and “interest”, corresponding to economic VAs, and terms such as “hospital” and “treatment” indicating health-related VFs. These recurring markers illustrate how financial strain and healthcare pressures were articulated in everyday language.
Frequent co-occurrence of VIs (e.g., “loan”, “hospital”) with VEs (e.g., “struggling”, “unable”) illustrated how subjective phrasing reinforced measurable indicators in the narratives. The unigram, bigram, and 1–4-gram word-cloud visualizations highlighted both high-frequency indicators and the contextual phrasing through which vulnerability was expressed. For example, the unigram “children” corresponded to “private school” and “pay fees” in the bigram view, which further extended to “private school tutors” and “private school term fees” in the 4-gram representation. This progression reflects how specific financial pressures are embedded within everyday descriptions of household responsibility.

4.3.3. Perceived Influences on Vulnerability Aspects

This subsection examines the derived information elements to interpret the patterns of vulnerability as expressed through their relational structure.
Overall pattern of VA–VF interplay. Figure 5 visualizes the interconnections between VA and VF based on the categorized outputs. In Figure 5a, green nodes represent VA categories, and red nodes represent VFs. Directed links from VF to VA indicate the perceived influences contributing to economic vulnerability. Figure 5b highlights VA node sizes proportional to in-degree. Higher in-degree reflects a greater number of associated VFs. The VA category Income Opportunities and Employment Challenges had the highest in-degree, followed by Income Instability and Insecurity (in-degree = 12) and Financial Support and Dependency (in-degree = 11). Miscellaneous and Specific Hardships captured context-specific challenges such as those associated with seaweed cultivation.
Figure 6 shows the VF–VA interactions for Income Opportunities and Employment Challenges (green node). VFs (pink nodes) were exported to GML and visualized in Gephi. The VA node size reflects its relative salience in the dataset, while connected VFs show common co-occurring influences. Recurring VFs include unstable marine livelihood, limited income diversification, dependence on a spouse’s earnings, scarcity of work, and lack of bargaining advantage. Notably, the VF “Pursuit of higher earning potential through skilled training” captures income disruption during training. Some VF wording retains narrator-expressed framing, indicating model retention of original phrasing.

4.4. Comparison with Manual Annotation-Based Insights

To validate the reliability of the extracted insights, the results were compared with manual annotations of VA and VF across the six dimensions of the AWESOME framework. The task was posed as a multi-class classification problem. The confusion matrix in Figure 7 illustrates alignment patterns, and Table 4 reports the corresponding performance metrics. Overall accuracy was 77%, with a weighted average F1-score of 0.78. This indicates strong alignment in dominant categories while revealing challenges in dimensions characterized by overlapping or sparse cues.
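For reference, this multi-class comparison can be reproduced with scikit-learn’s confusion-matrix and classification-report utilities, as sketched below; the label sequences are illustrative placeholders rather than the study’s annotations, and the dimension list shows the categories named in this section (the full six-dimension schema follows Table 1).

```python
# Minimal sketch of the multi-class comparison against manual annotations;
# label sequences are illustrative placeholders, and the dimension list
# covers only the categories named in Section 4.4.
from sklearn.metrics import classification_report, confusion_matrix

dimensions = ["Economic Vitality", "Health", "Education and Skill Development",
              "Environmental Quality", "Safety and Security"]

y_manual = ["Economic Vitality", "Health", "Economic Vitality",
            "Education and Skill Development", "Environmental Quality"]
y_model = ["Economic Vitality", "Health", "Economic Vitality",
           "Economic Vitality", "Health"]

print(confusion_matrix(y_manual, y_model, labels=dimensions))
print(classification_report(y_manual, y_model, labels=dimensions, zero_division=0))
```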
High-performing categories. Economic Vitality (F1 ≈ 0.90) and Health (F1 ≈ 0.93) achieved the strongest agreement. These dimensions contained explicit and frequent lexical cues (e.g., “loan”, “work”, “hospital”) that consistently mapped to the correct VAs and VFs. Statements concerning hospital expenses, for example, were reliably identified as health-related VFs.
Moderately performing categories. Education and Skill Development showed partial alignment (F1 ≈ 0.55). Misclassifications occurred when narratives describing training or upskilling activities co-occurred with economic references. For instance, tailoring workshop references were sometimes classified under Economic Vitality rather than Education. Classification tended to be influenced by dominant financial lexical cues, leading to cross-dimension assignment.
Challenging categories. Environmental Quality and Safety and Security exhibited lower precision and recall (F1 < 0.30). These categories were infrequently represented and often expressed indirectly, leading to overgeneralization. For example, unsafe transport was mapped to Economic Vitality, while climate-related hardship was absorbed into Health. Such cases illustrate the difficulty of identifying underrepresented domains from limited cues.
Key insights. The pipeline performs reliably in dimensions with strong lexical grounding and adequate representation, but it struggles in domains where boundaries are diffuse or data density is low. This highlights the need for human-in-the-loop validation and suggests that future extensions should incorporate multi-annotator sampling to strengthen robustness in underrepresented categories.

4.5. Summary of Findings

To synthesize the outcomes of the four analytical modules, we briefly summarize how the core findings relate to the research hypothesis introduced in the introduction. This also serves to highlight how each methodological component contributes toward validating the feasibility, interpretability, and replicability of the proposed prompt-based framework for extracting multidimensional vulnerability markers from self-narratives.
  • Domain modeling using the AWESOME framework successfully structured vulnerability-related elements into four interpretable categories (Aspects, Indicators, Factors, Markers), enabling alignment between theoretical concepts and prompt design.
  • Prompt engineering iterations led to improved output consistency and richness of extracted elements. Version 6 (V6), incorporating context embedding and structural cues, showed the highest consistency in reproducing multi-dimensional markers from narratives.
  • Validation metrics across 70 narratives demonstrated moderate-to-high alignment with human-labeled content, with observed improvements in Discovery Rate (DR) and reduction in hallucinated outputs in later prompt versions.
  • Visualization of extracted elements through t-SNE, hierarchical clustering, and knowledge graphs revealed meaningful groupings of vulnerability profiles, suggesting the framework’s potential for pattern discovery in qualitative data.
  • Case-level interpretability was preserved through structured JSON outputs, maintaining contextual granularity while enabling downstream analysis.
  • Overall, the pilot confirms the working hypothesis that structured prompting guided by domain theory can yield scalable, interpretable, and replicable insights from self-narratives—laying the foundation for future rigorous applications.

5. Limitations and Error Analysis

While this pilot study demonstrates the potential feasibility of the proposed framework, its generalizability has yet to be evaluated using a larger and more diverse corpus. Several limitations must therefore be acknowledged. For clarity and transparency, we organize these limitations into four categories: (i) annotation-related, (ii) model-related, (iii) data-related, and (iv) evaluation-related. Table 5 summarizes the key issues, their implications, and planned strategies for mitigation.
These limitations highlight the practical challenges of applying LLMs to qualitative narrative data. The pilot design necessarily constrained annotation protocols and dataset size, while advanced prompt strategies introduced new error modes. By documenting these challenges, along with mitigation strategies, this study maintains transparency and provides a roadmap for future refinement.

6. Conclusions and Future Works

This study demonstrates that LLMs, when combined with systematic PE, can support the structured extraction of complex vulnerability markers from personal narratives—a methodological gap in computational social science. By operationalizing the AWESOME framework through theory-informed prompt design and human-in-the-loop validation, the proposed approach effectively captures the implicit, overlapping expressions of economic vulnerability often overlooked by conventional IE methods. Validation through both consistency metrics and qualitative expert review confirms the framework’s ability to balance analytical precision with contextual sensitivity. The findings provide empirical grounding for the theoretical view that vulnerability is multi-layered and contextually embedded, highlighting the limitations of linear, purely data-driven interpretations in human-centered domains. By aligning advanced NLP techniques with domain-specific social theory, this study offers a replicable methodology that is both interpretable and scalable.
This study provides feasibility evidence for the proposed workflow; broader validation will be undertaken in the next stage through expanded sampling, multi-annotator agreement assessment, and held-out evaluation. Future work will also focus on strengthening the contextual grounding of model outputs, enhancing generalizability across diverse settings, and exploring deeper forms of reasoning such as temporal and causal inference. These directions would expand both the methodological scalability and the practical utility of the framework, particularly for policy-aligned applications. As NLP technologies continue to evolve, adapting them in ways that are sensitive to social context and grounded in domain expertise remains a key challenge—and opportunity—for applied AI research in the social sciences. Although the present analysis is grounded in a specific coastal livelihood context, the methodology is generalizable to other domains where vulnerability is embedded in narrative expression. With appropriate schema adaptation, the same pipeline could be adopted to support studies in domains like humanitarian relief, migration stress, gendered labor precarity, and community-based program planning.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/bdcc10010006/s1, Figure S1: Prompt refinement workflow for the task of vulnerability information extraction (IE); Figure S2: Extraction of vulnerability information from a sample text; Table S1: Details of the revisions adopted and their impact at each prompting iteration; Figure S3: The first two iterations of prompt templates, starting from a very basic structure; Figure S4: Third revision of the prompt template; Figure S5: Fourth revision of the prompt template; Figure S6: Fifth revision of the prompt template; Figure S7: Sixth revision of the prompt template; Figure S8: Seventh revision of the prompt template; Figure S9: Graphical representation of conceptual domain knowledge on vulnerability assessment. This framework extends the theoretical model by Gressel et al. [4], refining it for the operationalization of a qualitative, data-driven analytical approach. Yellow-highlighted elements represent the original theoretical foundations, while blue-highlighted elements denote operational extensions. The blue connections illustrate key relationships in understanding and addressing individuals’ vulnerability.

Author Contributions

Conceptualization, A.P.; methodology, A.P. and V.G.; data curation, A.N. and A.P.; validation, A.P.; investigation, A.P.; writing—original draft preparation, A.P.; visualization, A.P.; supervision, V.G. and T.R. All authors have read and agreed to the submitted version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was conducted in accordance with the institutional research protocols, and the protocol was approved by the Institutional Review Board (or Ethics Committee) of Amrita Vishwa Vidyapeetham (028—IECSSBSALCWEGE-AVV028) on 18 March 2023.

Informed Consent Statement

This study received ethics clearance from the Institutional Ethics Committee of Amrita Vishwa Vidyapeetham prior to the commencement of data collection. All interview participants were informed about the purpose of the study, assured of the voluntary nature of participation, and provided informed consent before the interviews were conducted. Personal identifiers were removed during transcription, and all data were anonymized to ensure participant confidentiality.

Data Availability Statement

The data collected from rural women contain sensitive vulnerability information. The data are not publicly available due to privacy and ethical restrictions but are available from the corresponding author upon reasonable request.

Acknowledgments

The authors express their sincere gratitude to the community facilitators and participants from the Palk Bay region for generously sharing their narratives, which were central to this study. We are also thankful to our colleagues who supported the data collection drive and field coordination efforts, enabling smooth engagement with participants. Appreciation is extended to domain experts in social science and natural language processing who provided critical insights into our analytical framework. This study was conducted without external funding. The authors acknowledge the use of ChatGPT-4o and Gemini 2.5 as assistive tools for grammar editing, clarity enhancement, structural organization, and proofreading of this manuscript. All AI-generated content underwent thorough human review to preserve the authenticity of the authors’ original contributions and research integrity.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, H.; Wang, W. Knowledge domain and emerging trends of social vulnerability research: A bibliometric analysis (1991–2021). Int. J. Environ. Res. Public Health 2022, 19, 8342.
  2. Numans, W.; Van Regenmortel, T.; Schalk, R.; Boog, J. Vulnerable persons in society: An insider’s perspective. Int. J. Qual. Stud. Health Well-Being 2021, 16, 1863598.
  3. Zimmermann, B. Social vulnerability as an analytical lens for welfare state research: Concepts and typologies. Policy Soc. 2017, 36, 497–515.
  4. Gressel, C.M.; Rashed, T.; Maciuika, L.A.; Sheshadri, S.; Coley, C.; Kongeseri, S.; Bhavani, R.R. Vulnerability mapping: A conceptual framework towards a context-based approach to women’s empowerment. World Dev. Perspect. 2020, 20, 100245.
  5. Massmann, F.; Wehrhahn, R. Qualitative social vulnerability assessments to natural hazards: Examples from coastal Thailand. J. Integr. Coast. Zone Manag. 2014, 14, 3–13.
  6. UNDP International Center for Private Sector in Development’s (ICPSD) SDG AI Lab. Digital Social Vulnerability Index Technical Whitepaper. 2024. Available online: https://www.undp.org/publications/digital-social-vulnerability-index-technical-whitepaper (accessed on 20 February 2024).
  7. Kim, K.; Kang, J.Y.; Hwang, C. Identifying Indicators Contributing to the Social Vulnerability Index via a Scoping Review. Land 2025, 14, 263.
  8. Cutter, S.L. The origin and diffusion of the social vulnerability index (SoVI). Int. J. Disaster Risk Reduct. 2024, 109, 104576.
  9. Frenda, S.; Abercrombie, G.; Basile, V.; Pedrani, A.; Panizzon, R.; Cignarella, A.T.; Marco, C.; Bernardi, D. Perspectivist approaches to natural language processing: A survey. Lang. Resour. Eval. 2024, 59, 1719–1746.
  10. Malgaroli, M.; Hull, T.D.; Zech, J.M.; Althoff, T. Natural language processing for mental health interventions: A systematic review and research framework. Transl. Psychiatry 2023, 13, 309.
  11. Montejo-Ráez, A.; Molina-González, M.D.; Jiménez-Zafra, S.M.; García-Cumbreras, M.Á.; García-López, L.J. A survey on detecting mental disorders with natural language processing: Literature review, trends and challenges. Comput. Sci. Rev. 2024, 53, 100654.
  12. Keith, B.; German, F.; Krokos, E.; Joseph, S.; North, C. Explainable AI Components for Narrative Map Extraction. arXiv 2025, arXiv:2503.16554.
  13. German, F.; Keith, B.; North, C. Narrative Trails: A Method for Coherent Storyline Extraction via Maximum Capacity Path Optimization. arXiv 2025, arXiv:2503.15681.
  14. Jayaram, K.; Sangeeta, K. A review: Information extraction techniques from research papers. In Proceedings of the 2017 International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Chennai, India, 21–23 February 2017; pp. 56–59.
  15. Martinez-Rodriguez, J.L.; Hogan, A.; Lopez-Arevalo, I. Information extraction meets the semantic web: A survey. Semant. Web 2020, 11, 255–335.
  16. van Donge, W.; Bharosa, N.; Janssen, M. Data-driven government: Cross-case comparison of data stewardship in data ecosystems. Gov. Inf. Q. 2022, 39, 101642.
  17. Vatsal, S.; Dubey, H. A survey of prompt engineering methods in large language models for different NLP tasks. arXiv 2024, arXiv:2407.12994.
  18. Ranade, P.; Dey, S.; Joshi, A.; Finin, T. Computational understanding of narratives: A survey. IEEE Access 2022, 10, 101575–101594.
  19. Piper, A.; So, R.J.; Bamman, D. Narrative theory for computational narrative understanding. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 298–311.
  20. Nasheeda, A.; Abdullah, H.B.; Krauss, S.E.; Ahmed, N.B. Transforming transcripts into stories: A multimethod approach to narrative analysis. Int. J. Qual. Methods 2019, 18, 1609406919856797.
  21. Rolón-Dow, R.; Bailey, M.J. Insights on narrative analysis from a study of racial microaggressions and microaffirmations. Am. J. Qual. Res. 2021, 6, 1–18.
  22. Wong, G.; Breheny, M. Narrative analysis in health psychology: A guide for analysis. Health Psychol. Behav. Med. 2018, 6, 245–261.
  23. Birkbeck, G.; Nagle, T.; Sammon, D. Challenges in research data management practices: A literature analysis. J. Decis. Syst. 2022, 31, 153–167.
  24. Bagchi, A. Challenges in Data Collection and Management for NGOs and How to Overcome Them. Vakilkaro. Available online: https://www.vakilkaro.com/blogs/challenges-in-data-collection-and-management-for-ngos/ (accessed on 28 April 2025).
  25. Divyadarshini, S. Data-Driven Decision Making for Skilling NGOs: How MIS Can Transform Impact Measurement. EdZola. Available online: https://www.edzola.com/post/data-driven-decision-making-for-skilling-ngos-how-mis-can-transform-impact-measurement (accessed on 21 February 2024).
  26. Santana, B.; Campos, R.; Amorim, E.; Jorge, A.; Silvano, P.; Nunes, S. A survey on narrative extraction from textual data. Artif. Intell. Rev. 2023, 56, 8393–8435.
  27. Nedungadi, P.; Lathabai, H.H.; Raman, R. Large Language Models in Biomedicine and Health: A Holistic Evaluation of the Effectiveness, Reliability and Ethics using Altmetrics. J. Scientometr. Res. 2025, 14, 46–61.
  28. Amirova, A.; Fteropoulli, T.; Ahmed, N.; Cowie, M.R.; Leibo, J.Z. Framework-based qualitative analysis of free responses of Large Language Models: Algorithmic fidelity. PLoS ONE 2024, 19, e0300024.
  29. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
  30. Xu, T.; Wu, S.; Diao, S.; Liu, X.; Wang, X.; Chen, Y.; Gao, J. SaySelf: Teaching LLMs to express confidence with self-reflective rationales. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 5985–5999.
  31. Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the opportunities and risks of foundation models. arXiv 2021, arXiv:2108.07258.
  32. Lynch, C.J.; Jensen, E.; Munro, M.H.; Zamponi, V.; Martinez, J.; O’Brien, K.; Feldhaus, B.; Smith, K.; Reinhold, A.M.; Gore, R. GPT-4 Generated Narratives of Life Events using a Structured Narrative Prompt: A Validation Study. arXiv 2024, arXiv:2402.05435.
  33. Matus, M.; Urrutia, D.; Meneses, C.; Keith, B. ROGER: Extracting Narratives Using Large Language Models from Robert Gerstmann’s Historical Photo Archive of the Sacambaya Expedition in 1928. In Proceedings of the Text2Story Workshop at ECIR, Glasgow, Scotland, 24 March 2024; pp. 53–64.
  34. Gopal, L.S.; Prabha, R.; Ramesh, M.V. Developing information extraction system for disaster impact factor retrieval from Web News Data. In Information and Communication Technology for Competitive Strategies (ICTCS 2021): Intelligent Strategies for ICT; Springer Nature: Singapore, 2022; pp. 357–365.
  35. Elfes, J. Mapping News Narratives Using LLMs and Narrative-Structured Text Embeddings. arXiv 2025, arXiv:2409.06540.
  36. Su, H.T.; Hsu, Y.C.; Lin, X.; Shi, X.Q.; Niu, Y.; Hsu, H.Y.; Lee, H.Y.; Hsu, W. Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), Miami, FL, USA, 12–16 November 2024; pp. 14839–14854.
  37. Ziems, C.; Held, W.; Shaikh, O.; Chen, J.; Zhang, Z.; Yang, D. Can large language models transform computational social science? Comput. Linguist. 2024, 50, 237–291.
  38. Bail, C.A. Can Generative AI improve social science? Proc. Natl. Acad. Sci. USA 2024, 121, e2314021121.
  39. Lim, S.; Schmälzle, R. Artificial intelligence for health message generation: An empirical study using a large language model (LLM) and prompt engineering. Front. Commun. 2023, 8, 1129082.
  40. Zhu, Y.; Yin, Z.; Tyson, G.; Haq, E.U.; Lee, L.H.; Hui, P. APT-Pipe: A prompt-tuning tool for social data annotation using ChatGPT. In Proceedings of the ACM Web Conference 2024, Singapore, 13–17 May 2024; pp. 245–255.
  41. Yang, Z.; Li, L.; Wang, J.; Lin, K.; Azarnasab, E.; Ahmed, F.; Liu, Z.; Liu, C.; Zeng, M.; Wang, L. MM-ReAct: Prompting ChatGPT for multimodal reasoning and action. arXiv 2023, arXiv:2303.11381.
  42. Ling, C.; Zhao, X.; Lu, J.; Deng, C.; Zheng, C.; Wang, J.; Chowdhury, T.; Li, Y.; Cui, H.; Zhang, X.; et al. Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey. arXiv 2024, arXiv:2305.18703.
  43. Naveen, J.R.; Ganesh, H.B.; Kumar, M.A.; Soman, K.P. Distributed Representation of Healthcare Text Through Qualitative and Quantitative Analysis. In Computer Aided Intervention and Diagnostics in Clinical and Medical Images; Springer: Singapore, 2019; pp. 227–237.
  44. Bartal, A.; Jagodnik, K.M.; Chan, S.J.; Babu, M.S.; Dekel, S. Identifying women with postdelivery posttraumatic stress disorder using natural language processing of personal childbirth narratives. Am. J. Obstet. Gynecol. MFM 2023, 5, 100834.
  45. Calderon, N.; Reichart, R.; Dror, R. The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs. arXiv 2025, arXiv:2501.10970.
  46. Li, D.; Jiang, B.; Huang, L.; Beigi, A.; Zhao, C.; Tan, Z.; Bhattacharjee, A.; Jiang, Y.; Chen, C.; Wu, T.; et al. From generation to judgment: Opportunities and challenges of LLM-as-a-judge. arXiv 2024, arXiv:2411.16594.
  47. Wang, L.; Lyu, C.; Ji, T.; Zhang, Z.; Yu, D.; Shi, S.; Tu, Z. Document-Level Machine Translation with Large Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; pp. 16646–16661.
  48. Yamada, M. Optimizing Machine Translation through Prompt Engineering: An Investigation into ChatGPT’s Customizability. In Proceedings of the Machine Translation Summit XIX, Vol. 2: Users Track, Macau, China, 4–8 September 2023; Asia-Pacific Association for Machine Translation; pp. 195–204.
  49. Muktadir, G.M. A brief history of prompt: Leveraging language models (through advanced prompting). arXiv 2023, arXiv:2310.04438.
  50. Xu, S.; Zhao, K.; Loney, J.; Li, Z.; Visentin, A. Zero-Shot Image-Based Large Language Model Approach to Road Pavement Monitoring. arXiv 2025, arXiv:2504.06785.
Figure 1. Illustration of the modular framework for extracting and analyzing vulnerability markers from self-narratives using Large Language Models (LLMs) and Prompt Engineering (PE) techniques.
Figure 2. Extraction of vulnerability information from a sample narrative text.
Figure 3. Interpretation pipeline illustration: proposed analytical workflow for mapping economic vulnerabilities through prompt-engineered NLP techniques. Each module in the information pipeline successively extracts different components, represented as F (factor influencers) and I (indicators of vulnerability).
Figure 4. t-SNE visualization illustrating the categorization of Vulnerability Aspects (VA) and Vulnerability Factors (VF).
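A minimal sketch of how such a 2-D projection can be produced, assuming the extracted VA/VF phrases have already been embedded into a fixed-dimensional vector space (the embeddings below are random placeholders, not the study's actual pipeline):

import numpy as np
from sklearn.manifold import TSNE

# Placeholder phrase embeddings: 70 extracted items, 384-dimensional vectors.
rng = np.random.default_rng(0)
emb = rng.normal(size=(70, 384))

# Project to 2-D; perplexity must stay below the number of samples.
coords = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(emb)
print(coords.shape)  # (70, 2), ready for scatter plotting by VA/VF category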
Figure 5. Network graphs, generated using GPT-4 and Gephi, illustrate the interconnected themes of economic vulnerability and their perceived causes as reported by respondents.
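A minimal sketch of how a co-occurrence network of this kind can be assembled and handed to Gephi, assuming per-narrative theme sets have already been extracted (the input data below are invented placeholders):

from itertools import combinations
import networkx as nx

# Each set holds the vulnerability themes extracted from one narrative (placeholders).
narratives = [
    {"income instability", "household debt"},
    {"household debt", "seasonal fishing ban"},
    {"income instability", "seasonal fishing ban", "household debt"},
]

G = nx.Graph()
for themes in narratives:
    for a, b in combinations(sorted(themes), 2):
        # Edge weight counts how often two themes co-occur across narratives.
        w = G.get_edge_data(a, b, default={"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)

nx.write_gexf(G, "vulnerability_cooccurrence.gexf")  # GEXF opens directly in Gephi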
Figure 6. Sample Visualization of VF-VA interactions for the dominant economic vulnerability: Income opportunities/employment challenges.
Figure 7. Vulnerability Aspect (VA) classification results aligned to AWESOME concepts, depicted as a confusion matrix.
Table 1. Domain-specific terminologies adopted from the AWESOME framework and further extended to accommodate qualitative, individual-level vulnerability studies.

Concept | Description | Notation
Women’s Empowerment | The process of increasing women’s choices and capacity to make discerning decisions towards sustainability and resilience (adopted from the AWESOME framework [4]). | WE
Women’s Vulnerability | A state of women’s empowerment at a given point in time, determined as the net product of interactions between multiple factors, constraints, and intervention impacts shaping women’s choices and capacity to make discerning decisions across all the domains and contexts of women’s empowerment (adopted from the AWESOME framework [4]). | WV
Vulnerability Category | A broad grouping of vulnerabilities under a particular dimension (e.g., Economic Vulnerabilities) that is being considered at any given point in time of the qualitative analysis. | VC
Vulnerability Aspects | Thematic subdivisions or focus areas within a VC that capture specific types of vulnerability experiences (e.g., job insecurity, income instability, financial dependence); in other words, an individual’s vulnerabilities manifest through these aspects. | VA
Vulnerability Indicator | A measurable sign or metric used to assess and identify the level of vulnerability an individual or group may experience. It provides direct evidence of vulnerability, making it important for identifying current or potential problems in a more factual way. | VI
Vulnerability Factor | The conditions or circumstances that influence an individual’s or group’s risk of harm, disadvantage, or reduced well-being. | VF
Vulnerability Marker | Specific phrases or words that give insight into underlying emotional or psychological states, helping to better understand how vulnerability is experienced on a personal level; often less direct. | VM
Vulnerability Expressions | How people use language to express their vulnerable emotions or states. | VE
Table 2. Bounded recall evaluation on the IAA-validated subset (N = 25 narratives).

Metric | Mean (μ) | Std. Dev. (σ)
True Positives (TP) | 88.6 | 2.70
False Positives (FP) | 26.6 | 2.41
False Negatives (FN) | 34.2 | 4.66
Precision | 0.769 | 0.021
Recall | 0.722 | 0.033
F1 Score | 0.745 | 0.028
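The precision, recall, and F1 values in Table 2 follow the standard definitions over the annotated extraction counts; the sketch below reproduces them from the mean counts (averaging per-narrative ratios can differ slightly from the pooled ratio of mean counts):

# Standard definitions applied to the mean counts reported in Table 2.
tp, fp, fn = 88.6, 26.6, 34.2

precision = tp / (tp + fp)                          # ~0.769
recall = tp / (tp + fn)                             # ~0.722
f1 = 2 * precision * recall / (precision + recall)  # ~0.745

print(f"P={precision:.3f}  R={recall:.3f}  F1={f1:.3f}")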
Table 3. Aggregated performance metrics of IE across prompting strategies (n = 70).

Prompt Revision | TP | NI | FP | DR | Pr.order | PE Strategies
P-Basic | NA [Descriptive response] | NA | NA | NA | 7 | Basic
P-Out | 152 | 92 | 118 | 35.87% | 5 | o/p structure
P-Out-FS | 198 | 50 | 101 | 19.9% | 6 | o/p schema, Few-shot
P-Out-CNST-FS | 248 | 75 | 67 | 23.9% | 2 | o/p structure, Constrained, Few-shot
P-COT-FS | 207 | 76 | 109 | 27.1% | 3 | o/p schema, Chain-of-thought, Few-shot
P-AUG-FS-CNST | 265 | 76 | 58 | 22.7% | 1 | o/p schema, Semantic n/w, Few-shot, Constrained
P-AUG-FS-CNST-COT | 209 | 76 | 101 | 24.8% | 4 | o/p schema, Semantic n/w, Few-shot, Constrained, Chain-of-thought
Grand Total/Avg. | 1279 | 445 | 554 | 25.70% | – | –

Notes: Column header names represent True Positives (TP), Novel Insights (NI), False Positives (FP), Discovery Rate (DR), and Pr.order (relative precedence ranking of models based on balanced performance across metrics). Models with stronger performance are highlighted in bold.
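To make the strategy labels concrete, the following is a minimal sketch of a schema-constrained, few-shot extraction prompt in the spirit of P-AUG-FS-CNST; the wording, placeholders, and schema are illustrative only, while the study’s actual templates appear in Figures S3–S8 of the Supplementary Materials:

# Illustrative prompt template combining an output schema, semantic-network
# context, a few-shot example, and output constraints (not the study's wording).
PROMPT_TEMPLATE = """You are analyzing a first-person narrative for economic vulnerability.
Domain context (semantic network of VC -> VA -> VF relations):
{semantic_network}

Example narrative:
{example_narrative}
Example output (JSON):
{example_output}

Now extract from the narrative below. Return ONLY a JSON object with the keys
"aspects", "factors", and "expressions". Do not invent categories outside the
semantic network, and do not add commentary.

Narrative:
{narrative}"""

prompt = PROMPT_TEMPLATE.format(
    semantic_network="Economic -> income instability -> seasonal fishing ban",
    example_narrative="...",
    example_output='{"aspects": [], "factors": [], "expressions": []}',
    narrative="...",
)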
Table 4. Classification metrics for model performance in classifying vulnerability factors (F) to the dimensions of human life, termed as ‘dimensions’ in the AWESOME framework.

Dimension | Precision | Recall | F1-Score | Support
Economic Vitality | 0.92 | 0.92 | 0.92 | 37
Education and Skill Development | 0.46 | 0.67 | 0.55 | 9
Environmental Quality | 0.00 | 0.00 | 0.00 | 1
Health | 0.88 | 1.00 | 0.93 | 7
Safety and Security | 0.00 | 0.00 | 0.00 | 0
Social, Political, Cultural Environments | 1.00 | 0.44 | 0.61 | 16
Accuracy | – | – | 0.77 | 70
Macro Average | 0.54 | 0.50 | 0.50 | 70
Weighted Average | 0.86 | 0.77 | 0.79 | 70
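Per-dimension precision, recall, F1, and support of the kind shown in Table 4 can be produced directly with scikit-learn; a minimal sketch with invented gold and predicted labels (zero_division=0 handles empty-support classes such as Safety and Security):

from sklearn.metrics import classification_report

dimensions = ["Economic Vitality", "Health", "Safety and Security"]

# Invented gold and predicted AWESOME-dimension labels, for illustration only.
y_true = ["Economic Vitality", "Health", "Health", "Economic Vitality"]
y_pred = ["Economic Vitality", "Health", "Economic Vitality", "Economic Vitality"]

print(classification_report(y_true, y_pred, labels=dimensions, zero_division=0))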
Table 5. Summary of limitations, implications, and mitigation strategies.

Category | Limitation and Example | Implication/Mitigation
Annotation | IAA computed only for a subset of 25 narrative extracts; a single-annotator protocol was used for the rest of the dataset. | Risk of subjectivity. Mitigated by expert review of annotations; a multi-annotator protocol for a larger, multilingual dataset is currently in progress.
Model | Errors in interpreting relational structures (e.g., misreading line diagrams in augmented prompts). | Misclassification of relationships when using multimodal context augmentation. Could be improved by refining the visual encoding in context augmentation or using different relation representations.
Data | Under-representation of categories such as Environmental Quality and Safety and Security. | Not a case of lower model performance, but a natural consequence of the topical distribution of the narratives.
