Article

CPEL: A Causality-Aware, Parameter-Efficient Learning Framework for Adaptation of Large Language Models with Case Studies in Geriatric Care and Beyond

1 School of Artificial Intelligence, Zhongyuan University of Technology, Zhengzhou 450007, China
2 Zhengzhou Key Laboratory of Text Processing and Image Understanding, Zhengzhou 450007, China
3 School of Computer Science, Zhongyuan University of Technology, Zhengzhou 450007, China
4 School of Information Science, North China University of Technology, Beijing 100144, China
5 Department of Computer Science, Durham University, Durham DH1 3LE, UK
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(15), 2460; https://doi.org/10.3390/math13152460
Submission received: 28 June 2025 / Revised: 20 July 2025 / Accepted: 24 July 2025 / Published: 30 July 2025

Abstract

Adapting Large Language Models (LLMs) to specialized domains like geriatric care remains a significant challenge due to the limited availability of domain-specific data and the difficulty of achieving efficient yet effective fine-tuning. Current methods often fail to effectively harness domain-specific causal insights, which are crucial for understanding and solving complex problems in low-resource domains. To address these challenges, we propose Causality-Aware, Parameter-Efficient Learning (CPEL), a novel framework that leverages domain-specific causal relationships to guide a multi-layer, parameter-efficient fine-tuning process for more effective domain adaptation. By embedding causal reasoning into the model’s adaptation pipeline, CPEL enables efficient specialization in the target domain while maintaining strong task-specific performance. Specifically, the Causal Prompt Generator of CPEL extracts and applies domain-specific causal structures, generating adaptive prompts that effectively guide the model’s learning process. Complementing this, the MPEFT module employs a dual-adapter mechanism to balance domain-level adaptation with downstream task optimization. This cohesive design ensures that CPEL achieves resource efficiency while capturing domain knowledge in a structured and interpretable manner. Building on this framework, we examine its application in the field of geriatric care and train a specialized large language model (Geriatric Care LLaMA) tailored for the aged-care domain, leveraging its capacity to efficiently integrate domain expertise. Experimental results on question-answering tasks demonstrate that CPEL improves ROUGE scores by 9–14% compared to mainstream LLMs and outperforms frontier models by 1–2 points in auto-scoring tasks. In summary, CPEL demonstrates robust generalization and cross-domain adaptability, highlighting its scalability and effectiveness as a transformative solution for domain adaptation in specialized, resource-constrained fields.

1. Introduction

Large Language Models (LLMs) have made significant advancements in natural language processing, enabling their application to specialized domains [1,2]. These models, which are pretrained on vast corpora of general knowledge, demonstrate strong cross-domain performance, providing a robust foundation for domain-specific model customization [3,4,5]. However, when applied to highly specialized fields such as geriatric care, LLMs must integrate expert domain knowledge to effectively enhance reasoning and analytical capabilities [6,7,8]. This alignment is essential to mitigate biases and hallucinations, which can severely affect output accuracy and efficacy [9]. A major challenge remains in improving the adaptability of models, particularly in domains where transfer learning is unreliable and data scarcity hinders effective model training [10,11].
Despite growing research focused on tailoring large models for specialized domains [7], these efforts face significant challenges, especially in the area of domain adaptation [12,13]. One of the primary limitations is the gap between general-purpose pretrained data and the highly specific needs of specialized domains. Addressing this gap requires advanced and efficient adaptation strategies. Furthermore, the scarcity of high-quality domain-specific data limits model development and training [14]. While researchers have explored techniques like pretraining, fine-tuning, and transfer learning to equip models with domain-specific knowledge [15,16,17], they face persistent challenges such as ensuring domain relevance, improving data quality, overcoming resource constraints, and reducing training time [18,19,20].
The success of domain-adaptation strategies hinges on their ability to effectively tailor models to specialized domains. These strategies must enable models to efficiently learn from limited domain-specific data while optimizing task performance. However, traditional approaches that rely solely on large, private datasets are insufficient for ensuring effective domain adaptation [12]. A promising solution is parameter-efficient fine-tuning (PEFT), which freezes the base model’s parameters and trains only lightweight adapter layers. This method reduces the computational burden (e.g., memory usage, training time) and minimizes the dependency on large-scale datasets, enabling effective training even with limited samples [21,22].
Moreover, recent research into the mechanisms underlying the generation of large language models suggests that effective prompts can activate a model’s cognitive processes, thereby aligning its responses more closely with user intent [23,24,25,26]. Large language models exhibit strong potential for causal discovery: by inferring causal relationships from limited domain data and generating causal prompts, they refine text generation and improve adaptation performance in specialized domains.
In this paper, we propose a novel framework, Causality-Aware Parameter-Efficient Learning (CPEL), which is designed to enhance domain adaptation in low-resource environments. CPEL integrates two key components: the MPEFT module and the Causal Prompt Generator. The MPEFT module employs a multistep, parameter-efficient learning strategy to optimize domain task performance within hardware constraints. The Causal Prompt Generator improves data utilization by extracting causal relationships from domain-specific data and generating targeted prompts to guide the model toward acquiring domain expertise. Together, these components enable models to address complex tasks while minimizing data dependency and computational demands, prioritizing innovative adaptation techniques over mere data scaling. The framework’s generalized training paradigm for specialized domains is depicted in Figure 1.
To guide this study, we formulated the following research questions:
  • RQ1: How can large language models be efficiently adapted to specialized, low-resource domains such as geriatric care, given the limitations imposed by domain shift, data scarcity, and resource constraints?
  • RQ2: Can causality-aware prompt design enhance the reasoning and adaptation capabilities of LLMs in vertical domains by leveraging latent causal structures from domain data?
  • RQ3: Does the integration of causal prompts with parameter-efficient fine-tuning result in measurable improvements in both task performance and computational efficiency compared to standard adaptation methods?
The main contributions of this paper are as follows:
  • This study identifies a critical challenge in adapting large language models (LLMs) to specialized, resource-constrained domains like geriatric care: the underutilization of causal information amidst data scarcity and limited computation. We propose enhancing parameter-efficient prompt learning with counterfactual reasoning and causal insights to improve the adaptability of LLMs in low-resource settings without extensive retraining.
  • We introduce Causality-Aware Parameter-Efficient Learning (CPEL), a novel two-stage pipeline. Specifically, the Multistep Parameter-Efficient Fine-Tuning (MPEFT) module employs a two-step LoRA-Adapter routine for lightweight adaptation, and the Causal Prompt Generator (CPG) generates counterfactual prompts to enhance task performance. This framework enables efficient LLM specialization in low-resource settings.
  • The effectiveness of the proposed CPEL framework is rigorously validated through extensive experiments: CPEL improves ROUGE scores by 9–14% over mainstream LLMs and outperforms frontier models by 1–2 points in auto-scoring tasks. Moreover, it reduces the trainable parameters to approximately 0.001% of those used by the original model, significantly lowering computational costs and training time.

2. Related Works

2.1. The Specialized Field Large Language Model

Specialized domain models are advanced machine learning systems tailored for specific industries such as finance, healthcare, law, and education. By incorporating domain-specific knowledge, these models achieve high accuracy and efficiency in intelligent services. The rise of abundant data resources and powerful computation has accelerated their development and deployment.
In the medical field, several domain-specific large language models have emerged. ChatCounselor [27] generates personalized responses by capturing user preferences and context. Zhongjing [28], DoctorGLM [29], and HuatuoGPT [30] enhance medical reasoning through curated clinical dialogues. In law, LawyerLLaMA and LAiW [31] integrate legal knowledge and examination data, demonstrating strong performance in legal consultation and reasoning. In finance, XuanYuan2.0 [32] and FinGPT [33] address challenges like data volatility and low signal-to-noise ratios, significantly improving financial analysis and decision-making. In education, EduChat [34] and BIOMEDGPT [35] provide adaptive, empathetic support for various educational stakeholders.
These models are typically developed in two stages: domain-specific pretraining and then instruction fine-tuning, often leveraging GPT-3.5-generated dialogue data. However, their training remains resource-intensive, demanding large-scale datasets and significant computational power.

2.2. Causal Representation Learning in Large Language Models

The performance of large language models (LLMs) on complex reasoning tasks remains limited, especially in scenarios requiring causal understanding [36]. To address this, causal reasoning techniques have been increasingly integrated into LLMs. Causal inference originates from three main paradigms: the potential outcome framework, causal graphical models, and structural equation modeling (SEM) [37]. While the potential outcome framework enables estimation of causal effects from observational data, it lacks the ability to reveal causal pathways. Causal diagrams and SEMs address this by representing variables and their dependencies as directed graphs, which evolved into formal causal graphical models based on directed acyclic graphs (DAGs) [38,39].
A key challenge in causal inference is the scarcity of counterfactual data, which hinders practical estimation of causal effects [40]. Although LLMs can generate synthetic counterfactuals, their performance declines in complex tasks such as relation extraction. To address this limitation, Miao et al. [41] proposed an intervention-based framework to generate commonsense counterfactuals, incorporating multimodal data augmentation to enhance robustness in low-resource and adversarial settings. Similarly, Zhang et al. [42] introduced a front-door causal prompt method to mitigate LLM bias and reduce data requirements in NLP tasks. These advancements underscore the potential of integrating causal representation learning with prompt-based strategies to improve the performance of large language models (LLMs) in scenarios characterized by limited data availability and requirements for complex reasoning [43].

2.3. Research on Model Domain Adaptation Strategies

Model domain adaptation typically involves two key strategies: externally guided adaptation and internally reinforced adaptation. The former relies on external cues—such as prompt design—to steer model behavior without modifying its internal parameters [24]. The effectiveness of this approach hinges on the design of guidance mechanisms that activate the model’s reasoning capabilities for rapid adaptation to new domains.
In contrast, internally reinforced adaptation focuses on deep integration of domain knowledge through pretraining and fine-tuning. Pre-training offers a broad knowledge foundation, while domain-specific fine-tuning refines the model for specialized tasks [5]. This strategy adjusts model parameters through targeted training to better capture domain complexity and diversity.
Prompt engineering has become a central technique in externally guided adaptation. For example, Prompt2Model [44] generates compact and accurate models using only natural language prompts, enabling scalable dataset creation and model construction. DEPT (Decomposed Prompt Tuning) [45] introduces low-rank decomposition and dual learning rates to improve flexibility and parameter efficiency, particularly in few-shot scenarios.
In addition to prompt-based strategies, transfer learning and knowledge distillation are widely adopted to address challenges such as limited data quality and high resource consumption. CoMD [46] optimizes multimodal knowledge distillation using a bidirectional feedback mechanism. Despite their benefits, these approaches demand careful design of teacher models, loss functions, and training settings to ensure effectiveness.
Despite significant advances in domain-adaptation techniques, existing approaches often suffer from two critical limitations: (1) model instability due to insufficient domain-specific data, particularly in specialized fields such as healthcare and law; and (2) high computational and hardware resource requirements during fine-tuning. These challenges hinder the practical deployment of large language models in real-world, resource-constrained environments.
To address these issues, this paper proposes a novel and efficient training framework that integrates causal representation learning to enhance model robustness under low-resource conditions. In addition, a parameter-efficient fine-tuning strategy is introduced to significantly reduce computational overhead, thereby improving scalability and adaptability. This framework aims to provide a practical and generalizable solution for the training of domain-specific large language models with minimal reliance on extensive data or hardware resources.

3. Methodology

This section focuses on the Problem Statement and introduces the Causality-Aware Parameter-Efficient Learning (CPEL) Framework. It builds upon the previous section, which reviewed related works, including domain-specialized large language models, causal representation learning in LLMs, and domain-adaptation strategies for models. In the following sections, we will detail the key challenges in causal representation learning and explain the two main components of the CPEL framework: the MPEFT Module, used for efficient domain adaptation, and the Causal Prompt Generator, used for causal inference. These modules enable large language models to adapt quickly and effectively to specialized domains, especially in resource-limited settings like aged care.

3.1. Problem Statement

This section rigorously defines the problem setting of causality-aware, parameter-efficient domain adaptation. It articulates the task-specific modeling assumptions and learning objectives that underpin the methodological framework introduced in Section 3.2. We delineate the principal challenges and elaborate on the sequential steps involved in learning causal representations, with a particular focus on domain adaptation and fine-tuning strategies for downstream tasks. The overall procedure can be decomposed into the following stages:

3.1.1. Step 1: The Learning Phase—Domain Adaptation

Consider a pre-existing large language model, $L_W(Y \mid X)$, with a parameter volume of $|W|$ and a training dataset $D = \{(X_i, Y_i)\}_{i=1}^{N}$. For adaptation to a new domain, the model weights need iterative updates from their pretrained state, $W_0$, to $W = W_0 + \Delta W$. The process of maximizing the objective function for this purpose can be defined as follows:
$$\arg\max_{W} \sum_{(x,y) \in D} \sum_{l=1}^{|y|} \log P_{W}\left(y_l \mid x, y_{<l}\right).$$
Within the prevailing framework, comprehensive fine-tuning necessitates that the model acquire a weight matrix whose dimensions correspond to the entirety of the pretraining parameters, denoted as $|\Delta W| = |W_0|$. This process requires significant computational resources.
In the proposed paradigm, this study adjusts only a minor additional parameter set $a$. Consequently, $W = W_0 + \Delta W(a)$, where the added parameters remain significantly smaller in dimension than the original: $|a| \ll |W_0|$. The training objective can thus be redefined as follows:
$$\arg\max_{a} \sum_{(x,y) \in D} \sum_{l=1}^{|y|} \log P_{W_0 + \Delta W(a)}\left(y_l \mid x, y_{<l}\right).$$
Under the prevailing model, domain-adaptive training would yield a comprehensive language model enriched with nursing-specific knowledge. Conversely, in the proposed framework, the process would generate specialized nursing-related PEFT components. These components could subsequently be integrated with unaltered, pretrained general language models to facilitate the execution of downstream tasks.
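To make the contrast between the two objectives concrete, the sketch below (a minimal PyTorch illustration under stated assumptions, not the authors' implementation) freezes the pretrained weights $W_0$ and trains only a low-rank residual $\Delta W(a) = BA$, so the optimizer sees only the small parameter set $a = \{A, B\}$:

```python
import torch
import torch.nn as nn

class LowRankResidualLinear(nn.Module):
    """Wraps a frozen pretrained linear layer (W0) and trains only a
    low-rank residual Delta W(a) = B @ A, so that |a| << |W0|."""

    def __init__(self, pretrained: nn.Linear, rank: int = 16):
        super().__init__()
        self.base = pretrained
        for p in self.base.parameters():  # W0 stays fixed
            p.requires_grad = False
        d_out, d_in = pretrained.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # Gaussian init
        self.B = nn.Parameter(torch.zeros(d_out, rank))        # zero init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (W0 + B A) x = W0 x + B (A x)
        return self.base(x) + x @ self.A.T @ self.B.T

layer = LowRankResidualLinear(nn.Linear(4096, 4096), rank=16)
print(layer(torch.randn(2, 4096)).shape)  # torch.Size([2, 4096])
```

The autoregressive log-likelihood of Equation (2) is then optimized as usual, with gradients flowing only into A and B.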

3.1.2. Step 2: The Fine-Tuning Stage—Instruction in Downstream Tasks 

In the prevailing paradigm, large language models pretrained in nursing are fine-tuned for specialized tasks using instructions, such as answering questions about health and wellness. This study utilizes a pretrained nursing language model, denoted as $L_W$, along with its domain-adaptation parameters, W. Additionally, we introduce a newly initialized model head, $\Theta$, and an instruction fine-tuning dataset, $Z = \{(X_i, Y_i)\}_{i=1}^{N}$. We then define the objective to be maximized as follows:
$$\arg\max_{a,\Theta} \frac{1}{N} \sum_{l=1}^{N} y_l \log P_{\left(W + \Delta W(a)\right) \cdot \Theta}\left(x_l\right).$$
As in the domain-adaptive training phase, the dimension of the supplementary parameter $a$ is notably smaller than that of the original parameters W. By updating only the ancillary parameters and the model-head parameters, our proposed paradigm significantly reduces computational demands, rendering it particularly efficient and practical for the often resource-limited settings of geriatric care.

3.1.3. Definitions of Tasks Related to Geriatric Care

In the context of intelligent elder care, downstream tasks typically revolve around health-related reasoning and care planning. We categorize the relevant tasks into the following types:
  • Health-Knowledge Question Answering (HKQA): This task involves answering domain-specific questions related to health management for the elderly, care in the context of chronic disease, nutrition, medication, and mental well-being. The goal is to provide concise and accurate responses grounded in nursing knowledge.
  • Health-Event Causal Reasoning (HECR): This involves identifying cause–effect relationships from user descriptions or care records, with aims such as linking symptoms to underlying conditions or recognizing causal chains in events of daily life (e.g., “Poor sleep leads to high blood pressure”).
  • Care-Plan Generation (CPG): Based on the user’s current state, historical records, or specific queries, the model generates personalized care suggestions or daily routines that align with clinical guidelines and individual needs.
These tasks reflect real-world application scenarios in elder care and serve as the primary evaluation benchmarks for the effectiveness of causal-prompt integration and parameter-efficient tuning in this study.

3.2. Causality-Aware Parameter-Efficient Learning Framework

The primary technical methodology of this paper employs the open-source large model LLaMA, leveraging efficient parameter fine-tuning and causal representation learning techniques to develop a domain-specialized model. To this end, we introduce the CPEL framework, which comprises two key modules. The first, the MPEFT Module (Multistep Parameter-Efficient Fine-Tuning Module), is designed to optimize training configurations for domain-specific datasets under resource-constrained conditions, thereby facilitating the efficient adaptation of large language models to specialized domains. The second, the Causal Prompt Generator, aims to enhance the model’s capacity for causal inference by leveraging high-performance causal prompts to uncover latent causal relationships within the target-domain data.

3.2.1. MPEFT Module

The MPEFT Module is an integral component of the CPEL framework, designed to allow a foundational large language model to gradually adapt to domain-specific knowledge through a two-step parameter-learning process that is both efficient and minimally taxing on hardware resources. The module operates as follows. First, the LLaMA language model is chosen as the base model and adapted to the target domain, yielding a LoRA adapter referred to as “Caring LLaMA-LoRA.” This adapter, built on LLaMA, is adept at a broad array of tasks within the domain of geriatric care. Second, the “Downstream LLM” is derived through downstream-task instruction learning: instruction fine-tuning is performed on top of Caring LLaMA-LoRA, which already encodes knowledge tailored to geriatric care. As a result, this fine-tuning ensures that performance aligns more accurately with domain tasks and user requirements. Crucially, the MPEFT Module addresses the prominent discrepancy between the source- and target-domain distributions by employing a two-step adaptation framework: it favors continuous learning within the target domain for the large language model (LLM), moving away from the conventional multi-domain unsupervised domain-adaptation paradigm. The detailed layout of this scheme is depicted in Figure 2 below.
Domain-Adaptive Training: The Caring-Domain LLM is trained by integrating knowledge of mental health counseling and the daily health domain into LLaMA. This paper assesses the impact of four LLaMA variants of different parameter scales on this process, examining computational resource consumption in terms of trainable parameters, memory usage, and training time across the four models. The module utilizes the auto-regressive language-modeling pretraining objective employed in the original LLaMA training. For compatibility with existing computational resources, fixed model hyper-parameters are adopted, enabling the LLM to fit on a single NVIDIA GPU. Through a series of experiments, Gaussian process regression is implemented to facilitate Bayesian optimization of the hyper-parameters of the PEFT method. Finally, this study assesses the domain scores of the resulting LLM variants.
Downstream-Task Fine-Tuning: The downstream LLM is obtained by carefully fine-tuning the target model from the initial phase, the Caring LLaMA-LoRA domain LLM. This fine-tuning, conducted on a private geriatric-care-related dataset, endows the model with expert guidance while leveraging its robust language-understanding capabilities. After fine-tuning, the model’s proficiency is tested on real-world domain tasks to verify its improved alignment with domain tasks and user requirements.
This two-phase parameter-efficient strategy aligns with the formal derivation in Section 3.1.1, where only a low-rank residual is optimized over the frozen base model. The adapter structure is fully modular and injected into the LLaMA transformer blocks, requiring minimal parameter overhead while maintaining task performance.
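The two-step routine can be sketched with the HuggingFace peft API that the later experiments report using (Section 4.2.1); the checkpoint name and target modules here are illustrative assumptions, not the authors' exact configuration:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model

BASE = "meta-llama/Llama-3.1-8B"  # assumed base checkpoint
lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"])  # assumed modules

# Step 1: domain-adaptive training -> "Caring LLaMA-LoRA"
base = AutoModelForCausalLM.from_pretrained(BASE)
caring = get_peft_model(base, lora_cfg)
# ... train `caring` on the public caregiving corpora, then persist the adapter
caring.save_pretrained("caring-llama-lora")

# Step 2: downstream instruction fine-tuning -> "Downstream LLaMA-LoRA"
base = AutoModelForCausalLM.from_pretrained(BASE)      # fresh frozen weights
domain = PeftModel.from_pretrained(base, "caring-llama-lora")
merged = domain.merge_and_unload()                     # fold domain adapter into W
downstream = get_peft_model(merged, lora_cfg)
# ... instruction-tune `downstream` on the private HCaring dataset
```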

3.2.2. Causal Prompt Generator

The Causal Prompt Generator is a pivotal component of CPEL. It is tasked with extracting the inherent causal relationships within the target-domain dataset, which allows it to craft high-quality causal prompts. These prompts serve as guiding inputs for the model, aiding its comprehension of the domain’s internal knowledge. Consequently, the model’s capacity to discern causality and its prowess in causal reasoning are augmented. This module not only diminishes the model’s reliance on the volume of data within the target domain, but also bolsters the adaptability of the model’s domain-specific knowledge. The Causal Prompt Generator, an enhancement of the preexisting Caring Domain LLM model, is employed in downstream tasks to augment the model’s capacity for causal inference within the realm of intelligent healthcare. Its primary objective is to offer more personalized and efficient care services to the aged population by identifying and reasoning about causal relationships.
Definition 1.
Causal Diagram Model. A causal diagram, denoted as G, is represented by a directed graph in which nodes symbolize variables and edges depict causal relationships. If V represents the set of variables and E represents the set of edges, the causal diagram is expressed as $G = (V, E)$.
Definition 2.
The Formula of Causal Inference. The most prevalent formula utilized in causal inference is the counterfactual inference formula, given as $Y(a) = E[Y \mid do(A = a)]$. Here, $Y(a)$ represents the outcome produced under the intervention $A = a$, and $E[Y \mid do(A = a)]$ represents the anticipated value of the variable Y subsequent to the application of the intervention $A = a$.
Definition 3.
Latent Variable Model. This model improves feature representation by incorporating a latent causal variable, denoted as Z. A widely used example of a latent variable model is the Structural Equation Model (SEM), which can be represented as $Y = f(X + Z) + \epsilon$. Here, Y signifies the outcome variable, X represents the observed features, Z is the latent causal variable, and $\epsilon$ is the error term.
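Definitions 1 and 2 can be made concrete in a few lines of code. The sketch below (with illustrative caregiving edges, not learned structure) builds a causal diagram $G = (V, E)$ and implements the graph-surgery view of the do-operator, under which intervening on a variable severs its incoming causal edges:

```python
import networkx as nx

# Definition 1: a causal diagram G = (V, E) as a directed acyclic graph.
G = nx.DiGraph()
G.add_edges_from([
    ("poor sleep", "high blood pressure"),
    ("reduced mobility", "muscle loss"),
    ("muscle loss", "fall risk"),
])
assert nx.is_directed_acyclic_graph(G)

# Definition 2: do(A = a) corresponds to deleting the edges into A,
# leaving downstream effects intact.
def intervene(graph: nx.DiGraph, variable: str) -> nx.DiGraph:
    g = graph.copy()
    g.remove_edges_from(list(g.in_edges(variable)))
    return g

g_do = intervene(G, "muscle loss")
print(list(g_do.predecessors("muscle loss")))  # [] after the intervention
print(list(g_do.successors("muscle loss")))    # ['fall risk'] is preserved
```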
In the prevailing paradigm, large language models pretrained for nursing undergo fine-tuning for downstream tasks. A method for causal relationship learning through causal prompts is proposed to address the paucity of domain-specific data resources. Specifically, in downstream tasks, given a set of text data $Z = \{(X_i, Y_i)\}_{i=1}^{N}$, the language model extracts variables $C = \{c_1, c_2, \ldots, c_n\}$ that potentially contain causal relationships. The model first comprehends the text and identifies causal sentences, denoted as $S = \{(c_i, c_j)\}$, where $(c_i, c_j)$ suggests that $c_i$ causes $c_j$. To discover new causal relationships, this paper introduces a causal prompt module by defining $T = P(C)$, where P serves as an explanatory causal prompt. These prompts provide the model with comprehensive descriptions and background information about variables, thereby enhancing its causal discovery capabilities. Consequently, the variables returned by the model serve as input for the phase of causal relationship discovery, boosting its causal reasoning abilities. This approach not only improves the model’s accuracy in identifying causal relationships but also ensures its effectiveness in the domain of geriatric care. It excels at fully leveraging potential causal relationships in the context of limited resources, thereby strengthening clinical decision-making. By integrating causal representation learning with causal prompts, pretrained large language models can effectively tackle the challenge of data scarcity in the nursing field, offer precise causal reasoning and decision support, and ultimately enhance the quality and efficiency of geriatric care. The causal representation learning module is illustrated in Figure 2.
Furthermore, to improve the interpretability and reasoning capability of the generated prompts, we design causal prompt templates that explicitly encode directional causal knowledge extracted from the domain DAG. For example, instead of asking “What is the likely symptom?”, we ask “Given that A leads to B, what might be the cause of B?”. These causal prompts serve as the foundation for improved generation tasks, as shown in Table A1 (Appendix A.1).
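A minimal sketch of this templating step, assuming causal edges have already been extracted into the set $S = \{(c_i, c_j)\}$ (the template wording below is illustrative, not the authors' exact prompt text):

```python
def causal_prompt(cause: str, effect: str) -> str:
    """Render a directional causal prompt in the spirit of Table A1."""
    return (f"Given that {cause} leads to {effect}, "
            f"what might be the cause of {effect}?")

# Edges S = {(c_i, c_j)} extracted from the domain DAG.
edges = [("poor sleep", "high blood pressure"),
         ("reduced mobility", "muscle loss")]
prompts = [causal_prompt(ci, cj) for ci, cj in edges]
print(prompts[0])
# -> Given that poor sleep leads to high blood pressure,
#    what might be the cause of high blood pressure?
```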
To ensure the effectiveness and reliability of the causal relationships identified by the Causal Prompt Generator, a comprehensive multi-faceted validation strategy is employed. First, in the absence of gold-standard annotations, domain experts in the nursing field from the research team were invited to manually assess the plausibility and correctness of the extracted causal relations. Second, we quantitatively compare the performance of domain-specific dialogue datasets that include extracted causal knowledge to that of datasets without such knowledge, using standard metrics such as ROUGE, BLEU-4, and LLM Score. Third, case-based comparisons are conducted to evaluate whether incorporating causal prompts enhances performance in downstream tasks, such as answering health-related questions and clinical decision support. Finally, controlled interventions and perturbations are applied to the model inputs to examine whether the model truly relies on underlying causal mechanisms rather than superficial correlations. Through this systematic evaluation, the extracted causal relationships are assessed in terms of both accuracy and practical utility, thereby establishing a solid foundation for effective causal reasoning in intelligent elderly care.
This paper’s framework, through its causal representation module, not only bolsters the model’s precision in delineating causal relationships with minimal data samples, but also augments its efficacy in task-oriented dialogues and domain-specific knowledge questioning. When interacting with users via natural language, the model can proffer more precise health advice premised on the discerned causal relationships, thereby aiding the aged in managing their health conditions more effectively.

4. Experiment

To assess the adaptability and effectiveness of the proposed dual-adapter mechanism and causal prompt strategy, experiments were conducted in two phases: domain adaptation and downstream-task fine-tuning. These experiments were validated on three tasks: Health-Knowledge Question Answering (HKQA), Health-Event Causal Reasoning (HECR), and Care-Plan Generation (CPG). The LLaMA series was utilized as the base model, with initial pretraining carried out on public datasets to ensure computational efficiency. The resultant optimized model, named Caring LLaMA-LoRA, underwent further fine-tuning using the proprietary HCaring dataset for downstream applications. Comparisons with state-of-the-art models such as LLaMA3.1-70B, Gemini-Pro, and GPT-4o-mini were performed to substantiate the efficacy of the proposed framework.

4.1. Dataset

We employed two public datasets and one proprietary domain-specific dataset (HCaring). The public datasets include medical QA pairs and clinical narratives, while the HCaring dataset comprises 15,000 real-world geriatric-care-related dialogues and decision records, manually annotated by domain experts. These datasets ensured both general medical coverage and task-specific alignment to problems of elder care.
  • mental_health_counseling_conversations was curated by aggregating questions and responses from various online counseling and therapy platforms. Comprising 3510 entries, the content of this dataset is predominantly centered around mental health concerns. Importantly, the responses were contributed by certified mental health counselors, ensuring the high quality of the data.
  • Medical_Customer_care comprises 207 K entries primarily focused on health-related knowledge and information. Each entry offers comprehensive details and recommendations.
  • The HCaring dataset is a curated corpus designed to support domain adaptation and evaluation in elder-care scenarios. It is built from a diverse collection of video materials sourced from public platforms such as YouTube, Bilibili, and TikTok, covering topics like nursing procedures, rehabilitation training, psychological counseling, daily care routines, and promotion of a healthy lifestyle. To enable structured training of language models, the dataset is transformed into a set of high-quality QA pairs. Each entry follows a standardized schema consisting of the following: (1) an instruction field—an optional category or scenario tag describing the caregiving context; (2) an input field—a natural-language question derived from real-world situations related to geriatric care; (3) an output field—a reference answer crafted by human annotators based on domain-specific guidelines, expert consensus, or verified clinical resources. All QA pairs were constructed and reviewed following task-specific annotation protocols to ensure consistency, domain coverage, and quality control. A representative sample is provided in Appendix B to illustrate the data format and annotation style.

4.2. Domain-Adaptive Training

4.2.1. Experimental Setup

To improve the efficacy of training, we designed an experimental setup based on two publicly available datasets for domain-adaptive training. To ensure that the experiments capture a wide range of domain-specific features and reflect realistic data variability, we combined these datasets and performed stratified random sampling to select 30,000 data entries. We experimented with four LLaMA model variants (1B, 3B, 7B, 8B) to evaluate the scalability of our approach across parameter scales and to verify its robustness under varying model capacities. All models were fine-tuned using LoRA with the following configuration. The learning rate was set to $2 \times 10^{-5}$, determined through preliminary tuning to ensure stable convergence. The LoRA rank was set to 16 to balance model expressiveness with parameter efficiency. The LoRA alpha value was set to 32 to appropriately scale the low-rank adaptation weights. A LoRA dropout rate of 0.05 was applied to mitigate overfitting during fine-tuning. All LoRA-based fine-tuning experiments were conducted using the LLaMAFactory framework, which integrates HuggingFace Transformers (v4.33.2), PEFT (v0.6.2), and PyTorch (v2.4.1) under CUDA 12.4. LoRA adapters were inserted into the self-attention layers of the decoder blocks in the LLaMA models.
All experiments were executed on a single NVIDIA RTX A5000 GPU (24 GB memory), paired with an Intel(R) Core(TM) i7-10700K CPU @ 3.80 GHz and 64 GB RAM and running Ubuntu 20.04 LTS. The software environment consisted of Python 3.10 and PyTorch 2.4.1+cu124. Model convergence was monitored using perplexity loss on a held-out validation set. On average, convergence was achieved within three epochs, each requiring approximately three hours for the 8B model on the A5000 GPU. The use of LoRA reduced the number of trainable parameters to approximately 0.001% of the full model size, significantly decreasing GPU memory consumption and training time compared to fine-tuning of the full model.
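The reported trainable-parameter budget can be verified directly; a small sketch (the peft library also exposes print_trainable_parameters() for the same purpose):

```python
def trainable_fraction(model) -> float:
    """Fraction of parameters the optimizer actually updates."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total

# e.g., for the `downstream` model from the earlier sketch:
# print(f"{100 * trainable_fraction(downstream):.4f}% of parameters trainable")
```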

4.2.2. Experimental Results

Results are summarized in Table 1. The LoRA-enhanced Caring LLaMA-LoRA achieved robust adaptation across parameter scales. Among them, the 8B model demonstrated the best domain alignment, as confirmed through both automatic metrics and human evaluation.

4.2.3. Experimental Analysis

Analysis of the Influence of Base Models on Domain-Adaptive Training. As illustrated in Table 1 and Figure 3, the domain-adaptation results demonstrate the direct impact of the base model on domain adaptability during training with the proposed framework. The multistep, efficient parameter fine-tuning module in this framework utilizes the LoRA algorithm in PEFT for model training in the initial domain-adaptation step. The fundamental principle is to enhance the overall effectiveness of the model by incorporating a domain-adaptive layer without compromising the intrinsic capabilities of the large language model. During training, the tokenizer of the base model is employed; the trainable parameter matrices of the base model are then frozen and supplemented by the two small low-rank matrices of LoRA, which significantly reduces the number of trainable parameters. After training, the adapter layer is merged with the original model to yield the domain models described in this study. Evaluation of these domain models indicated that base models with larger parameter sizes exhibited stronger reasoning capabilities, with the 8B model achieving the best evaluation scores. However, the 7B model exhibited the weakest performance across all indicators. This is primarily because the 7B model belongs to the LLaMA2 generation, for which no “Instruct” version exists; since the framework relies predominantly on parameter-efficient fine-tuning, the 7B model starts from a weaker instruction-following baseline and therefore performs worst. In summary, a clear positive correlation exists between the scores of the domain models trained in this framework and the parameter size of the base model.
Evaluation Methods for Domain-Adaptation Effects of Large Language Models Using LoRA Technology. The generation strategy of a large language model is analogous to a game of text solitaire: given the initial n tokens of a passage, the objective is to predict the (n+1)th token, yielding a probability distribution over the (n+1)th token. This generation strategy is referred to as autoregressive [47].
In this paper, we initially utilize the autoregressive evaluation method to perform a generalized evaluation of large-language-model performance. As demonstrated in Figure 3, the domain-generation performance of the proposed framework is evaluated using ROUGE and BLEU-4 values. The framework enhances the domain-specific performance of the different LLaMA versions, exemplifying its generalizability and efficiency. However, to select the most suitable model version as the base model for downstream-task experiments, the assessment must go beyond automatic evaluation [48]. To this end, we conducted a comprehensive model evaluation: a third-party large language model impartially evaluates our trained domain models, and the model with the highest score is selected as the base model for the subsequent training step. Specifically, we utilize Doubao-pro-32k to assess the different LLaMA-LoRA versions trained by our framework. Doubao-pro-32k poses questions drawn from untrained data within the public dataset, and the responses generated by each domain model are evaluated against the answers provided in the original dataset. This evaluation process involves system prompts specifying the designated task, input format, output format, and sample outputs, along with a scoring system that references the original answers; the final score of each domain model is the arithmetic mean. Within our training framework, the LLaMA3.1-8B-LoRA model demonstrates the highest performance in pretraining for adaptation to the aged-care domain: with a maximum score of 9 in the large-model evaluation, it attained a score of 6.2. This pretrained LoRA is henceforth referred to as “Caring LLaMA-LoRA”.
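For reference, a hedged sketch of the automatic-metrics pipeline using the HuggingFace evaluate library (the strings below are synthetic stand-ins, not the authors' evaluation data):

```python
import evaluate

predictions = ["Encourage light stretching before breakfast and check blood pressure."]
references = ["Suggest gentle morning stretching and monitor blood pressure."]

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

print(rouge.compute(predictions=predictions, references=references))
print(bleu.compute(predictions=predictions,
                   references=[[r] for r in references],
                   max_order=4))  # BLEU-4
```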
Our framework demonstrates high portability and low resource requirements, offering a scalable solution for real-world applications in geriatric care. The methodology can also be extended to other domains requiring lightweight domain specialization. This experimental section provides robust evidence for the effectiveness and generalizability of the CPEL framework in geriatric care and its potential adaptability to broader domains. In the next section, we proceed to downstream instruction tuning with the Caring LLaMA-LoRA model. However, the current setup focuses solely on text-based monolingual corpora and excludes multimodal or multi-turn dialogues. Additionally, external evaluators like Doubao-pro-32k may introduce subjective bias. Future work will explore the integration of causal structures in multilingual and multimodal contexts.

4.3. Downstream Task-Instruction Fine-Tuning

4.3.1. Experimental Results

In the design of the downstream task-instruction fine-tuning experiment, this paper uses Caring LLaMA-LoRA, the target model of the domain-adaptation stage, as the base model for the subsequent instruction fine-tuning and causal prompting stage. The resulting final model, Downstream LLaMA-LoRA*, is renamed Geriatric Care LLaMA. As baselines, we compare against the open-source LLaMA3.1-70B model, which has a large number of parameters, and the mainstream large language models Gemini-Pro and GPT-4o-mini, evaluated with zero-shot prompting and causal prompting, respectively. This assists in verifying the effectiveness of the training framework proposed in this paper. The causal prompts proposed in this paper are denoted by “*” in the experimental tables. The results of downstream-task instruction fine-tuning with causal prompts, shown in Table 2, are analyzed along the research questions below. In this stage, we conduct verification on the three tasks of HKQA, HECR, and CPG.
To ensure fair comparison and consistent generation behavior across all models, we use the following decoding configuration unless otherwise specified: a maximum context window of 1024 tokens, temperature = 0.95, top-p = 0.7, and top-k = 50. Responses are generated with sampling enabled under these settings.
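Under these settings, generation can be reproduced with a standard transformers call; the checkpoint name below is a placeholder standing in for the fine-tuned model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B"   # placeholder; load Geriatric Care LLaMA here
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

inputs = tokenizer("How can caregivers reduce fall risk at night?",
                   return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=1024,   # maximum context window
    do_sample=True,    # sampling enabled
    temperature=0.95,
    top_p=0.7,
    top_k=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```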

4.3.2. Experimental Analysis

The experiments examine the performance of LoRA-enhanced LLaMA models in geriatric-care natural language processing (NLP) tasks. A private dataset is customized with HKQA and HECR tasks, and the model’s mastery of specialized domain knowledge is assessed through its performance on question-answering tasks across various geriatric-care scenarios. In our experiments, we first perform a longitudinal comparison by evaluating the scores of the original base model LLaMA-8B, Caring-LLaMA-LoRA, and Downstream-LLaMA-LoRA on the private dataset. Subsequently, a horizontal comparison is conducted using the LLaMA3.1-70B model, which has nearly 10 times more parameters than the base model, alongside prominent large language models such as GPT-4o-mini and Gemini-Pro.
The findings indicate that the target model trained with the proposed framework improves ROUGE, BLEU-4, and large-model scores over the base model by 26.36%, 22.39%, and 3.06%, respectively. While these improvements are not yet conclusive, they are attributable to the multistep parameter fine-tuning module in our framework, which facilitates mastery of domain knowledge through the LoRA method. The LoRA method employed in this framework fixes the weight matrices of the existing pretrained layers and fine-tunes only the newly added layers, thereby reducing computational complexity. Specifically, LoRA selects target weight matrices in each Transformer layer for fine-tuning and uses random Gaussian initialization along with zero initialization for the new matrices to ensure the flexibility of the fine-tuning process. Consequently, after training on domain-specific data, the model attains proficiency in the specialized knowledge of the field, significantly enhancing the domain expertise of the base large language model. Additionally, the Causal Prompt Generator further strengthens the model’s mastery of domain knowledge in geriatric care. In comparison with the mainstream GPT-series models, our model led the causal-prompted GPT-4o-mini* by a margin of 0.32 in the large-model score, a result attributable to the incorporation of the Causal Prompt Generator within our framework; against the base GPT-4o-mini, the margin widened to 1.73 points. This is primarily because contemporary prominent large language models have already attained completeness in the breadth of specialized domain knowledge yet still require enhancement in depth. This outcome signifies that the LoRA-enhanced LLaMA model substantially improves performance on domain-specific question–answer tasks while preserving computational efficiency, underscoring its extensive application potential.
How can the model’s performance in real geriatric-care scenarios be evaluated? To comprehensively assess the model in realistic geriatric-care settings, we introduce a dual evaluation methodology:
  • Automatic Evaluation: utilizing generic evaluation metrics (ROUGE and BLEU) to measure generation quality.
  • Domain-Specific Evaluation: implementing a large-language-model evaluation that emphasizes domain specialization. In this process, a private dataset poses common caregiving-related inquiries, and the model’s responses are scored by ten senior experts from our university, ensuring a nuanced and context-sensitive evaluation. Each expert’s score, based on a detailed scoring rubric, is averaged to produce the final model score. A final score exceeding 3 is considered indicative of a valid and contextually appropriate answer.
Unlike classical causal discovery tasks, our dataset does not contain gold-standard causal graphs. Therefore, metrics such as structural Hamming distance or graph-level precision/recall are not applicable. Instead, we evaluate the impact of causal prompts directly through improvements in performance on downstream tasks.
The experimental results, as presented in Table 2, show that the private dataset poses common caregiving-scenario inquiries to the model and that the evaluation is based on the model’s responses. The target model, trained using the proposed framework, outperforms the existing leading large language model, GPT-4o-mini. This enhancement is primarily attributed to the LoRA technique within the CPEL framework, which facilitates lightweight domain adaptation and enables the model to accurately capture healthcare-domain-specific knowledge and patterns while maintaining high performance. Secondly, the base model draws on a substantial knowledge base consisting of high-quality data from the domain of geriatric care; by collecting real case data from different scenarios, the model is fine-tuned to understand domain-related problems more accurately. In the downstream task, causal representation learning is introduced during instruction fine-tuning: causal prompts support the model in better understanding the meaning of input statements, so the model acquires the internal causal relationships in the training data and then generates more professional answers. In contrast, GPT-4o-mini, despite having undergone extensive training, has primarily focused on task processing in the general domain. This may have led to a deficiency in specialized optimization and a lack of depth in knowledge related to questions in geriatric care.
As the present study concentrates on specialized-domain models, the automatic assessment technique exhibits evident limitations in evaluating the similarity of textual content. In contrast, manual evaluation, operating from an in-depth understanding of the specialized domain’s textual context, can be flexibly adapted to different assessment needs, thus ensuring the reliability and accuracy of the results. In this study, ten senior experts from our university specializing in intelligent recuperation projects were invited to formulate a set of detailed and appropriate scoring rules for the model-generation task. These rules were designed to assess the model’s performance comprehensively and objectively. In practice, when a question or instruction is posed to the model, it generates a text accordingly. The ten domain experts then provide a detailed score for the generated text according to the indicators in the scoring rules. To ensure fairness, each expert’s score is the average across the metrics. Finally, the scores of the ten experts are averaged to produce the model’s final score for the question. A final score higher than 3 indicates a valid answer, and a larger number of valid answers more strongly evidences the model’s advanced ability in this domain. Furthermore, Figure 4 provides a visual representation of instances of manual expert evaluation, offering substantial empirical evidence to support the research findings of this study. To further strengthen the credibility of the manual evaluation, we computed inter-rater agreement using Fleiss’ Kappa, which reached 0.72 across three independent raters on a 200-sample subset, indicating substantial agreement among the evaluators. Fleiss’ Kappa is a widely used statistical measure that assesses the degree of agreement among multiple raters beyond chance and is especially suitable for categorical or ordinal ratings. This result reinforces the consistency and robustness of domain-specific scoring.
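The agreement statistic can be reproduced with statsmodels; the ratings matrix below is synthetic, standing in for the 200-sample, three-rater subset described above:

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
ratings = rng.integers(low=1, high=6, size=(200, 3))  # 200 samples x 3 raters, scores 1-5

table, _ = aggregate_raters(ratings)   # per-sample counts of each score category
print(f"Fleiss' kappa = {fleiss_kappa(table):.2f}")
```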
Analyzing the adaptation of the training framework to the field of geriatric care. Current mainstream large language models demonstrate deficiencies in depth of knowledge in the domain of geriatric care. Consider, for example, the question: “Early in the morning, an elderly man with Alzheimer’s disease gets up and sits in the bedside chair, looking out of the window. At this moment, the nursing staff comes in. What should they do next?” Currently popular large language models understand only the surface meaning of the sentence: they interpret it as the old man having woken up and then begin to arrange his activities for the day. In contrast, our model captures the old man’s emotional state through the action “looking out of the window,” which, as the model postulates, may reflect a longing for his relatives. The model’s capacity to discern the deeper meaning of the statement from the input information is thus evident in the pertinent suggestions it provides in its response. This capacity is attributable to the causal representation module embedded within our framework, which is adept at discerning potential causal relationships in data, thereby facilitating the model’s learning of intrinsic causal relationships. As demonstrated above, this framework exhibits considerable promise in the domain of geriatric care. A more detailed real case is illustrated in Table 3, in which we can intuitively see the excellent performance of our model on the CPG task.

4.4. Ablation Study

To systematically assess the contributions of individual modules within our proposed training framework, we conducted ablation studies on the private care dataset. This section evaluates the impact of two primary modules—namely, the efficient parameter fine-tuning module (MPEFT module) and the causal representation learning module—on model performance.
In the first stage, the MPEFT module leverages the LoRA technique from PEFT, where only the supplementary parameters and the output head are fine-tuned. This selective updating mechanism significantly reduces computational requirements and training time while preserving the model’s intrinsic capabilities. Empirical results indicate that the efficient parameter fine-tuning module boosts performance by 0.56 compared to the base model and by 1.43 relative to the intermediate model, Caring LLaMA-LoRA.
The second stage introduces the Causal Prompt Generator, which employs causal graphs and causal prompts to uncover and utilize causal relationships among variables such as health status, daily activities, and dietary habits. This module enables the model to perform causal reasoning and generate personalized recommendations for interventions. For example, by discerning whether “improving diet” or “increasing exercise” would yield better outcomes, the model provides more precise and context-aware health management suggestions. Experimental results, as shown in Table 2, reveal that integrating the Causal Prompt Generator further enhances the target model’s performance by an additional 0.49.
The ablation studies validate that both modules contribute significantly to overall performance improvement. The MPEFT module demonstrates that parameter-efficient adaptation can achieve substantial gains with minimal computational cost, which is crucial in resource-constrained settings such as geriatric-care environments. Meanwhile, the Causal Prompt Generator not only refines the model’s decision-making process through causal inference but also offers a more interpretable and robust mechanism for understanding domain-specific interdependencies. These findings highlight the framework’s potential applicability beyond the geriatric-care domain, suggesting promising directions for domains like finance or law, where causal reasoning is vital.

5. Conclusions

This paper presents a novel Causality-Aware Parameter-Efficient Learning (CPEL) framework that combines causal representation learning with multi-stage parameter-efficient optimization. The framework consists of three core components: (1) the MPEFT module, which produces an intermediate domain-specialized model (Caring LLaMA-LoRA); (2) a parameter-efficient fine-tuning stage to derive the task-specific Downstream LLaMA-LoRA; and (3) a Causal Prompt Generator that incorporates causal graphs and context-aware prompts to enhance reasoning in applications related to elder care. The final model, Geriatric Care LLaMA, demonstrates superior performance in geriatric-care-related NLP tasks, achieving a ROUGE score of 34.96%, BLEU-4 of 25.41%, and a composite LLM score of 7.06—outperforming GPT-4o-mini by 1.73 points.
Beyond performance, CPEL significantly reduces GPU memory usage and training time by optimizing only 0.001% of the original parameters. This efficiency makes it well-suited for use in resource-constrained domains such as healthcare. Additionally, the integration of causal prompts enables the generation of personalized, context-aware recommendations, further enhancing decision-making in geriatric care and holding the potential for extension to domains like finance, law, and technical support.
Nonetheless, several limitations remain. First, the evaluation relies primarily on a private dataset, which constrains the generalizability of the findings. Second, the assessment metrics are predominantly automatic; incorporating human evaluation in future studies would offer more comprehensive insights. While the proposed method demonstrates consistent improvements with causally-informed prompts, the current analysis lacks structural evaluation metrics such as Structural Hamming Distance (SHD). Future work could address this gap by leveraging synthetic or annotated datasets to directly assess the quality of the induced causal structures.
Moreover, although this study adopts LoRA as the foundation of the parameter-efficient fine-tuning framework, future research could explore alternative PEFT techniques—such as AdapterDrop, Prefix-Tuning, or IA3—to better contextualize efficiency–performance trade-offs in the context of causal prompt adaptation. Finally, while the method has been validated in the domain of elder care, its cross-domain adaptability remains untested. Future directions include expanding dataset diversity, improving evaluation protocols, and validating the framework across domains characterized by varied causal structures.
Given the sensitive nature of healthcare applications, safety and robustness are of paramount importance. Although this study emphasizes domain adaptation and causal representation, we have incorporated basic safety mechanisms—such as manual validation of training data, prompt-based hedging to mitigate hallucinations, and domain-informed scoring protocols—to reduce potential risks. Additionally, inter-rater agreement (Fleiss’ Kappa = 0.72) has been reported to reinforce scoring consistency. Nevertheless, we recognize that a comprehensive safety framework—including failure-mode analysis, confidence estimation, and intervention fallback strategies—remains an essential direction for future work.
In summary, CPEL provides an effective, scalable solution for domain adaptation under low-resource conditions. It offers both empirical gains and theoretical insights, laying a foundation for further advancements in the adaptation of causal language models across specialized domains.

Author Contributions

J.G.: Conceptualization, Methodology, Writing (Original Draft), Coding. J.X.: Writing (Review and Editing). X.L.: Methodology, Writing (Review and Editing). G.Y.: Writing (Review and Editing). J.L.: Writing (Review and Editing). Y.L.: Writing (Review and Editing). Z.H.: Data Collection, Writing (Review and Editing). K.Y.: Data Collection, Writing (Review and Editing). All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Key Research and Development Program of China (2020AAA0109703); the National Natural Science Foundation of China (62076167, U23B2029); the Key Scientific Research Project of Higher Education Institutions in Henan Province, China (24A520058, 23A520022); and the Postgraduate Education Reform and Quality Improvement Project of Henan Province, China (YJS2024AL053).

Data Availability Statement

The mental health counseling conversations used in this study are available through the Hugging Face page: https://huggingface.co/datasets/Amod/mental_health_counseling_conversations (accessed on 10 December 2024). The Medical Customer Care data used in this study are also available through the Hugging Face page: https://huggingface.co/datasets/DR-DRR/Medical_Customer_care (accessed on 12 December 2024). At present, we are unable to make the HCaring dataset publicly available, as it forms part of an ongoing research project on Smart Health and Elderly Care. We fully understand that this may cause inconvenience to colleagues who wish to replicate or build upon our work, and we sincerely apologize for this. However, we are committed to releasing the dataset promptly once the research is completed and published, ensuring that the academic community can access and benefit from our findings. If you require access to our private dataset in the meantime, we would be happy to provide some data for your reference.

Acknowledgments

The authors acknowledge the use of ChatGPT (developed by OpenAI) for grammar checking and language editing support during manuscript preparation. No content generation, idea development, or experimental results were produced using AI-based tools. All scientific contributions and interpretations are the result of the authors’ own work.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1

Table A1. Comparison of Base and Causal Prompts Used in HKQA and HECR Tasks.

Task | Prompt Type | Prompt | Output Behavior
HKQA | Base Prompt | Why should caregivers help elderly individuals brush their teeth every day? | General reasoning; hygiene-focused
HKQA | Causal Prompt | Given that lack of oral hygiene can lead to tooth decay and systemic infection, why is it important for caregivers to help elderly individuals brush their teeth daily? | More specific causal reasoning (cause–effect–prevention)
HECR | Base Prompt | After successful CPR, how to place the elderly? | Describes position; lacks rationale
HECR | Causal Prompt | Given that the elderly patient has just been resuscitated from cardiac arrest, what is the proper placement to ensure airway patency and vital sign stability while waiting for emergency services? | Emphasizes causal consequences of improper posture; safety relevance
Table A1 illustrates the difference between base prompts and causal prompts across two downstream tasks (HKQA and HECR). We observe that causal prompts, which incorporate explicit cause–effect context, lead to more structured and clinically appropriate model responses.
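To illustrate how such prompts can be assembled programmatically, the following is a minimal sketch that prepends an explicit cause–effect clause to a base question; the template and helper function are hypothetical, not the paper's exact Causal Prompt Generator.

# A hypothetical helper that prepends explicit cause-effect context to a base
# question; the template is an assumption, not the paper's exact generator.
def make_causal_prompt(base_question, cause, effect):
    context = "Given that {} can lead to {}, ".format(cause, effect)
    # Lower-case the first character so the base question reads as a clause.
    return context + base_question[0].lower() + base_question[1:]

print(make_causal_prompt(
    "Why should caregivers help elderly individuals brush their teeth every day?",
    "lack of oral hygiene",
    "tooth decay and systemic infection",
))

Run on the HKQA base prompt above, this reproduces a causal prompt of the same shape as the one in Table A1.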

Appendix A.2

Table A2. Examples of base and causal prompts with corresponding model outputs across HKQA, HECR, and CPG tasks.

Example 1 (HKQA)
Base Prompt: Why should caregivers help elderly individuals brush their teeth every day?
Base Output: “To maintain oral hygiene and prevent dental problems.”
Causal Prompt: Given that lack of oral hygiene can lead to tooth decay and systemic infection, why is it important for caregivers to help elderly individuals brush their teeth daily?
Causal Output: “Because inadequate oral hygiene can result in systemic infections such as aspiration pneumonia, it is crucial for caregivers to assist elderly individuals in daily brushing to ensure both oral and overall health.”

Example 2 (HECR)
Base Prompt: After successful CPR, how to place the elderly?
Base Output: “Place in supine position and monitor vital signs.”
Causal Prompt: Given that the elderly patient has just been resuscitated from cardiac arrest, what is the proper placement to ensure airway patency and vital sign stability while waiting for emergency services?
Causal Output: “To prevent airway obstruction and maintain adequate blood circulation after resuscitation, the elderly should be laid in a supine position with the airway open and continuously monitored until emergency staff arrives.”

Example 3 (CPG)
Base Prompt: What is the standard procedure for administering insulin?
Base Output: “Follow steps to inject insulin under the skin.”
Causal Prompt: Considering that improper insulin administration can lead to hypoglycemia or hyperglycemia, what is the correct step-by-step procedure for safely administering insulin to elderly patients?
Causal Output: “To avoid dangerous glucose fluctuations, caregivers must ensure correct dose selection, site rotation, and post-injection monitoring during insulin administration in elderly patients.”

Appendix B

The following shows several representative examples from the anonymized HCaring dataset, structured in JSON format.
[
    {
        "instruction": "Life care; personal hygiene; washing face; water temperature;",
        "input": "What is the appropriate water temperature for washing the face of an elderly person? Why?",
        "output": "The water temperature should be close to human body temperature; avoid overheating or overcooling to prevent irritation or discomfort to the elderly's skin."
    },
    {
        "instruction": "Water temperature; judgement;",
        "input": "How to judge whether the water temperature is suitable?",
        "output": "You can use the back of your hand or wrist to test the water temperature and make sure the water temperature is comfortable before performing treatment."
    },
    {
        "instruction": "Women; makeup; precautions;",
        "input": "What should caregivers pay attention to when helping elderly women put on makeup?",
        "output": "When applying makeup, choose light cosmetics and avoid excessive use of irritating cosmetics to keep the skin comfortable and healthy. Encourage elderly women to improve their quality of life and maintain a positive attitude towards life through self-care."
    },
    {
        "instruction": "Wiping techniques; wrapping techniques;",
        "input": "What is the wrapping method?",
        "output": "The wrapping technique is a special wiping technique that ensures that the towel fits the hand well during use, increasing the precision and softness of the operation."
    },
    {
        "instruction": "Wrapping techniques; advantages;",
        "input": "What are the benefits of using the wrapping technique when washing the face?",
        "output": "It can avoid direct and forceful rubbing and reduce irritation to the elderly's skin. It is especially suitable for the sensitive facial skin of the elderly."
    }
]
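For readers who wish to consume records in this format, the following minimal sketch loads such a JSON file and renders each record as an instruction-tuning prompt; the file name and prompt template are assumptions, not the exact preprocessing pipeline used in this work.

# Minimal sketch: load HCaring-style records and render each one as an
# instruction-tuning prompt. The file name and template are assumptions.
import json

with open("hcaring_examples.json", encoding="utf-8") as f:
    records = json.load(f)

for rec in records:
    prompt = (
        "### Instruction: " + rec["instruction"] + "\n"
        "### Input: " + rec["input"] + "\n"
        "### Response: " + rec["output"]
    )
    print(prompt, end="\n\n")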

Appendix C

Algorithm A1: Prompt-Based Causal Relation Discovery
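The full pseudocode of Algorithm A1 appears as an image in the published version. As a rough, hypothetical illustration of the general idea, the sketch below queries an LLM for the causal direction between each pair of domain variables and collects the asserted edges; the client, prompt wording, and answer parsing are assumptions, not the authors' exact procedure.

# Hypothetical sketch of prompt-based causal relation discovery: query an
# LLM for the causal direction between each pair of domain variables and
# collect the asserted edges. `ask_llm` is a placeholder for any chat API.
from itertools import combinations

def ask_llm(prompt):
    """Placeholder: send `prompt` to an LLM and return its one-line answer."""
    raise NotImplementedError("plug in your LLM client here")

def discover_causal_edges(variables):
    edges = []
    for a, b in combinations(variables, 2):
        answer = ask_llm(
            "In geriatric care, does '{}' causally influence '{}', "
            "does '{}' causally influence '{}', or neither? "
            "Answer with exactly one of: A->B, B->A, none.".format(a, b, b, a)
        ).strip()
        if answer == "A->B":
            edges.append((a, b))   # directed edge a -> b
        elif answer == "B->A":
            edges.append((b, a))   # directed edge b -> a
    return edges

# Example: discover_causal_edges(["dietary habits", "daily activities", "health status"])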

Appendix D

Note: According to Landis and Koch (1977), a Fleiss’ Kappa between 0.61 and 0.80 indicates substantial agreement, and values above 0.80 indicate near-perfect agreement.
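For reference, inter-rater agreement of this kind can be computed with statsmodels, as in the minimal sketch below; the ratings matrix is fabricated purely for illustration.

# Minimal sketch of computing Fleiss' Kappa with statsmodels. The ratings
# matrix (rows = items, columns = raters) is fabricated for illustration.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

ratings = np.array([
    [7, 7, 8],
    [5, 6, 6],
    [8, 8, 8],
    [4, 5, 4],
])
table, _ = aggregate_raters(ratings)  # per-item counts for each score category
print(round(fleiss_kappa(table, method="fleiss"), 2))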

Figure 1. The conventional approach to acquiring knowledge of the target domain involves training on a substantial amount of unlabeled data and updating the model’s parameters. In contrast, our approach utilizes a limited amount of labeled data to identify causal relationships between data elements, enhancing the model’s ability to learn domain-specific knowledge. Rather than training the model’s parameters, we incorporate a domain “shell” into the model, enabling efficient adaptation to the domain of expertise.
Figure 2. The Framework of CPEL: In the MPEFT Module, the base model undergoes domain adaptation through two stages of efficient parameter fine-tuning. Rather than modifying the parameters of the base large language model (LLM) directly, the approach enhances the model’s overall capabilities by integrating a domain-specific shell around the Base LLM, thereby reducing dependency on hardware resources during training. In the Causal Prompt Generator, potential causal relationships within the domain dataset are identified. Based on the resulting insights, causal prompts are generated, thereby enriching the model’s understanding of domain-specific knowledge, which is further refined through the MPEFT module. Together, these two core modules within the CPEL framework enable the efficient and rapid adaptation of the Base LLM to specialized domains.
Figure 3. Model evaluation using BLEU and ROUGE. The top figure shows overall performance. The bottom two figures illustrate the impact of LoRA layers and prompt tuning, respectively.
Figure 4. Manual evaluation pipeline of the CPEL model in geriatric care. Questions are constructed by domain experts, answered by the model, and reviewed by expert assessors to validate task relevance and clinical accuracy. This process supplements automatic metrics with human judgments.
Table 1. Domain-Adaptive Training Result.

Base Model | Method | Trainable Params | Train Time (h:m:s) | LLM Score (0–9)
LLaMA3.2-1B | LoRA | 1.02 M | 2:37:42 | 4.3
LLaMA3.2-3B | LoRA | 3.07 M | 5:29:01 | 4.4
LLaMA2-7B | LoRA | 7.18 M | 8:40:39 | 3.2
LLaMA3.1-8B | LoRA | 8.20 M | 8:53:18 | 6.2
Table 2. Results of Downstream Task-Instruction Fine-Tuning.

Model | ROUGE (%) | BLEU-4 (%) | LLM Score (0–9)
DeepSeek-R1-Distill-Qwen-7B
  + Zero-shot | 8.49 | 3.46 | 4.64
  + Zero-shot * | 14.51 (+6.02) | 13.30 (+9.84) | 5.47 (+0.83)
Gemini-pro
  + Zero-shot | 10.56 | 4.21 | 5.01
  + Zero-shot * | 14.63 (+3.80) | 13.82 (+9.61) | 5.98 (+0.97)
GPT-4o-mini
  + Zero-shot | 16.09 | 9.08 | 5.33
  + Zero-shot * | 20.42 (+4.33) | 15.48 (+6.40) | 6.74 (+1.41)
LLaMA-70B
  + Zero-shot | 11.74 | 5.08 | 4.92
  + Zero-shot * | 16.34 (+4.60) | 10.37 (+5.29) | 5.48 (+0.56)
LLaMA-8B | 8.60 | 3.02 | 4.00
Caring LLaMA-LoRA | 12.25 (+3.65) | 5.21 (+2.19) | 5.14 (+1.14)
Downstream LLaMA-LoRA | 29.41 (+17.16) | 20.91 (+15.70) | 6.57 (+1.43)
Downstream LLaMA-LoRA * | 34.96 (+5.55) | 25.41 (+4.50) | 7.06 (+0.49)

“*” denotes the use of the proposed causal prompt in the corresponding model configuration.
Table 3. Case-Study Responses from Different AI Models.

Q1: In the early morning, an old man suffering from Alzheimer’s disease got up early and sat on the bedside chair, looking out of the window. At this time, the nursing staff came in. What should be done next?
AI (ChatGPT): The nurse approached softly and greeted the old man with a smile: “Good morning, did you sleep well?” Then gently guided the old man out of bed and helped him groom and prepare breakfast.
AI (Gemini): Nursing staff should help the elderly person start the day safely and smoothly through cordial communication, clear guidance, and careful care.
AI (WenXinYiYan): The nursing staff would first engage in a brief conversation with the old man to understand his emotions and needs. They would then help him tidy up the bed, check his physical condition to ensure safety and wellbeing, and assist with daily activities such as washing up, dressing, or preparing breakfast.
AI (Geriatric Care LLaMA): Walk into the room gently, avoiding loud noises so as not to startle the elderly person. Approach, crouch down, and talk in a soft, friendly tone. While communicating, observe the emotional state, posture, and eye contact to assess willingness to engage. If the person shows confusion or discomfort, first ask about their condition and whether they miss their children, and offer comfort and encouragement. Once emotionally stable and willing to cooperate, suggest washing and gargling. When aiding gargling, prepare toiletries with appropriate water temperature. Initially provide assistance, encourage imitation, and promote self-confidence and a sense of participation.
Q2: At 11:30 a.m., an elderly man lying in bed with aphasia suddenly stuttered and spoke unclearly. The nursing staff happened to pass by and noticed. How should this be handled?
AI (ChatGPT): The nursing staff should immediately assess the state of consciousness, check respiration and pulse, keep the elderly man lying flat, and call emergency services. Monitor vital signs and avoid movement or feeding to prevent asphyxia.
AI (Gemini): The priority is to initiate an emergency call. Press the emergency button and communicate briefly and clearly with the elderly man to assess the situation. Keep him calm and observe respiration and consciousness until medical staff arrive. Use pen, paper, or gestures to understand needs.
AI (WenXinYiYan): If speech is unclear, nursing staff should observe closely and notify the medical team to ensure airway safety and prevent asphyxia. While waiting for the team, soothe the patient, assist with follow-up examinations, and document the event.
AI (Geriatric Care LLaMA): If the health status is stable, consider other causes. At 11:30, it may be lunchtime for the elderly man. Consider whether he made noise due to hunger, and patiently guide communication. Closely monitor his mood and state to understand his intentions.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
