Article

A Federated Fine-Tuning Framework for Large Language Models via Graph Representation Learning and Structural Segmentation

1 School of Business, Wake Forest University, Winston-Salem, NC 27109, USA
2 Department of Computer Science, Rutgers University, Piscataway, NJ 08901, USA
3 College of Science & Engineering (CoSE), San Francisco State University, San Francisco, CA 94132, USA
4 D’Amore-McKim School of Business, Northeastern University, Boston, MA 02115, USA
5 Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
6 Department of Mathematics, University of Southern California, Los Angeles, CA 90007, USA
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2025, 13(19), 3201; https://doi.org/10.3390/math13193201
Submission received: 18 July 2025 / Revised: 28 August 2025 / Accepted: 19 September 2025 / Published: 6 October 2025
(This article belongs to the Special Issue Privacy-Preserving Machine Learning in Large Language Models (LLMs))

Abstract

This paper focuses on the efficient fine-tuning of large language models within the federated learning framework. To address the performance bottlenecks caused by multi-source heterogeneity and structural inconsistency, a structure-aware federated fine-tuning method is proposed. The method incorporates a graph representation module (GRM) to model internal structural relationships within text and employs a segmentation mechanism (SM) to reconstruct and align semantic structures across inputs, thereby enhancing structural robustness and generalization under non-IID (non-Independent and Identically Distributed) settings. During training, the method ensures data locality and integrates structural pruning with gradient encryption (SPGE) strategies to balance privacy preservation and communication efficiency. Compared with representative federated fine-tuning baselines such as FedNLP and FedPrompt, the proposed method achieves consistent accuracy and F1-score improvements across multiple tasks. To evaluate the effectiveness of the proposed method, extensive comparative experiments are conducted across tasks of text classification, named entity recognition, and question answering, using multiple datasets with diverse structures and heterogeneity levels. Experimental results show that the proposed approach significantly outperforms existing federated fine-tuning strategies on most tasks, achieving higher performance while preserving privacy, and demonstrating strong practical applicability and generalization potential.

1. Introduction

With the rapid proliferation of large language models (LLMs) across diverse natural language processing tasks, ensuring model generalization in a privacy-preserving manner has emerged as a critical challenge in both academia and industry [1,2]. In many real-world applications, data are distributed across multiple clients or organizations, each possessing sensitive and proprietary information that cannot be directly shared. Federated learning (FL) provides a distributed collaborative training paradigm in which models are updated locally and aggregated globally without exposing raw data, thereby offering a natural foundation for deploying LLMs in privacy-sensitive environments [3].
However, practical FL deployments, especially those involving large-scale LLMs, are rarely homogeneous. Client datasets often exhibit substantial heterogeneity in both semantic and structural forms, manifesting as variations in label space, linguistic style, domain-specific terminology, and latent data organization patterns. This non-IID (non-independent and identically distributed) nature, in which client data distributions differ significantly in label space, feature space, or both, leads to two fundamental challenges: semantic drift (a phenomenon where the semantic meaning of learned representations shifts across training rounds or clients, leading to inconsistent interpretation), where semantically similar content is encoded inconsistently across clients, and structural inconsistency (mismatches in model component structures or feature organization among clients that hinder coherent aggregation), where the model’s ability to generalize structural patterns across domains is compromised. These challenges are further amplified by the scale and complexity of LLMs, which intensify communication overhead, exacerbate knowledge drift, and increase susceptibility to local overfitting [4].
Existing federated fine-tuning methods have attempted to address heterogeneity through parameter-efficient adaptation strategies such as LoRA and adapter-based tuning. While these approaches reduce communication costs, they often rely on unified global aggregation without explicitly modeling the interplay between semantic alignment (ensuring that semantically similar content from different clients is encoded into consistent latent representations) and structural generalization, resulting in suboptimal performance for structurally diverse tasks or cross-domain transfer scenarios [5]. Moreover, current strategies typically treat privacy mechanisms and model optimization as separate processes, overlooking their joint influence on representation learning under heterogeneous conditions. Unlike recent LoRA- and adapter-based methods that primarily reduce communication by constraining updates to fixed low-rank or adapter modules, our framework adaptively segments the LLM (Large Language Model) based on functional dependencies and models high-order semantic relations across clients. This joint design preserves communication efficiency while enhancing cross-client semantic consistency and structural alignment, thereby better mitigating knowledge drift and overfitting in heterogeneous federated settings. Graph representation is particularly suitable in this context as it naturally encodes complex inter-client and inter-task relationships, providing a unified abstraction for jointly achieving semantic alignment and structural generalization.
To overcome these limitations, this paper proposes a novel federated fine-tuning framework that integrates Graph Representation Learning and a Structural Segmentation Framework. The graph representation component models higher-order relationships between clients and tasks, enabling the capture of global semantic dependencies while respecting local privacy constraints. The structural segmentation component decomposes the LLM into functionally distinct partitions, allowing fine-grained parameter updates that align structural patterns across heterogeneous clients. This joint design explicitly mitigates gradient interference (addressing structural inconsistency), enhances semantic consistency (addressing semantic drift), and improves generalization in multi-task and cross-domain FL settings.
The main contributions of this work are threefold, with each contribution explicitly linked to the challenge it addresses: (1) We design a structure-aware graph representation module that explicitly models structural dependencies in text, enhancing the consistency and generalization of semantic representations in FL, thereby mitigating semantic drift. (2) We propose a cross-client structural segmentation mechanism that reconstructs a unified structural space based on local semantics, effectively reducing structural inconsistency and improving multi-task transferability in heterogeneous settings. (3) We conduct comprehensive evaluations across multiple structurally diverse datasets, covering tasks such as text classification, named entity recognition, and question answering, demonstrating improvements in both performance and efficiency, which validates the framework’s ability to address cross-domain generalization challenges.
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the proposed framework. Section 4.4 describes the experimental setup, and Section 5.1 reports and analyzes the results. Section 8 concludes the paper.

2. Related Work

2.1. Privacy Preservation in Large Language Models

With the widespread adoption of large language models (LLMs) in generative intelligence, code assistance, and security analysis, privacy leakage during training and inference has raised growing concerns in both academia and industry. Existing studies have systematically summarized privacy attacks and defense mechanisms for LLMs, including membership inference, reconstruction attacks, and training data leakage. Techniques such as obfuscation, differential privacy, and parameter pruning have also been categorized and analyzed [6,7,8]. Das et al. [6] provided a systematic review of privacy challenges in LLMs, emphasizing multi-source threats throughout data collection, fine-tuning, and deployment. Li et al. [7] proposed a closed-loop framework covering attack, defense, and future directions. They highlighted the potential of model compression, parameter pruning, and interpretability mechanisms for privacy protection.
To address inference-time privacy issues, Tong et al. [9] introduced the InferDPT framework. It integrates encrypted collaborative mechanisms in black-box LLM settings, aligning user inputs with model responses to mitigate semantic leakage. By injecting noise and applying encoding substitutions, this approach protects highly sensitive prompt templates without sacrificing model performance. Kandpal et al. [10] discovered that duplicate samples in training data significantly amplify privacy risks. They proposed a sample deduplication strategy for pretraining, offering a new perspective on structural privacy control.
At the application level, researchers have explored lightweight privacy-enhancing methods in edge intelligence and IoT scenarios. Ferrag et al. [11] developed an intrusion detection model based on the BERT (Bidirectional Encoder Representations from Transformers) architecture. This model embeds a differential privacy mechanism and utilizes localized attention optimization to improve threat detection under limited data. Shen et al. [12] combined LLMs with autonomous edge computing systems to build a privacy-aware collaborative framework for connected intelligence. Their results show the feasibility and robustness of structured context control in dynamic networks. Overall, existing work has established both theoretical and methodological foundations for LLM privacy protection. However, further improvements are needed in balancing privacy expression and information fidelity under federated learning and structural disentanglement frameworks [13,14].
Overall, privacy risks in LLMs directly affect the stability and efficiency of federated training. These constraints create specific optimization challenges that must be addressed in distributed model design.

2.2. Application of Federated Learning in Large Language Models

Building on the privacy challenges outlined in Section 2, federated learning offers a natural paradigm to mitigate such risks while enabling large-scale collaborative training. With the widespread deployment of large language models (LLMs) in sensitive domains such as healthcare, finance, and public administration, federated learning (FL) has emerged as a key strategy to enhance the deployability and controllability of LLMs due to its privacy-preserving aspect of keeping data local.
To support systematic development of LLMs under the federated paradigm, Kuang et al. [15] proposed FederatedScope-LLM, a multi-stage, multi-module toolkit for federated fine-tuning of large models. It covers critical components such as LoRA integration, optimization control, and collaboration across heterogeneous clients. Ye et al. [16] addressed the challenges of decentralized deployment by introducing the OpenFedLLM framework. It incorporates a cross-device attention synchronization mechanism to enable effective adaptation of LLMs to heterogeneous private datasets, laying a system-level foundation for privacy-enhanced model training.
To overcome communication and storage bottlenecks in federated LLMs, researchers have explored parameter-efficient tuning strategies. Jiang et al. [17] proposed a low-parameter fine-tuning mechanism that compresses full gradient propagation into shared semantic blocks, significantly reducing communication costs. Che et al. [18] further integrated prompt tuning with adaptive optimization algorithms to enable personalized adaptation of LLMs across cross-domain clients, improving performance on resource-constrained devices. In addition, Liu et al. [19] embedded differential privacy into the LoRA tuning process. By introducing controlled noise perturbation, their method protects user input privacy and validates the security and robustness of a combined strategy of parameter pruning and privacy injection.
In real-world industrial deployment, Fan et al. [20] launched FATE-LLM, a federated learning framework for LLMs on real multi-source, multi-domain data. It supports asynchronous multi-task optimization, long-text alignment, and inference path tracking. Yao et al. [21] conducted a comprehensive review of current federated LLM development and highlighted key future directions. These include modeling data heterogeneity adaptation, improving multi-client collaboration efficiency, and unifying privacy risk quantification. Gupta et al. [22] revealed through empirical analysis that conventional federated language modeling can reconstruct private texts when left unprotected. This further emphasizes the need for structural defense mechanisms during the fine-tuning of LLMs.

2.3. Limitations and Motivation

Despite these advancements, existing approaches still face several notable limitations. First, many methods overlook the joint impact of privacy constraints and distributed optimization challenges, resulting in insufficient convergence stability and limited generalization in heterogeneous environments. Second, most privacy-preserving strategies either rely on fixed noise injection mechanisms or adopt heuristic communication schemes, which fail to adapt to non-IID data distributions and dynamic network conditions in real-world deployments. Third, the interplay between privacy mechanisms and federated model updates is often treated independently, without explicitly modeling their mutual influence on representation learning, leading to semantic drift and structural inconsistency across clients. These limitations motivate the development of a unified framework that integrates Graph Representation Learning to capture high-order dependencies and global structural patterns, together with a Structural Segmentation Framework to enhance semantic consistency and cross-task generalization under privacy-preserving constraints. This design directly addresses the challenges of privacy leakage risks, heterogeneous optimization constraints, and semantic–structural alignment in large-scale federated language model training. The comparison summary is shown in Table 1.

3. Method

This section elaborates on the design principles and technical details of the proposed framework, which explicitly distinguishes itself from existing federated fine-tuning approaches in three key aspects:
(1) Graph Representation Learning: by leveraging a graph-based representation space to model inter-client and intra-task relationships, enabling more precise and structure-aware knowledge sharing across heterogeneous data distributions.
(2) Structural Segmentation Framework: by partitioning model components according to their functional roles and sensitivity levels, enabling selective parameter updates that enhance task-specific adaptation and mitigate gradient interference.
(3) Integrated optimization pipeline: by combining graph representation learning with the structural segmentation framework, the proposed approach establishes a coordinated optimization process that reduces negative transfer, accelerates convergence, and improves generalization.
The following subsections detail the overall architecture, graph construction process, segmentation mechanism, and the joint optimization algorithm. An end-to-end schematic diagram of the proposed training workflow is shown in Figure 1, illustrating the interactions among all modules and the flow of data and optimization signals throughout the process.

3.1. Overall Model Architecture

This paper proposes a federated fine-tuning framework for large language models in privacy-sensitive scenarios. For the first time, cross-client gradient updates are explicitly mapped to a global interaction graph (The global interaction graph is introduced to address the challenge of fragmented and locally-biased representations under heterogeneous data distributions. It explicitly models high-order dependencies across clients, enabling consistent knowledge alignment and improving global semantic coherence). A linear alignment module and a graph representation learning component are introduced to capture high-order dependencies under heterogeneous distributions. Based on this structure, we design a graph-importance-driven structural pruning (Structural pruning reduces communication and computation bottlenecks by removing redundancy and retaining critical structures) strategy. Only sub-blocks with the highest semantic relevance are unfrozen, balancing parameter efficiency and communication compression. A hierarchical secure channel (The hierarchical secure channel mitigates multi-level privacy risks and adapts to network heterogeneity while ensuring secure transmission) and a minimum exposure gradient embedding mechanism are also integrated to mitigate potential privacy leakage risks.
The proposed framework unifies graph learning, scalable federated optimization, and LoRA-based fine-tuning. It provides a low-barrier solution for large model customization on resource-constrained devices. This design also lays the groundwork for future studies on high-fidelity knowledge sharing in distributed settings. Low-Rank Adaptation (LoRA) is adopted in our framework to enable efficient fine-tuning of large-scale language models in resource-constrained federated learning environments. LoRA decomposes the weight update into a pair of low-rank matrices, which are trained while keeping the original pre-trained weights frozen. This approach significantly reduces the number of trainable parameters and memory footprint while preserving the expressive power of the base model. In our scenario, LoRA is particularly suitable because each federated client often has limited computational capacity and communication bandwidth. By applying LoRA to selected model layers, we achieve efficient parameter updates with minimal communication cost, making the method well-suited for cross-client training where privacy, scalability, and deployment efficiency are critical. The overall architecture is illustrated in Figure 1. The proposed framework first constructs a graph-based representation space to capture structural relationships among clients and then applies the structural segmentation framework to selectively update parameters. The arrows in the figure depict the direction of information flow, while the color coding differentiates between communication, local optimization, and auxiliary operations. This design facilitates both structure-aware knowledge sharing and task-specific adaptation.
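To make the LoRA mechanism concrete, the following minimal PyTorch sketch applies a trainable low-rank update to a frozen linear layer; the rank, scaling, and layer dimensions are illustrative assumptions rather than the configuration used in the experiments.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer augmented with a trainable low-rank update.

    The effective weight is W + (alpha / r) * B @ A, where only A and B are trained.
    Rank r and scaling alpha are illustrative choices.
    """
    def __init__(self, base_layer: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base_layer
        for p in self.base.parameters():       # keep pre-trained weights frozen
            p.requires_grad = False
        in_f, out_f = base_layer.in_features, base_layer.out_features
        self.lora_A = nn.Parameter(torch.randn(r, in_f) * 0.01)  # low-rank factor A
        self.lora_B = nn.Parameter(torch.zeros(out_f, r))        # low-rank factor B (zero-initialized)
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus low-rank correction; only lora_A and lora_B receive gradients.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Example: wrap a 768-dimensional projection (e.g., an attention output layer).
layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # far fewer than the 768*768 frozen weights
```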
The proposed framework integrates graph representation learning and structural segmentation to address both semantic consistency and heterogeneous adaptation in federated large language model (LLM) fine-tuning. The architecture consists of three main components: (1) graph representation learning to capture high-order semantic dependencies across distributed clients, (2) structural segmentation to adapt model components to heterogeneous environments, and (3) optimization strategies for privacy preservation and communication efficiency. The global server coordinates the training process, while client-specific graphs encode the structural and semantic characteristics of local data. Details of the graph representation process are provided in the following subsection.

3.2. Graph Representation Learning

To enhance representation learning in federated semantic alignment scenarios, this paper introduces a graph representation learning module as a structural foundation. The input graph is denoted as $G = (V, E)$, where the node set is $V = \{v_1, v_2, \ldots, v_N\}$ and the edge set is $E \subseteq V \times V$. Here, each edge $e_{ij} \in E$ represents the structural or semantic relationship between node $v_i$ and node $v_j$, such as feature similarity, dependency links, or interaction frequency, enabling the model to capture both local and global dependencies. Each node has a feature vector $x_i \in \mathbb{R}^d$. For clarity, we provide a brief example illustrating the text-to-graph transformation. Consider the sentence “Privacy constraints influence model optimization in federated learning”. Each content-bearing word or recognized named entity is treated as a node, such as privacy constraints, model optimization, and federated learning. Edges are established based on syntactic dependencies and semantic co-occurrence, for instance linking privacy constraints to model optimization due to the causal relation implied in the sentence. This representation preserves both local phrase-level structures and global semantic dependencies, facilitating downstream structural segmentation and alignment. The initial node embeddings are obtained through the following mapping:
$h_i^{(0)} = \mathrm{MLP}(x_i) \qquad (1)$
where $x_i \in \mathbb{R}^d$ denotes the raw feature vector of node $i$, and $\mathrm{MLP}(\cdot)$ is a standard multi-layer perceptron used to project raw features into the initial embedding space. The same MLP (Multi-Layer Perceptron) parameters are shared across all nodes and applied independently to each $x_i$, so Equation (1) represents the per-node form of the mapping. In practice, all node features can be stacked into a matrix $X \in \mathbb{R}^{N \times d}$ and processed in parallel to obtain $H^{(0)} \in \mathbb{R}^{N \times d_0}$.
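As a concrete companion to the text-to-graph example and the per-node mapping in Equation (1), the following minimal PyTorch sketch builds a tiny graph for the example sentence and projects randomly initialized node features with a shared MLP; the node list, edge list, and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical nodes extracted from the example sentence; the edges follow the
# causal/dependency relations described in the text.
nodes = ["privacy constraints", "model optimization", "federated learning"]
edges = [(0, 1), (1, 2)]          # e.g., "privacy constraints" -> "model optimization"

d_in, d_0 = 16, 32                # raw feature dimension d and embedding dimension d_0 (illustrative)
X = torch.randn(len(nodes), d_in) # stacked raw node features X in R^{N x d}

# Shared MLP applied independently to every node (per-node form of Equation (1)).
mlp = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_0))
H0 = mlp(X)                       # H^{(0)} in R^{N x d_0}

# Dense adjacency built from the edge list, consumed by the message-passing layers later on.
A = torch.zeros(len(nodes), len(nodes))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
print(H0.shape, A.sum())          # torch.Size([3, 32]) tensor(4.)
```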
After initialization, the model captures local topological information by applying multi-layer message passing. Node states are updated using the structural neighborhood. The propagation mechanism at the l-th layer is defined as
$m_i^{(l)} = \sum_{j \in \mathcal{N}(i)} \frac{1}{\sqrt{d_i d_j}} W^{(l)} h_j^{(l-1)} \qquad (2)$
Derivation of Equation (2). Let $A \in \mathbb{R}^{N \times N}$ be the adjacency matrix of $G = (V, E)$ and $D = \mathrm{diag}(d_1, \ldots, d_N)$ be the degree matrix with $d_i = \sum_j A_{ij}$. Adding self-loops yields $\tilde{A} = A + I$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. Define the symmetrically normalized propagation matrix
$S = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}.$
A graph convolution/message-passing layer then takes the matrix form
$H^{(l)} = \sigma\left( S H^{(l-1)} W^{(l)} \right),$
where $H^{(l-1)} = [h_1^{(l-1)}; \ldots; h_N^{(l-1)}]$. Expanding the $i$-th row gives
$h_i^{(l)} = \sigma\left( \sum_{j \in \mathcal{N}(i) \cup \{i\}} \frac{1}{\sqrt{\tilde{d}_i \tilde{d}_j}} W^{(l)} h_j^{(l-1)} \right).$
If self-loops are omitted, replacing $(\tilde{A}, \tilde{D})$ with $(A, D)$ yields
$h_i^{(l)} = \sigma\left( \sum_{j \in \mathcal{N}(i)} \frac{1}{\sqrt{d_i d_j}} W^{(l)} h_j^{(l-1)} \right),$
which recovers Equations (2) and (3). The symmetric factor $\frac{1}{\sqrt{d_i d_j}}$ arises from $D^{-1/2} A D^{-1/2}$ and stabilizes feature propagation across nodes with different degrees (bounded operator norm), preventing high-degree nodes from dominating the aggregation.
$h_i^{(l)} = \sigma\left( m_i^{(l)} \right) \qquad (3)$
where $\mathcal{N}(i)$ is the set of neighbors of node $i$, $d_i$ and $d_j$ are their degrees, $W^{(l)}$ is the learnable weight matrix at layer $l$, and $\sigma(\cdot)$ is a nonlinear activation function.
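The derivation above corresponds to the standard symmetrically normalized graph convolution. A compact dense-matrix sketch of Equations (2) and (3), using an illustrative toy adjacency matrix rather than the actual client graphs, is shown below.

```python
import torch
import torch.nn as nn

def gcn_layer(H: torch.Tensor, A: torch.Tensor, W: nn.Linear) -> torch.Tensor:
    """One message-passing layer: H' = sigma(D^{-1/2} A D^{-1/2} H W).

    Equivalent to aggregating neighbor messages with the 1/sqrt(d_i d_j) factor
    of Equation (2) followed by the nonlinearity of Equation (3).
    """
    deg = A.sum(dim=1)                                   # node degrees d_i
    d_inv_sqrt = torch.diag(deg.clamp(min=1).pow(-0.5))  # D^{-1/2} (clamp avoids division by zero)
    S = d_inv_sqrt @ A @ d_inv_sqrt                      # normalized propagation matrix
    return torch.relu(S @ W(H))                          # sigma(S H W)

# Toy graph: 4 nodes, undirected edges (0-1, 1-2, 2-3), no self-loops.
A = torch.tensor([[0., 1., 0., 0.],
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [0., 0., 1., 0.]])
H = torch.randn(4, 32)                                   # H^{(l-1)}
W = nn.Linear(32, 32, bias=False)                        # learnable W^{(l)}
H_next = gcn_layer(H, A, W)
print(H_next.shape)                                      # torch.Size([4, 32])
```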
After stacking multiple layers, the final node representation $z_i$ is obtained, serving as the base of fine-grained structural encoding:
$z_i = h_i^{(L)} \qquad (4)$
where $h_i^{(L)}$ denotes the hidden state of node $i$ at the final GNN (Graph Neural Network) layer indexed by $L$, serving as its learned representation. Here, $l$ is a generic layer index ($0 \le l < L$), while $L$ specifically denotes the total number of GNN layers, i.e., the index of the last layer whose output is used as the final embedding.
To encode graph-level semantics while capturing localized structural patterns, the constructed global interaction graph is further partitioned into multiple subgraphs through a structure-preserving clustering procedure. This partitioning enables the model to focus on semantically coherent regions within the global topology, enhancing representation granularity. Assume that the global graph $G$ is divided into subgraphs $\{G_1, G_2, \ldots, G_K\}$. The representation of each subgraph is computed as
$g_k = \mathrm{READOUT}\left( \{\, z_i \mid v_i \in G_k \,\} \right) \qquad (5)$
The $\mathrm{READOUT}(\cdot)$ function in Equation (5) serves the purpose of compressing node-level embeddings within a subgraph into a single, fixed-dimensional subgraph representation. This operation enables the model to capture high-level semantic information of the entire subgraph, which is crucial for subsequent multi-scale feature integration and task-specific prediction. Functionally, $\mathrm{READOUT}(\cdot)$ acts as a permutation-invariant pooling operator that ensures the aggregated result is independent of the ordering of nodes.
From an algorithmic perspective, common implementations include mean pooling, sum pooling, or max pooling. For instance, in the case of mean pooling, the representation of subgraph $G_k$ is computed as
$g_k = \frac{1}{|G_k|} \sum_{v_i \in G_k} z_i,$
where $|G_k|$ is the number of nodes in subgraph $G_k$ and $z_i$ is the learned embedding of node $i$ from the final GNN layer. Such aggregation condenses the distributed node information into a compact vector that preserves the global semantics of the subgraph.
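The permutation-invariant readout of Equation (5), instantiated as mean pooling, can be sketched as follows; the node embeddings and the two-subgraph partition are illustrative.

```python
import torch

# Node embeddings z_i from the final GNN layer (6 nodes, 32-dim, illustrative).
Z = torch.randn(6, 32)

# Hypothetical structure-preserving partition: nodes 0-2 form G_1, nodes 3-5 form G_2.
subgraphs = {0: [0, 1, 2], 1: [3, 4, 5]}

def mean_readout(Z: torch.Tensor, node_ids: list[int]) -> torch.Tensor:
    """g_k = (1/|G_k|) * sum_{v_i in G_k} z_i -- order-independent by construction."""
    return Z[node_ids].mean(dim=0)

G = torch.stack([mean_readout(Z, ids) for ids in subgraphs.values()])  # K x 32 subgraph embeddings
print(G.shape)  # torch.Size([2, 32])
```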
Based on these aggregated features, a multi-scale structure fusion module is constructed to integrate embeddings from different granularity levels. The fused representation is defined as
$g_k = \sum_{s=1}^{S} \alpha_s \cdot W_s\, g_k^{(s)} \qquad (6)$
The fused representation is fed into a structure-aware refinement module. A shallow mapping $\phi$ generates intermediate representations:
$o_i = \phi(g_k) \qquad (7)$
where $g_k$ is the fused subgraph embedding after multi-scale integration, and $\phi(\cdot)$ is a task-specific transformation function.
In the refinement stage, the output passes through multiple task-specific heads. Structural node embeddings are routed to task-specific branches to produce output embeddings:
$y_i^{(t)} = f_t(o_i) \qquad (8)$
where $f_t(\cdot)$ is the prediction head for task $t$, mapping the shared representation $o_i$ to the corresponding output space.
Formulas (6)–(8) together describe the forward propagation process of a basic neural network applied to subgraph representations. Specifically, Equation (6) serves as the input layer, where multi-scale subgraph embeddings $g_k^{(s)}$ are integrated through learnable weights $W_s$ and attention coefficients $\alpha_s$. Equation (7) acts as the hidden layer, transforming the fused embedding $g_k$ into a task-shared representation $o_i$ via a nonlinear mapping $\phi(\cdot)$. Finally, Equation (8) represents the output layer, where the shared representation $o_i$ is projected into the output space of task $t$ through the task-specific head $f_t(\cdot)$. This sequence illustrates how information flows from integrated subgraph features to task-specific predictions.
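The forward flow of Equations (6)–(8) can be sketched as a small module that fuses multi-scale subgraph embeddings, applies a shared mapping, and routes the result to task-specific heads; the number of scales, the softmax normalization of the attention coefficients, the task set, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleFusionHead(nn.Module):
    """Fusion (Eq. 6) -> shared mapping phi (Eq. 7) -> task heads f_t (Eq. 8)."""
    def __init__(self, dim: int = 32, num_scales: int = 3, task_dims: dict = None):
        super().__init__()
        task_dims = task_dims or {"classification": 20, "ner": 9}       # hypothetical output sizes
        self.alpha = nn.Parameter(torch.ones(num_scales) / num_scales)  # attention coefficients alpha_s
        self.W = nn.ModuleList([nn.Linear(dim, dim, bias=False) for _ in range(num_scales)])
        self.phi = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())        # shared mapping phi
        self.heads = nn.ModuleDict({t: nn.Linear(dim, d) for t, d in task_dims.items()})

    def forward(self, g_scales: list[torch.Tensor], task: str) -> torch.Tensor:
        coeff = torch.softmax(self.alpha, dim=0)                        # normalize alpha_s (one possible choice)
        g_fused = sum(coeff[s] * self.W[s](g) for s, g in enumerate(g_scales))  # Eq. (6)
        o = self.phi(g_fused)                                           # Eq. (7)
        return self.heads[task](o)                                      # Eq. (8)

# Three scales of a subgraph embedding (e.g., node-, segment-, and subgraph-level views).
g_scales = [torch.randn(1, 32) for _ in range(3)]
model = MultiScaleFusionHead()
logits = model(g_scales, task="classification")
print(logits.shape)  # torch.Size([1, 20])
```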
The final outputs are connected to the structural segmentation module. This builds a semantic bridge between structural partitioning and downstream tasks. It also provides a unified representation and structural discriminative basis for federated optimization. The model architecture diagram is shown in Figure 2. As can be seen from the figure, the left panel implements the Graph Representation Learning pipeline. The input client graph (green nodes with black edges) is first processed by a node feature encoder to obtain initial node states. A message passing unit aggregates neighborhood information, and a stacked GNN propagates multi-hop signals to produce node embeddings. The dashed feedback arrow indicates an optional refinement step in which the learned embeddings can adjust or regularize the input graph before the next round. The right panel corresponds to the Structural Segmentation Framework. The input graph is partitioned into structural subregions; features are aggregated region-wise and then fused across multiple scales to capture both local and global dependencies. The structure-aware fine-tuning module receives the fused representation and performs selective parameter updates on the large language model according to segment importance and sensitivity, while the remaining parts stay frozen to preserve general knowledge. Finally, the module outputs task-level representations ($o_1$, $o_2$, $o_3$), which are routed to task-specific parameter groups (a–f) and heads to serve downstream tasks (text classification, named entity recognition, sentiment analysis, and prompt completion). Solid arrows denote forward feature flow, whereas dashed arrows mark auxiliary or feedback links used for iterative refinement.
In this stage, each client constructs a semantic–structural graph from its local dataset, where nodes represent task-specific entities or feature tokens, and edges encode semantic correlations and structural dependencies. A graph neural network (GNN) backbone is then applied to capture high-order interactions, producing embeddings that preserve both semantic coherence and structural relationships. These embeddings are aggregated at the server to form a unified representation space that aligns cross-client semantics. The generated graph embeddings are subsequently leveraged in the structural segmentation process, enabling fine-grained adaptation across heterogeneous model components.

3.3. Structural Segmentation Framework

To enable adaptability of large language models to complex structural information and enhance local responsiveness, this paper proposes a structural segmentation enhancement mechanism. The model architecture diagram is shown in Figure 3. As can be seen from the figure, the upper pathway instantiates the Structural Segmentation Framework on top of the LLM backbone. Tokens are first organized into a lattice and then partitioned into structural segments. The bidirectional structure-aware interaction model updates hidden states along two complementary streams, enabling information to flow in both directions across adjacent segments. Contextual coefficients modulate each step so that segment interactions reflect local cues and long-range dependencies. The lower pathway performs segment-level embedding slicing. Layerwise token embeddings are grouped according to the segmentation map, refined within each segment, and aligned across neighboring segments. Bidirectional arrows indicate that the slicing and refinement steps can be iterated until stable segment representations are obtained. The result is a compact segment representation $Z_{\text{seg}}$ that preserves structure while reducing redundancy. The structure-aware fine-tuning pipeline then applies selective updates. A lightweight controller $A$ configures two adapter-like parameter groups $\phi_1$ and $\phi_2$. A structural transformation module $T_{\text{struct}}$ enforces consistency with the segmentation, guiding the update path toward structure-preserving changes. Only the selected parameters are unfrozen, while the rest of the backbone remains fixed to retain general knowledge. The head produces the final prediction $\hat{y}$. Solid arrows denote the forward information flow between modules, and dashed boundaries indicate the scope of each functional block.
The output representations from the LLM backbone are partitioned into a set of structural segments $\{S_1, S_2, \ldots, S_K\}$. Each segment corresponds to a subregion of consecutive tokens. Given a sequence of length $T$, the segment boundary is defined by $[t_k^s, t_k^e]$. The segmentation operation is expressed as
$S_k = \{ x_{t_k^s}, x_{t_k^s+1}, \ldots, x_{t_k^e} \}$
where $S_k$ denotes the token sequence belonging to the $k$-th structural segment, bounded by start index $t_k^s$ and end index $t_k^e$.
Each structural segment $S_k$ is passed into a structure-aware encoder for bidirectional interactive modeling. This forms upward and downward structural paths denoted by $h_i^{(1)}$ and $h_i^{(2)}$, respectively. The contextual information is propagated through forward and backward GRU (Gated Recurrent Unit) layers as follows:
$h_i^{(1)} = \mathrm{GRU}^{(1)}(x_i, h_{i-1}^{(1)})$
where $\mathrm{GRU}^{(1)}$ processes the input sequence in the forward direction, updating hidden state $h_i^{(1)}$ based on the current token $x_i$ and the previous hidden state $h_{i-1}^{(1)}$.
$h_i^{(2)} = \mathrm{GRU}^{(2)}(x_i, h_{i+1}^{(2)})$
where $\mathrm{GRU}^{(2)}$ processes the input sequence in the backward direction, updating hidden state $h_i^{(2)}$ based on the current token $x_i$ and the next hidden state $h_{i+1}^{(2)}$.
The two paths are concatenated to form the joint structural representation $h_i$, which is used for segment-level information integration and module-wise embedding enhancement:
$h_i = [\, h_i^{(1)} \,\|\, h_i^{(2)} \,]$
where $[\cdot \,\|\, \cdot]$ denotes concatenation, producing a bidirectional representation of token $i$.
Based on this structural modeling, a Segment-level Slicing mechanism is introduced. It selects structural feature regions from key blocks of the LLM backbone using a segment index set $I_{\text{seg}}$, producing multi-scale embedding representations:
$Z_{\text{seg}} = \mathrm{Slice}(M, I_{\text{seg}})$
where $M$ is the matrix of layerwise embeddings, and $I_{\text{seg}}$ is the index set specifying which embeddings belong to each segment.
The sliced representation is projected by a nonlinear mapping function $\phi_1(\cdot)$ and then passed to the structure transformation module $T_{\text{struct}}$ for enhancement:
$h_{\text{struct}} = T_{\text{struct}}(\phi_1(Z_{\text{seg}}))$
where $\phi_1(\cdot)$ transforms the segment representation, and $T_{\text{struct}}(\cdot)$ enforces structural consistency based on the segmentation map.
To further improve the precision of local structural consistency modeling, an auxiliary structure-aware module $A$ is introduced and optimized in parallel with the main branch:
$\tilde{h} = A(Z_{\text{seg}})$
where $A(\cdot)$ is a lightweight controller producing auxiliary parameters or adjustments from the segment embeddings.
The final representation is projected into the task-specific output space through $\phi_2$, generating the prediction $\hat{y}$ while preserving the granularity of structural segments:
$\hat{y} = \phi_2(h_{\text{struct}})$
where $\phi_2(\cdot)$ maps the structure-aware representation to the final task prediction.
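The segmentation pipeline described above can be sketched end to end as follows: token embeddings are sliced into segments, each segment is encoded with a bidirectional GRU, and the result passes through the $\phi_1$, $T_{\text{struct}}$, and $\phi_2$ projections; module sizes and segment boundaries are illustrative, and $T_{\text{struct}}$ is simplified to a single linear layer.

```python
import torch
import torch.nn as nn

class SegmentEncoder(nn.Module):
    """Slice token embeddings into segments, encode each with a BiGRU, and project."""
    def __init__(self, dim: int = 64, out_dim: int = 2):
        super().__init__()
        self.bigru = nn.GRU(dim, dim, batch_first=True, bidirectional=True)  # GRU^(1) / GRU^(2)
        self.phi1 = nn.Linear(2 * dim, dim)       # nonlinear mapping phi_1 (with ReLU below)
        self.t_struct = nn.Linear(dim, dim)       # simplified structural transformation T_struct
        self.phi2 = nn.Linear(dim, out_dim)       # task projection phi_2

    def forward(self, M: torch.Tensor, boundaries: list[tuple[int, int]]) -> torch.Tensor:
        seg_reprs = []
        for start, end in boundaries:             # segment-level slicing: Z_seg = Slice(M, I_seg)
            seg = M[:, start:end, :]              # tokens belonging to segment S_k
            h, _ = self.bigru(seg)                # h_i = [forward ; backward]
            seg_reprs.append(h.mean(dim=1))       # pool tokens into one segment vector
        z_seg = torch.stack(seg_reprs, dim=1)     # (batch, K, 2*dim)
        h_struct = self.t_struct(torch.relu(self.phi1(z_seg)))
        return self.phi2(h_struct)                # y_hat per segment

# Toy input: one sequence of 10 token embeddings, split into two segments.
M = torch.randn(1, 10, 64)
y_hat = SegmentEncoder()(M, boundaries=[(0, 5), (5, 10)])
print(y_hat.shape)  # torch.Size([1, 2, 2])
```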
The structural segmentation path and the LLM backbone form a complementary interaction in the overall framework. This approach strengthens structural semantic granularity and provides a unified interface for multi-task structural transfer and personalized fine-tuning.
The structural segmentation framework decomposes the LLM into functional modules, such as attention layers, feed-forward blocks, and embedding layers, based on the semantic–structural cues provided by the graph embeddings. This decomposition allows selective fine-tuning of model segments according to client-specific resource constraints, data distributions, and privacy requirements. By aligning the segmentation strategy with the graph-derived semantic structure, the framework ensures that parameter updates are both computationally efficient and semantically consistent across clients.

3.4. Illustrative Example

To provide a more intuitive understanding of how Figure 1, Figure 2 and Figure 3 work together within the proposed framework, we present a simple illustrative example. Suppose the input is a small graph containing four nodes, each associated with a raw feature vector representing its attributes. In Figure 1, the nodes are first embedded into an initial feature space and organized into a global interaction graph. Based on structural proximity, this global graph is then partitioned into two subgraphs. In Figure 2, each subgraph undergoes multi-scale message passing and aggregation, resulting in compact subgraph-level embeddings. Finally, as depicted in Figure 3, the aggregated features are fed into task-specific output layers to produce the final prediction results. This step-by-step example demonstrates how the proposed model processes input data from global graph construction, through subgraph feature integration, to final task-specific outputs.

3.5. Training Objectives

To jointly optimize the graph representation and structural segmentation modules, we design an end-to-end differentiable training objective. The goal is to preserve semantic consistency while improving generalization under multi-source tasks. Specifically, the total loss consists of three parts: alignment loss $\mathcal{L}_{\text{align}}$, segmentation consistency loss $\mathcal{L}_{\text{seg}}$, and task-driven loss $\mathcal{L}_{\text{task}}$. The total objective is defined as
$\mathcal{L}_{\text{total}} = \lambda_1 \mathcal{L}_{\text{align}} + \lambda_2 \mathcal{L}_{\text{seg}} + \lambda_3 \mathcal{L}_{\text{task}}$
where $\mathcal{L}_{\text{total}}$ denotes the overall optimization objective of the proposed framework. It is composed of three components: $\mathcal{L}_{\text{align}}$ encourages embedding consistency among structurally connected nodes in the graph, $\mathcal{L}_{\text{seg}}$ measures the accuracy of the predicted structural segmentation against the ground truth, and $\mathcal{L}_{\text{task}}$ corresponds to the primary task-specific loss (e.g., cross-entropy for classification or mean squared error for regression). The non-negative coefficients $\lambda_1$, $\lambda_2$, and $\lambda_3$ control the relative contribution of each term, enabling a balance between structural preservation, segmentation quality, and downstream task performance. By jointly optimizing these components, the model aligns structural information with task objectives while maintaining robust segment-level representations.
  • Choice of $\lambda_1$, $\lambda_2$, $\lambda_3$:
We treat $\lambda_1$, $\lambda_2$, $\lambda_3$ as hyperparameters and determine them via a small grid search on the validation splits of 20NEWS and SEMEVAL (RoBERTa backbone), under the constraint $\lambda_1 + \lambda_2 + \lambda_3 = 1$. The search range for each coefficient is $\{0.1, 0.2, 0.3, 0.4, 0.5\}$, with the heuristic $\lambda_3 \ge \lambda_1, \lambda_2$ to prioritize the task-driven objective. The best configuration is then fixed and reused across all tasks and backbones. This choice reflects the roles of $\mathcal{L}_{\text{align}}$ and $\mathcal{L}_{\text{seg}}$ as structural regularizers, while $\mathcal{L}_{\text{task}}$ drives the primary optimization target. We also observed that moderate perturbations ($\pm 0.1$) to any $\lambda$ do not change the relative ranking of methods.
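A minimal sketch of this constrained grid search is given below; the `validate` callback is a hypothetical stand-in for federated validation on the 20NEWS and SEMEVAL splits.

```python
from itertools import product

def grid_search_lambdas(validate, values=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Search lambda_1, lambda_2, lambda_3 under sum-to-one and lambda_3 >= lambda_1, lambda_2."""
    best, best_score = None, float("-inf")
    for l1, l2, l3 in product(values, repeat=3):
        if abs(l1 + l2 + l3 - 1.0) > 1e-9:        # constraint: weights sum to 1
            continue
        if l3 < l1 or l3 < l2:                    # heuristic: prioritize the task loss
            continue
        score = validate((l1, l2, l3))            # e.g., validation accuracy
        if score > best_score:
            best, best_score = (l1, l2, l3), score
    return best

# Usage with a dummy validator (replace with real federated validation):
print(grid_search_lambdas(lambda lams: -abs(lams[2] - 0.5)))  # (0.1, 0.4, 0.5) for this dummy
```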
For graph representation learning, we introduce the structural alignment loss $\mathcal{L}_{\text{align}}$ to ensure consistency within the global graph space. Let $v_i$ and $v_j$ be two semantically adjacent nodes, with embeddings $h_i$ and $h_j$. The loss is defined as
$\mathcal{L}_{\text{align}} = \frac{1}{|E|} \sum_{(i,j) \in E} \| h_i - h_j \|_2^2$
where $E$ denotes the set of edges in the constructed client interaction graph, and $|E|$ is the total number of such edges. For each connected pair of nodes $(i, j) \in E$, $h_i$ and $h_j$ represent their respective learned embedding vectors. This term computes the mean squared $\ell_2$ distance between connected node embeddings, encouraging semantically related or structurally linked clients to maintain similar representations. By minimizing this distance, the model aligns local feature spaces across clients, which promotes consistent structural semantics, reduces distributional shifts, and enhances the transferability and robustness of the learned graph representations in a federated environment.
For the structural segmentation module, we design a fine-grained consistency loss $\mathcal{L}_{\text{seg}}$. It constrains structural boundaries across layers for the same semantic unit. Let the input sequence be divided into $m$ structural fragments $\{S_1, S_2, \ldots, S_m\}$, with representations $s_k$ and decoder outputs $\hat{s}_k$. The loss is defined as
$\mathcal{L}_{\text{seg}} = \frac{1}{m} \sum_{k=1}^{m} \| \hat{s}_k - s_k \|_2^2$
where $m$ denotes the number of structural segments within the input sequence or graph, $\hat{s}_k$ represents the predicted embedding vector for the $k$-th segment generated by the model, and $s_k$ is the corresponding ground-truth segment embedding. This term measures the mean squared $\ell_2$ distance between predicted and target segment embeddings, thereby assessing how accurately the model reconstructs the structural configuration in the semantic space. By minimizing this loss, the model learns to preserve fine-grained structural boundaries and semantic consistency across segments, which is particularly important when multi-scale information fusion might otherwise distort local representations. As a result, $\mathcal{L}_{\text{seg}}$ plays a key role in maintaining representation fidelity and ensuring that the structural segmentation framework produces semantically coherent and structurally accurate embeddings.
Finally, different $\mathcal{L}_{\text{task}}$ losses are instantiated per downstream task (e.g., cross-entropy for classification, token-level cross-entropy for NER, and span extraction losses for QA) to match task requirements.
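Putting the three terms together, the sketch below assembles the total objective from the edge-wise alignment loss, the segment reconstruction loss, and a cross-entropy task loss; all tensors and the weight setting are illustrative.

```python
import torch
import torch.nn.functional as F

def total_loss(node_emb, edges, seg_pred, seg_target, logits, labels,
               lam=(0.2, 0.3, 0.5)):
    """L_total = lam1 * L_align + lam2 * L_seg + lam3 * L_task (weights are illustrative)."""
    src, dst = edges[:, 0], edges[:, 1]
    l_align = ((node_emb[src] - node_emb[dst]) ** 2).sum(dim=1).mean()   # mean squared l2 over edges
    l_seg = ((seg_pred - seg_target) ** 2).sum(dim=1).mean()             # segment reconstruction error
    l_task = F.cross_entropy(logits, labels)                             # task-driven loss (classification)
    return lam[0] * l_align + lam[1] * l_seg + lam[2] * l_task

# Toy tensors: 6 node embeddings, 3 edges, 2 segments, and a 4-class classification batch.
node_emb = torch.randn(6, 32, requires_grad=True)
edges = torch.tensor([[0, 1], [1, 2], [3, 4]])
seg_pred, seg_target = torch.randn(2, 32, requires_grad=True), torch.randn(2, 32)
logits, labels = torch.randn(4, 4, requires_grad=True), torch.tensor([0, 1, 2, 3])
loss = total_loss(node_emb, edges, seg_pred, seg_target, logits, labels)
loss.backward()
print(float(loss))
```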

3.6. Algorithm Description

The pseudocode of the overall algorithm is given below (Algorithm 1).
Algorithm 1: Compact procedure for structure-aware federated fine-tuning with GRL & SSF
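As a complement to Algorithm 1, the toy sketch below illustrates the overall loop of local structure-aware updates followed by FedAvg aggregation; the single weight matrix, the per-client segment masks, and the simplified loss terms are illustrative stand-ins for the GRM/SSF components and the encrypted gradient upload described in Section 3.

```python
import torch
import torch.nn.functional as F

def federated_finetune(W_global, clients, rounds=5, lr=0.1, lam=(0.2, 0.3, 0.5)):
    """Toy sketch of the compact procedure: local masked updates + FedAvg aggregation.

    W_global stands in for the trainable blocks of the LLM. Each client updates only
    the rows selected by its segment mask (SSF), minimizes a combined loss, and sends
    only the masked delta to the server, which averages the deltas and broadcasts.
    """
    for _ in range(rounds):
        deltas = []
        for c in clients:
            W = W_global.clone().requires_grad_(True)         # local copy; raw data stays on the client
            x, y = c["x"], c["y"]
            l_task = F.mse_loss(x @ W, y)                     # task-driven loss (toy regression)
            l_struct = W.pow(2).mean()                        # stand-in for L_align / L_seg regularizers
            loss = lam[2] * l_task + (lam[0] + lam[1]) * l_struct
            loss.backward()
            delta = -lr * W.grad * c["mask"]                  # update only the unfrozen segment rows
            deltas.append(delta)                              # (would be encrypted before upload)
        W_global = W_global + torch.stack(deltas).mean(dim=0) # server-side FedAvg aggregation
    return W_global

# Two toy clients with different data and different unfrozen segments (rows of W).
d = 8
W0 = torch.zeros(d, 1)
clients = [
    {"x": torch.randn(16, d), "y": torch.randn(16, 1),
     "mask": torch.cat([torch.ones(d // 2, 1), torch.zeros(d // 2, 1)])},
    {"x": torch.randn(16, d), "y": torch.randn(16, 1),
     "mask": torch.cat([torch.zeros(d // 2, 1), torch.ones(d // 2, 1)])},
]
print(federated_finetune(W0, clients).shape)  # torch.Size([8, 1])
```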

4. Dataset and Experimental Setup

4.1. Text Classification

To comprehensively validate the generalization ability and structural alignment performance of the proposed structure-aware federated fine-tuning framework under heterogeneous semantic distributions, this paper conducts experimental analysis on two representative benchmark datasets: 20Newsgroups [23] and SemEval-2010 Task 8 [24]. The 20Newsgroups dataset covers 20 semantic categories, including politics, religion, and technology, and exhibits significant cross-topic semantic transitions. This makes it a suitable platform for evaluating the Structural Segmentation Framework, as it requires capturing global semantics from long documents while mitigating interference between unrelated topics. The SemEval-2010 Task 8 dataset focuses on sentence-level classification of noun pair relations, containing 19 types of fine-grained semantic relationships. Its requirement for micro-structural alignment is highly similar to the causal and relational expressions found in financial texts. This aligns well with the Graph Representation Learning module, which models local semantic interactions and captures fine-grained relational dependencies across clients. Together, these two datasets cover challenges in both global structural modeling and fine-grained interaction modeling, providing a solid foundation for validating the proposed method’s transferability and convergence stability across multi-scale semantic structures.

4.2. Named Entity Recognition

To further evaluate the representation ability and cross-domain generalization performance of the proposed structural modeling method in fine-grained entity recognition tasks, this paper conducts experiments on two typical named entity recognition datasets: WNUT2017 [25] and PLONER [26] (Pseudo-Label Optimization with Noise Estimation and Regularization). The WNUT2017 dataset focuses on emerging entity recognition in user-generated content such as social media. It contains a large amount of noisy labels and non-standard text structures, with sparse label distribution and highly unstable semantic contexts. These characteristics create a strong challenge for model robustness in low-resource and high-variance environments. This setting is well-suited to evaluating the Graph Representation Learning module, which can leverage structural similarities across clients to improve recognition of rare and noisy entities. The PLONER dataset integrates multiple sources including CoNLL-2003, OntoNotes, and WNUT, covering a wide range of domain-specific entity types. It clearly reflects cross-domain heterogeneity, making it an ideal platform to test the Structural Segmentation Framework in handling multi-source semantic transfer. By selectively updating parameters relevant to domain-specific entities while preserving shared syntactic knowledge, SSF enhances stability under heterogeneous federated conditions. Through experiments on these two NER settings, this paper systematically explores how combining Graph Representation Learning and Structural Segmentation Framework enhances semantic boundary modeling and structural generalization in the federated fine-tuning process.

4.3. Question Answering

In the question answering task, to evaluate the transferability and adaptability of the proposed framework under long-text reasoning and multi-source QA contexts, this paper conducts comparative experiments on two representative QA datasets: SQuAD v1.1 [27] and MRQA [28] (Machine Reading for Question Answering). The SQuAD v1.1 dataset is built on full-text Wikipedia articles and focuses on the precise extraction of answer spans within paragraph-level contexts. It is suitable for testing the Structural Segmentation Framework, as the task requires identifying and isolating reasoning-relevant segments within long passages. By segmenting and selectively updating parameters tied to key contextual regions, SSF enhances logical consistency and improves information localization accuracy. The MRQA dataset serves as a benchmark for multi-source transfer QA, integrating heterogeneous QA subtasks such as NewsQA, HotpotQA, and SearchQA. It presents high variability in semantic style, passage length, and question structure, effectively simulating non-IID settings across clients in federated learning. This makes it ideal for assessing the Graph Representation Learning module, which can capture structural similarities in reasoning patterns across diverse domains, thereby improving cross-source adaptability. Through systematic experiments on these datasets, this paper verifies that the combination of Graph Representation Learning and Structural Segmentation Framework provides both robust structural alignment and effective adaptation in federated QA reasoning tasks.

4.4. Experimental Setup

In the proposed framework, the three components—text classification, named entity recognition (NER), and question answering (QA)—are conceptually organized in a sequential pipeline, where the output of one stage can, in certain application scenarios, serve as an input feature for the subsequent stage. However, in order to rigorously evaluate each component in isolation and ensure fair performance comparison with prior work, we conduct experiments on each task separately using task-specific benchmark datasets. Specifically, for text classification, we employ the 20NEWS and SEMEVAL datasets; for NER, we utilize the WNUT and PLONER datasets; and for QA, we adopt SQuAD v1.1 and MRQA. This task-specific dataset allocation allows us to: (1) leverage widely recognized benchmarks tailored to each task, (2) align our evaluation protocol with established baselines in the corresponding research area, and (3) avoid potential confounding effects caused by error propagation between tasks. Therefore, although the framework supports sequential integration of its components, the experimental evaluation is intentionally designed to be task-specific.
Based on this task-specific allocation, we implement all experiments within a unified multi-task federated learning framework as follows. To comprehensively verify the applicability and generalization performance of the proposed method in multi-task federated settings, we reproduce the experimental setup under a unified framework. The configuration covers key components including language model selection, training strategy, data processing, and communication protocol. We select two mainstream pretrained language models: RoBERTa-base and DeBERTaV3-base, which contain 125 M and 184 M parameters respectively. Both models are loaded and fine-tuned using the Huggingface Transformers library.
For data processing, we define task-specific maximum sequence lengths for text classification, named entity recognition, and question answering. To simulate realistic federated distributions, client-side data is partitioned using a Dirichlet distribution with α = 0.1 to introduce non-IID settings. In training strategy, we adopt a low-resource configuration with local epoch set to 1 and batch size set to 16. After each round of training, only encrypted gradient embeddings are reported to the central server. The server performs model aggregation using the FedAvg algorithm and broadcasts the updated parameters.
A maximum of 1000 communication rounds is executed across all clients to ensure stable convergence under heterogeneous data distributions. Detailed experimental settings are presented in Table 2.
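A minimal sketch of the Dirichlet-based non-IID partition with α = 0.1 is shown below; the label array and client count are illustrative.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=10, alpha=0.1, seed=0):
    """Split sample indices across clients with a per-class Dirichlet(alpha) prior.

    Smaller alpha (e.g., 0.1) produces more skewed, non-IID label distributions.
    """
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(num_clients))     # class share per client
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(part.tolist())
    return client_indices

# Illustrative usage: 1000 samples over 20 classes (e.g., 20NEWS-style labels).
labels = np.random.default_rng(0).integers(0, 20, size=1000)
parts = dirichlet_partition(labels)
print([len(p) for p in parts])  # highly unbalanced client sizes under alpha = 0.1
```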
To ensure fair comparison, we introduce existing full fine-tuning (FFT) and parameter-efficient fine-tuning (PEFT) methods as baseline models under the federated learning setting. All experimental configurations follow their original implementations unless otherwise specified. The details of each baseline method are as follows:

5. Experiment

5.1. Comparative Experimental Results

Under the same experimental settings, the proposed structure-aware federated fine-tuning framework was compared against the baseline model (FedFT). Across all benchmark datasets, the proposed method consistently outperformed FedFT in terms of both accuracy-related and robustness-oriented metrics, with the gains being most pronounced in scenarios involving non-IID and structurally heterogeneous data distributions. These advantages stem from the integration of Graph Representation Learning for global structural alignment and structure-aware knowledge sharing, and the Structural Segmentation Framework for fine-grained, selective parameter adaptation. The comparison confirms that the proposed approach not only inherits the baseline’s generalization capability but also significantly strengthens task-specific adaptability and privacy robustness in federated optimization.

5.1.1. Text Classification

To validate the effectiveness of the proposed structure-aware federated fine-tuning method in text classification tasks, we evaluate model accuracy on the 20NEWS and SEMEVAL datasets under different communication rounds. The experiments include various parameter-efficient fine-tuning strategies and two mainstream pretrained language models. By comparing with existing methods, we conduct a comprehensive analysis of the performance improvement potential of the proposed approach. The experimental results are shown in Table 3.
As shown in Table 4, the proposed method (Ours) consistently achieves the highest accuracy across all evaluated settings, outperforming existing parameter-efficient fine-tuning approaches under various communication rounds. For instance, on the 20NEWS dataset, it reaches 84.7%, surpassing the second-best method (FeDeRA) by 1.4%. On SEMEVAL, it achieves 87.2%, which is 1.1% higher than FedPET, and on WNUT, the improvement is 1.8% over FedAdapters. Even in the low-resource communication stage (Comm. rounds = 200), the accuracy remains above 82%, indicating strong structure-aware capability and resilience to communication constraints.
This advantage is primarily attributed to the introduced graph representation learning module, which performs high-order modeling of the input semantic structure. By explicitly capturing topological relations between structurally connected nodes, the model maintains globally consistent representations even under non-IID client distributions. Unlike traditional strategies based on parameter decomposition or layer-wise adjustment, this mechanism enables precise extraction of semantic relationships between entities, even in texts with sparse local context or ambiguous boundaries, thereby mitigating optimization barriers caused by semantic drift.
From the perspective of structural transfer modeling, the structural segmentation module exhibits strong generalization across heterogeneous datasets and varying communication rounds. By reconstructing text fragments and integrating cross-block dependencies, it preserves the core semantic content while filtering noise from irrelevant context. This effect is especially evident in the DeBERTaV3-based model, where the structural enhancement mechanism not only improves entity relation alignment in semantically intensive tasks such as SEMEVAL, but also enhances context decoding accuracy in long-text topic classification tasks like 20NEWS. Overall, the structure-semantic joint modeling framework in Ours ensures robustness and transferability in federated scenarios involving multi-source heterogeneous corpora, providing both empirical evidence and theoretical grounding for cross-domain semantic alignment tasks.

5.1.2. Named Entity Recognition

To systematically evaluate the generalization ability and structural adaptability of the proposed federated fine-tuning method in named entity recognition tasks, we select WNUT and PLONER as representative benchmark datasets. We compare multiple parameter-efficient fine-tuning strategies under different communication rounds. Experiments are conducted using RoBERTa and DeBERTaV3 as backbone models. The goal is to verify the robustness and stability of the method in sparse entity recognition and cross-domain transfer settings. The experimental results are shown in Table 4.
As shown in Table 5, the proposed method (Ours) consistently outperforms all baseline fine-tuning approaches on both WNUT and PLONER datasets across different training sample sizes. For the WNUT dataset, Ours achieves an F1 score improvement of up to 2.2% over the second-best method under the smallest data regime (100 samples), indicating its superior capability in low-resource named entity recognition scenarios. Similarly, on the PLONER dataset, Ours maintains the highest F1 score across all settings, with performance gains of 1.0–1.6% compared to FeDeRA, demonstrating robustness under varying annotation budgets.
These improvements can be theoretically attributed to the integration of Graph Representation Learning and the Structural Segmentation Framework. The graph-based representation allows the model to capture high-order dependencies between entity mentions and their surrounding contexts, which is particularly beneficial in NER tasks where entity boundaries are influenced by long-range syntactic and semantic relations. Furthermore, the structural segmentation mechanism enhances the alignment between token-level embeddings and segment-level entity structures, reducing confusion between entity and non-entity spans. By jointly leveraging these two mechanisms, Ours achieves more accurate boundary localization and semantic disambiguation, leading to consistent F1 score improvements across both datasets.

5.1.3. Question Answering

To systematically evaluate the generalization ability of the proposed method in question answering tasks, we select two widely used QA datasets, SQuADv1.1 and MRQA. These datasets represent typical reading comprehension and multi-domain transfer scenarios. We conduct comparative experiments using different parameter-efficient fine-tuning strategies on two pretrained language models, RoBERTa and DeBERTaV3. We report the performance in terms of exact match (EM) and F1 score under different communication rounds.
Based on the experimental results in Table 6, it can be observed that the proposed method achieves the highest Exact Match and F1 scores on both the SQuADv1.1 and MRQA question answering tasks under the RoBERTa and DeBERTaV3 backbones. The performance remains stable across different training sample sizes. Compared with methods such as FedFT and FedAP, the proposed approach maintains a clear advantage in low-resource scenarios, such as SQuADv1.1 with 50 samples and MRQA with 100 samples. This indicates that the introduced Graph Representation Learning and Structural Segmentation Framework can better capture task semantic structures and cross-sample dependencies under limited labeled data, thereby enhancing the model’s ability to abstract question–answer matching patterns.
In addition, compared with parameter-efficient fine-tuning strategies such as FedBF and FedLR, the proposed method achieves larger performance gains on the MRQA dataset. This confirms the role of the structural segmentation mechanism in mitigating semantic drift and enhancing generalization in cross-domain question answering scenarios. These results clearly demonstrate that the proposed approach has significant advantages in jointly modeling semantic representation and structural alignment. They also provide solid theoretical and experimental support for federated fine-tuning in future cross-task question answering scenarios.
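For reference, exact match and F1 for extractive question answering are computed at the token level after answer normalization. The sketch below follows the common SQuAD-style convention and is not the authors' exact evaluation script.

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, drop articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred: str, gold: str) -> float:
    return float(normalize(pred) == normalize(gold))

def f1_score(pred: str, gold: str) -> float:
    pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the Eiffel Tower", "Eiffel Tower"))   # 1.0 after normalization
print(round(f1_score("in Paris, France", "Paris"), 2))   # 0.5 from partial token overlap
```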

5.2. Ablation Experiment Results

To comprehensively evaluate the contribution of each key module to the overall performance, we design and conduct a systematic ablation study. By gradually removing or replacing specific components, we reveal the functional roles of each module in feature representation and performance enhancement. In addition, this experiment helps verify the robustness and generalization ability of the proposed structure across different task settings. It also provides a theoretical basis for future model simplification and optimization.

5.2.1. Text Classification

On the 20NEWS dataset, the proposed framework achieves notable gains over all baselines. This improvement primarily stems from the Graph Representation Learning module, which effectively captures semantic correlations among clients with overlapping topic distributions. By modeling inter-client edges based on content similarity, the framework leverages shared semantic structures to enhance feature representation quality. In addition, the Structural Segmentation Framework reduces gradient interference by updating only the segments relevant to text classification, preserving general linguistic knowledge while refining topic-specific components. The synergy of these two mechanisms enables the model to maintain high accuracy even under non-IID client splits.
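As an illustration of segment-selective updating, the sketch below freezes a backbone and unfreezes only a hypothetical set of task-relevant segments (the top two encoder layers plus the classification head). The actual segmentation map in this work is derived adaptively, so the layer choice here is purely illustrative.

```python
from transformers import AutoModelForSequenceClassification

# Minimal sketch: freeze everything, then unfreeze only the segments treated as
# task-relevant. The segment names below are hypothetical, chosen for illustration.
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=20  # e.g., 20NEWS topic classification
)

for param in model.parameters():
    param.requires_grad = False                      # keep general knowledge frozen

trainable_segments = ("encoder.layer.10", "encoder.layer.11", "classifier")
for name, param in model.named_parameters():
    if any(seg in name for seg in trainable_segments):
        param.requires_grad = True                   # update task-relevant segments only

updated = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable share: {updated / total:.2%}")     # small update/exposure surface
```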
Furthermore, on the SEMEVAL dataset, the proposed method consistently outperforms the baselines, particularly in macro-F1 score. This can be attributed to the ability of Graph Representation Learning to aggregate sentiment-relevant cues from semantically aligned clients, mitigating the sparsity of sentiment-bearing expressions in individual partitions. The Structural Segmentation Framework further enhances performance by isolating sentiment classification parameters from unrelated linguistic segments, which reduces overfitting to client-specific idiosyncrasies. This targeted adaptation leads to more robust detection of subtle sentiment shifts, yielding superior generalization across diverse domains, as reported in Table 7.
Based on the ablation results in Table 7, it can be seen that the two proposed core modules, Graph Representation Learning and Structural Segmentation Framework, achieve significant performance improvements on both RoBERTa and DeBERTaV3 backbones. For RoBERTa, when Graph Representation Learning is introduced alone, the accuracy on the 20NEWS dataset under low communication rounds (200 rounds) increases from 73.4% to 75.1%. The SEMEVAL dataset in the same setting also shows an improvement of nearly 1.0%. This indicates that the module effectively alleviates structural drift in non-IID environments by modeling higher-order semantic topological relationships. When Structural Segmentation Framework is introduced alone, the improvement is more pronounced in the 20NEWS dataset under low communication rounds (+3.8%), suggesting that this module has stronger modeling capability in cross-segment dependency reconstruction and noise filtering.
For DeBERTaV3, the improvement trends of the two modules are generally consistent with those observed for RoBERTa. In the low-resource setting of the SEMEVAL dataset, Graph Representation Learning achieves an accuracy improvement of 2.7%, validating its generalization ability in long-text and semantically intensive tasks. When the two modules are applied jointly, the accuracy reaches the best level across all tasks and communication rounds. This fully demonstrates the effectiveness and robustness of joint semantic–structural modeling in federated fine-tuning scenarios involving multi-source heterogeneous corpora.

5.2.2. Named Entity Recognition

In the WNUT and PLONER datasets, the proposed framework demonstrates substantial advantages in recall and F1 score over existing approaches. These datasets are characterized by noisy, domain-specific entities and inconsistent annotation coverage, which makes structural modeling particularly valuable. The Graph Representation Learning module captures cross-client lexical and contextual similarities, allowing rare entities to benefit from related examples in other clients. Meanwhile, the Structural Segmentation Framework confines updates to the sequence labeling components, preventing catastrophic forgetting of shared syntactic knowledge. The combination of these mechanisms enhances the model’s ability to detect both frequent and rare entities, as reflected in Table 8.
Based on the ablation results in Table 8, it can be observed that Graph Representation Learning and Structural Segmentation Framework also deliver consistent performance improvements in the named entity recognition task. Under the RoBERTa backbone, introducing Graph Representation Learning alone increases the F1 score on the WNUT dataset by 0.9% under low communication rounds (100 rounds) and by 0.7% on the PLONER dataset under the same condition. This indicates that the module can effectively capture potential cross-entity topological relationships through higher-order semantic graph modeling, thereby enhancing semantic consistency in non-IID environments. When Structural Segmentation Framework is introduced alone, the improvement on WNUT reaches 1.7% at 200 rounds, showing that this module has a clear advantage in reconstructing cross-segment dependencies and filtering noise in complex contexts.
For DeBERTaV3, the performance improvement trends of the two modules are consistent with those in RoBERTa. In the PLONER dataset under low communication rounds, the improvement from Structural Segmentation Framework is particularly notable (+1.5%), confirming its generalization ability in modeling fuzzy entity boundaries and long-range dependencies. When the two modules are applied together, the F1 scores reach the best results across all datasets and communication rounds. This further demonstrates the effectiveness and robustness of joint semantic–structural modeling in federated named entity recognition scenarios.

5.2.3. Question Answering

For machine reading comprehension tasks on the SQuAD and MRQA datasets, the proposed approach yields significant improvements in exact match and F1 metrics. The Graph Representation Learning module connects clients based on similarity in passage–question distributions, enabling transfer of reasoning patterns and context–answer alignment strategies. The Structural Segmentation Framework selectively fine-tunes comprehension-relevant parameters, ensuring that shared linguistic and reasoning abilities are preserved. This joint optimization enhances the model’s capability to handle diverse question types and passage styles, thereby outperforming baselines under both in-domain and out-of-domain evaluation settings, as evidenced in Table 9.
Based on the ablation results in Table 9, it can be seen that Graph Representation Learning and Structural Segmentation Framework also show consistent performance improvements in the question answering task. For the RoBERTa backbone, introducing Graph Representation Learning alone increases the EM and F1 scores on SQuADv1.1 by 0.7% and 0.5% respectively at 50 communication rounds, and improves MRQA by 0.8% and 1.4% at 200 rounds. This indicates that the module enhances the ability to capture cross-segment information associations through higher-order semantic graph construction, thereby improving the accuracy of answer localization and boundary determination. When Structural Segmentation Framework is introduced, SQuADv1.1 shows a 0.9% EM improvement at 100 communication rounds, and MRQA shows a 2.7% F1 improvement under the same condition, confirming the advantage of this module in long-text dependency modeling and cross-segment context alignment.
For the DeBERTaV3 backbone, the performance improvement trends of the two modules are consistent with those in RoBERTa. In MRQA at 200 communication rounds, the improvement from Structural Segmentation Framework reaches 1.3%, indicating its good generalization ability in handling complex question answering contexts. When the two modules are applied together, both EM and F1 achieve the best results across all datasets and communication rounds. This fully demonstrates the complementarity and effectiveness of semantic graph modeling and structural segmentation fine-tuning in federated question answering tasks.

5.3. Analysis of Experimental Results Under Different Data Heterogeneity

To systematically evaluate the adaptability of the proposed structure-aware federated fine-tuning framework under varying degrees of data heterogeneity, we design a set of heterogeneity-controlled experiments based on the Dirichlet distribution. Specifically, we vary the Dirichlet parameter α { 0.1 , 0.3 , 1.0 } to simulate different levels of data distribution divergence across clients, where α = 0.1 indicates highly heterogeneous data and α = 1.0 approximates an IID setting. The experiments cover three representative task categories: text classification, named entity recognition, and question answering. For text classification, we adopt the 20NEWS and SemEval datasets with 200 communication rounds. For named entity recognition, we use WNUT2017 and PLONER with 100 communication rounds. For question answering, we evaluate on SQuAD v1.1 and MRQA, also under 100 communication rounds. Across all tasks, the number of clients is fixed, FedAvg is used as the aggregation algorithm, and RoBERTa-base is employed as the model backbone. Each client performs one local training epoch per round and uploads encrypted gradient embeddings to the server, which then performs global model aggregation and structural selection. By visualizing the performance variations of these six task settings under different α values, we further analyze the robustness of the model structure against performance fluctuations caused by data heterogeneity. The experimental results are shown in Figure 4.
From the performance trends observed across the six subplots, it is evident that the proposed structure-aware federated fine-tuning method maintains strong robustness under varying levels of data heterogeneity. For most tasks, the model achieves the best performance when α = 0.1 (high heterogeneity), and exhibits a slight performance drop as α increases, indicating its effectiveness in capturing and aligning structural semantic information across non-IID clients. Notably, tasks such as 20NEWS and SQuAD show significant performance fluctuations, suggesting that long-text scenarios are more sensitive to heterogeneity. In contrast, tasks like WNUT and PLONER display relatively stable trends, demonstrating the method’s reliability in named entity recognition settings.
Interestingly, in the SemEval task, the performance at α = 1.0 slightly surpasses that at α = 0.3 , suggesting that moderate heterogeneity may enhance structural generalization in semantically concentrated tasks. Overall, the combination of graph-guided embedding and structural segmentation effectively mitigates performance degradation caused by data heterogeneity, validating the proposed method’s adaptability and generalizability in complex federated learning scenarios.
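For reproducibility, the following sketch shows a standard Dirichlet-based label partitioning routine of the kind used to generate the heterogeneity levels examined above. The number of clients, label counts, and random seed are illustrative assumptions.

```python
import numpy as np

def dirichlet_partition(labels: np.ndarray, num_clients: int, alpha: float,
                        seed: int = 0) -> list[list[int]]:
    """Split sample indices across clients using a per-class Dirichlet prior.

    Smaller alpha gives more skewed per-client label distributions (more non-IID);
    alpha around 1.0 approaches an IID-like split.
    """
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        cls_idx = np.where(labels == cls)[0]
        rng.shuffle(cls_idx)
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for client_id, shard in enumerate(np.split(cls_idx, cuts)):
            client_indices[client_id].extend(shard.tolist())
    return client_indices

# Toy usage: 1000 samples over 20 classes split across 10 clients.
labels = np.random.default_rng(0).integers(0, 20, size=1000)
for alpha in (0.1, 0.3, 1.0):
    splits = dirichlet_partition(labels, num_clients=10, alpha=alpha)
    sizes = [len(s) for s in splits]
    print(alpha, min(sizes), max(sizes))   # the size spread shrinks as alpha grows
```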

5.4. Client Quantity Change Experiment

This paper also reports experimental results for varying numbers of clients, shown in Figure 5. The experimental setting is identical to that used in the analysis under different data heterogeneity (Section 5.3).
As shown in the bar charts, all six tasks reach peak performance when the number of clients increases to 30, indicating that a moderate-scale federated setting allows the model to fully exploit structural complementarity across clients and enhance global semantic consistency. In contrast, with only 5–10 clients, the limited data coverage results in higher sensitivity to local noise during aggregation, thereby restricting overall performance. When the client count further expands to 50, performance drops to varying degrees. This may be attributed to: (i) increased data sparsity, where each client receives fewer samples, amplifying gradient noise; and (ii) greater communication and synchronization overheads, leading to information dilution during aggregation and weakening the benefits of structural alignment and graph embedding.
Overall, the trend suggests that the proposed structure-aware framework performs best under moderate-scale federated scenarios, where there is sufficient diversity to capture global structure while avoiding the statistical inefficiency caused by over-fragmentation. This validates the elasticity of the model to client-scale variation, enhanced by graph representation and structural segmentation. Nevertheless, further improvements such as adaptive client sampling or communication-efficient mechanisms may be needed to maintain performance stability in large-scale asynchronous or sparse federated environments.

5.5. Privacy Risk Assessment

In addition to performance evaluation, we conduct a qualitative assessment of the privacy risks associated with the proposed framework and compare them with those of conventional federated learning methods.
Risk Quantification. In traditional federated learning, privacy leakage often occurs through gradient inversion or reconstruction attacks, where shared model updates can be exploited to infer sensitive client data. To quantify potential leakage, we follow established metrics such as gradient cosine similarity and embedding reconstruction fidelity, which measure the recoverability of original input features from shared parameters or embeddings.
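A minimal sketch of the first proxy, gradient cosine similarity, is given below: it compares a shared update with an update computed from a probe batch, with higher similarity indicating a larger potential leakage surface. The tensors here are random stand-ins for real gradients, and the threat model is deliberately simplified.

```python
import torch

def flatten_grads(grads: list[torch.Tensor]) -> torch.Tensor:
    """Concatenate a list of per-parameter gradients into one flat vector."""
    return torch.cat([g.reshape(-1) for g in grads])

def gradient_cosine_similarity(shared_update: list[torch.Tensor],
                               probe_update: list[torch.Tensor]) -> float:
    """Cosine similarity between a shared update and an attacker's probe update.

    Higher similarity suggests the shared update is more informative about the
    probe data. This is a coarse leakage proxy, not a formal privacy guarantee.
    """
    a, b = flatten_grads(shared_update), flatten_grads(probe_update)
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()

# Toy usage with random stand-ins for real gradient tensors.
shared = [torch.randn(768, 768), torch.randn(768)]
probe = [g + 0.1 * torch.randn_like(g) for g in shared]    # near-duplicate probe batch
unrelated = [torch.randn_like(g) for g in shared]
print(gradient_cosine_similarity(shared, probe))       # close to 1.0
print(gradient_cosine_similarity(shared, unrelated))   # close to 0.0
```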
Risk Mitigation in Our Framework. Our design inherently reduces leakage pathways in two ways: (1) The Graph Representation Learning module transmits only high-level structural embeddings between clients and the server, which are aggregated over graph neighborhoods. This representation discards raw feature-level information, lowering the risk of exact reconstruction. (2) The Structural Segmentation Framework selectively updates task-relevant model segments while keeping the majority of parameters frozen. This selective update narrows the gradient exposure surface, further limiting the amount of sensitive information shared.
Comparison to Conventional FL. Unlike conventional FL frameworks that share full model gradients in every round, our method shares compact, structure-aware updates. This not only reduces communication overhead but also empirically decreases the magnitude of privacy risk metrics by restricting the dimensionality and specificity of the shared information. While a complete formal proof of privacy guarantees is beyond the scope of this work, our analysis indicates that the proposed architecture offers a practical trade-off between task performance and privacy preservation.
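The sketch below illustrates, under simplifying assumptions, how a server could aggregate such compact updates: each client uploads only the parameters of its trained segments, and the server performs FedAvg-style weighted averaging key by key. The parameter names and client sizes are hypothetical, and the sketch assumes all sampled clients share the same segment keys.

```python
import torch

def aggregate_compact_updates(client_updates: list[dict[str, torch.Tensor]],
                              client_sizes: list[int]) -> dict[str, torch.Tensor]:
    """FedAvg-style weighted average over compact (segment-only) updates.

    Each client uploads only the parameters it actually trained; the server
    averages key by key, weighted by local dataset size. Parameters that no
    client shares never leave the clients, which keeps the exposure surface small.
    """
    total = sum(client_sizes)
    aggregated: dict[str, torch.Tensor] = {}
    for update, size in zip(client_updates, client_sizes):
        weight = size / total
        for name, tensor in update.items():
            if name in aggregated:
                aggregated[name] += weight * tensor
            else:
                aggregated[name] = weight * tensor.clone()
    return aggregated

# Toy usage: two clients sharing the same two segment tensors.
u1 = {"encoder.layer.11.output.dense.weight": torch.ones(4, 4),
      "classifier.weight": torch.ones(2, 4)}
u2 = {"encoder.layer.11.output.dense.weight": torch.zeros(4, 4),
      "classifier.weight": torch.zeros(2, 4)}
agg = aggregate_compact_updates([u1, u2], client_sizes=[300, 100])
print(agg["classifier.weight"][0, 0].item())   # 0.75 = 300/400 * 1 + 100/400 * 0
```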

6. Limitations and Future Work

While the proposed framework demonstrates strong performance across diverse datasets and tasks, several limitations remain to be addressed in future research.
Limitations. First, the Graph Representation Learning module currently relies on pre-defined structural similarity metrics to construct the client interaction graph. In scenarios where such similarity is difficult to estimate accurately, the effectiveness of the learned representations may be reduced. Second, the Structural Segmentation Framework assumes a relatively stable segmentation map across communication rounds. This may limit adaptability when client data distributions shift rapidly. Third, the current privacy assessment is qualitative, and a more formal analysis (e.g., using differential privacy bounds) would strengthen the theoretical guarantees. In addition, the proposed framework introduces additional computational overhead during both graph construction and segmentation-based optimization, which may limit scalability to extremely large language models or resource-constrained devices. Its performance is also dependent on the quality and consistency of the input structural information, making robustness to noisy or incomplete structures an important consideration.
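To illustrate the first limitation, the sketch below builds a client interaction graph from a pre-defined statistic, cosine similarity of client label histograms with a fixed threshold. Both the statistic and the threshold are illustrative choices for the kind of static rule referred to above, not the metric prescribed by the framework.

```python
import numpy as np

def build_client_graph(label_hists: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Build a client interaction graph from pre-defined statistics.

    label_hists: (num_clients, num_classes) label-frequency histograms, one per
    client. Edges connect clients whose normalized histograms have cosine
    similarity above a fixed threshold, a static, hand-chosen rule.
    """
    norms = np.linalg.norm(label_hists, axis=1, keepdims=True)
    unit = label_hists / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T                          # pairwise cosine similarity
    adj = (sim >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                   # no self-loops
    return adj

# Toy usage: 4 clients, 3 classes; clients 0/1 and 2/3 have similar distributions.
hists = np.array([[80, 10, 10], [70, 20, 10], [5, 15, 80], [10, 10, 80]], dtype=float)
print(build_client_graph(hists, threshold=0.9))  # edges (0,1) and (2,3) only
```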
Future Work. One promising direction is to develop adaptive graph construction strategies that dynamically refine inter-client connections based on intermediate training signals. Another is to incorporate automated segmentation learning, enabling the Structural Segmentation Framework to adjust its partitioning according to evolving client characteristics. Integrating formal privacy-preserving mechanisms, such as secure aggregation or local differential privacy, could further enhance robustness against adversarial inference. Finally, extending the proposed approach to multi-modal federated learning scenarios, where heterogeneous data types coexist, would broaden its applicability to real-world deployment contexts. Moreover, exploring asynchronous federated learning protocols and continual learning settings would improve adaptability in dynamic, real-world environments. Evaluating its deployment in large-scale LLM integration pipelines would further validate its scalability and practical impact.

7. Implications

This study has both theoretical and practical significance. From a theoretical perspective, it advances federated learning by integrating Graph Representation Learning for structure-aware knowledge sharing with a Structural Segmentation Framework for selective parameter updates. This combination bridges the gap between global structural modeling and task-specific adaptation, providing a new paradigm for balancing generalization and specialization in federated fine-tuning.
From a practical perspective, the framework can be applied to a variety of real-world scenarios where data is distributed, privacy-sensitive, and structurally heterogeneous. Potential beneficiaries include healthcare institutions (e.g., collaborative model training for diagnosis without sharing raw patient data), financial organizations (e.g., joint fraud detection or risk assessment while preserving client privacy), and multi-organization NLP (Natural Language Processing) systems (e.g., shared domain-specific language resources across agencies). By enhancing both structural alignment and task adaptation, the proposed method enables stakeholders to achieve higher performance with reduced privacy risk and communication cost.

8. Conclusions

This study focuses on the joint optimization of privacy preservation and structural modeling in federated learning, and proposes a structure-aware fine-tuning framework that integrates Graph Representation Learning and a Structural Segmentation Framework. By maintaining data locality and preventing user privacy leakage, the framework enables effective alignment and transfer of semantic structures across clients. Through the incorporation of graph embeddings and hierarchical structural modeling, the proposed method achieves robust generalization in non-IID data environments. Furthermore, customized structural pruning and gradient encryption strategies (if applicable) significantly enhance communication security and structural expressiveness during federated optimization. Extensive experiments across multiple benchmark tasks validate the effectiveness and privacy robustness of the framework. These results offer both theoretical foundations and practical solutions for building secure, generalizable, and efficient multi-task federated semantic modeling systems.
Practical Deployment Considerations. While the proposed framework demonstrates strong performance in controlled experiments, deploying it in real-world resource-constrained environments may introduce additional challenges. First, limited computational capacity on edge devices can restrict the complexity of local model updates and structural segmentation operations. Second, unstable or low-bandwidth network connections may impact the efficiency of gradient transmission and graph synchronization, especially when frequent updates are required. Third, heterogeneous hardware and software configurations across clients can lead to inconsistencies in training speed and model convergence. Addressing these issues may require lightweight model variants, adaptive update scheduling, or hybrid training strategies to ensure reliable and scalable deployment in practical federated learning scenarios.

Author Contributions

Conceptualization, Y.D. and G.L.; Methodology, Y.D., R.W. and Z.G.; Software, R.W.; Validation, G.L. and B.Z.; Writing—original draft, B.Z., X.C. and P.F.; Writing—review and editing, P.F.; Visualization, Z.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

Abbreviation | Full Term
FL | Federated Learning
LLM | Large Language Model
GRL | Graph Representation Learning
SSF | Structural Segmentation Framework
DP | Differential Privacy
NLP | Natural Language Processing
non-IID | Non-Independent and Identically Distributed
SGD | Stochastic Gradient Descent
API | Application Programming Interface
GPU | Graphics Processing Unit

References

1. Peng, L.; Luo, G.; Zhou, S.; Chen, J.; Xu, Z.; Sun, J.; Zhang, R. An in-depth evaluation of federated learning on biomedical natural language processing for information extraction. NPJ Digit. Med. 2024, 7, 127.
2. Cheng, Y.; Zhang, W.; Zhang, Z.; Zhang, C.; Wang, S.; Mao, S. Towards federated large language models: Motivations, methods, and future directions. IEEE Commun. Surv. Tutor. 2024, 27, 2733–2764.
3. Su, N.; Hu, C.; Li, B.; Li, B. Titanic: Towards Production Federated Learning With Large Language Models. In Proceedings of the IEEE INFOCOM 2024-IEEE Conference on Computer Communications, Vancouver, BC, Canada, 20–23 May 2024; IEEE: New York, NY, USA, 2024; pp. 611–620.
4. Li, Z.; Hou, Z.; Liu, H.; Li, T.; Yang, C.; Wang, Y.; Shi, C.; Xie, L.; Zhang, W.; Xu, L.; et al. Federated Learning in Large Model Era: Vision-Language Model for Smart City Safety Operation Management. In Proceedings of the Companion Proceedings of the ACM Web Conference 2024, Austin, TX, USA, 13–17 May 2024; pp. 1578–1585.
5. Qi, J.; Luan, Z.; Huang, S.; Fung, C.; Yang, H.; Qian, D. Fdlora: Personalized federated learning of large language model via dual lora tuning. arXiv 2024, arXiv:2406.07925.
6. Das, B.C.; Amini, M.H.; Wu, Y. Security and privacy challenges of large language models: A survey. ACM Comput. Surv. 2025, 57, 152.
7. Li, Y.; Tan, Z.; Liu, Y. Privacy-preserving prompt tuning for large language model services. arXiv 2023, arXiv:2305.06212.
8. Neel, S.; Chang, P. Privacy issues in large language models: A survey. arXiv 2023, arXiv:2312.06717.
9. Tong, M.; Chen, K.; Zhang, J.; Qi, Y.; Zhang, W.; Yu, N.; Zhang, T.; Zhang, Z. Inferdpt: Privacy-preserving inference for black-box large language models. IEEE Trans. Dependable Secur. Comput. 2025, 22, 4625–4640.
10. Kandpal, N.; Wallace, E.; Raffel, C. Deduplicating Training Data Mitigates Privacy Risks in Language Models. In Proceedings of the International Conference on Machine Learning (PMLR), Baltimore, MD, USA, 17–23 July 2022; pp. 10697–10707.
11. Ferrag, M.A.; Ndhlovu, M.; Tihanyi, N.; Cordeiro, L.C.; Debbah, M.; Lestable, T.; Thandi, N.S. Revolutionizing cyber threat detection with large language models: A privacy-preserving bert-based lightweight model for iot/iiot devices. IEEE Access 2024, 12, 23733–23750.
12. Shen, Y.; Shao, J.; Zhang, X.; Lin, Z.; Pan, H.; Li, D.; Zhang, J.; Letaief, K.B. Large language models empowered autonomous edge AI for connected intelligence. IEEE Commun. Mag. 2024, 62, 140–146.
13. Feretzakis, G.; Papaspyridis, K.; Gkoulalas-Divanis, A.; Verykios, V.S. Privacy-Preserving Techniques in Generative AI and Large Language Models: A Narrative Review. Information 2024, 15, 697.
14. Peris, C.; Dupuy, C.; Majmudar, J.; Parikh, R.; Smaili, S.; Zemel, R.; Gupta, R. Privacy in the Time of Language Models. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, Singapore, 27 February–3 March 2023; pp. 1291–1292.
15. Kuang, W.; Qian, B.; Li, Z.; Chen, D.; Gao, D.; Pan, X.; Xie, Y.; Li, Y.; Ding, B.; Zhou, J. Federatedscope-llm: A comprehensive package for fine-tuning large language models in federated learning. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 5260–5271.
16. Ye, R.; Wang, W.; Chai, J.; Li, D.; Li, Z.; Xu, Y.; Du, Y.; Wang, Y.; Chen, S. Openfedllm: Training large language models on decentralized private data via federated learning. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 6137–6147.
17. Jiang, J.; Jiang, H.; Ma, Y.; Liu, X.; Fan, C. Low-parameter federated learning with large language models. In Proceedings of the International Conference on Web Information Systems and Applications, Yinchuan, China, 2–4 August 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 319–330.
18. Che, T.; Liu, J.; Zhou, Y.; Ren, J.; Zhou, J.; Sheng, V.S.; Dai, H.; Dou, D. Federated learning of large language models with parameter-efficient prompt tuning and adaptive optimization. arXiv 2023, arXiv:2310.15080.
19. Liu, X.Y.; Zhu, R.; Zha, D.; Gao, J.; Zhong, S.; White, M.; Qiu, M. Differentially private low-rank adaptation of large language model using federated learning. ACM Trans. Manag. Inf. Syst. 2025, 16, 11.
20. Fan, T.; Kang, Y.; Ma, G.; Chen, W.; Wei, W.; Fan, L.; Yang, Q. Fate-llm: A industrial grade federated learning framework for large language models. arXiv 2023, arXiv:2310.10049.
21. Yao, Y.; Zhang, J.; Wu, J.; Huang, C.; Xia, Y.; Yu, T.; Zhang, R.; Kim, S.; Rossi, R.; Li, A.; et al. Federated large language models: Current progress and future directions. arXiv 2024, arXiv:2409.15723.
22. Gupta, S.; Huang, Y.; Zhong, Z.; Gao, T.; Li, K.; Chen, D. Recovering private text in federated learning of language models. Adv. Neural Inf. Process. Syst. 2022, 35, 8130–8143.
23. Lang, K. Newsweeder: Learning to filter netnews. In Machine Learning Proceedings 1995; Elsevier: Amsterdam, The Netherlands, 1995; pp. 331–339.
24. Hendrickx, I.; Kim, S.N.; Kozareva, Z.; Nakov, P.; Séaghdha, D.O.; Padó, S.; Pennacchiotti, M.; Romano, L.; Szpakowicz, S. Semeval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals. arXiv 2019, arXiv:1911.10422.
25. Derczynski, L.; Nichols, E.; Van Erp, M.; Limsopatham, N. Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition. In Proceedings of the 3rd Workshop on Noisy User-generated Text, Copenhagen, Denmark, 7 September 2017; pp. 140–147.
26. Fu, J.; Liu, P.; Zhang, Q. Rethinking generalization of neural models: A named entity recognition case study. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 7732–7739.
27. Rajpurkar, P.; Zhang, J.; Lopyrev, K.; Liang, P. Squad: 100,000+ questions for machine comprehension of text. arXiv 2016, arXiv:1606.05250.
28. Fisch, A.; Talmor, A.; Jia, R.; Seo, M.; Choi, E.; Chen, D. MRQA 2019 shared task: Evaluating generalization in reading comprehension. arXiv 2019, arXiv:1910.09753.
29. Houlsby, N.; Giurgiu, A.; Jastrzebski, S.; Morrone, B.; De Laroussilhe, Q.; Gesmundo, A.; Attariyan, M.; Gelly, S. Parameter-efficient transfer learning for NLP. In Proceedings of the International Conference on Machine Learning (PMLR), Long Beach, CA, USA, 9–15 June 2019; pp. 2790–2799.
30. Poth, C.; Sterz, H.; Paul, I.; Purkayastha, S.; Engländer, L.; Imhof, T.; Vulić, I.; Ruder, S.; Gurevych, I.; Pfeiffer, J. Adapters: A unified library for parameter-efficient and modular transfer learning. arXiv 2023, arXiv:2311.11077.
31. Pfeiffer, J.; Rücklé, A.; Poth, C.; Kamath, A.; Vulić, I.; Ruder, S.; Cho, K.; Gurevych, I. Adapterhub: A framework for adapting transformers. arXiv 2020, arXiv:2007.07779.
32. Yan, Y.; Yang, Q.; Tang, S.; Shi, Z. Federa: Efficient fine-tuning of language models in federated learning leveraging weight decomposition. arXiv 2024, arXiv:2404.18848.
33. Mangrulkar, S.; Gugger, S.; Debut, L.; Belkada, Y.; Paul, S.; Bossan, B. Peft: State-of-the-Art Parameter-Efficient Fine-Tuning Methods; De Gruyter: Berlin, Germany, 2022.
Figure 1. Overall architecture of the proposed federated fine-tuning framework. The framework consists of two core modules: Graph Representation Learning and Structural Segmentation Framework. Black solid arrows indicate the main data flow and model update process between clients and the server. Green arrows represent the deployment of the fine-tuned model to downstream tasks. Red dashed arrows denote structural relationships within the constructed client graph.
Figure 2. Workflow of the Graph Representation Learning module. Each node represents a client, and edges encode structural similarities derived from task and data statistics. The message-passing process aggregates multi-hop information, enabling structure-aware representation updates. Arrow thickness reflects the relative weight of inter-client influence.
Figure 3. Pipeline of the Structural Segmentation Framework. The model is partitioned into functional segments according to their roles in feature extraction, semantic encoding, and output prediction. Only the segments most relevant to the target task are updated, while others remain frozen to preserve general knowledge. The segmentation map is generated adaptively based on sensitivity analysis.
Figure 4. Analysis of experimental results under different data heterogeneity.
Figure 5. Experimental results of client number changes.
Table 1. Summary of existing methods in privacy preservation and federated learning for LLMs, along with their limitations.
Research Direction | Representative Works | Core Techniques | Main Limitations
Privacy preservation in LLMs | Das et al. [6], Li et al. [7], Tong et al. [9] | Differential privacy, obfuscation, parameter pruning, encrypted inference mechanisms | Often rely on fixed noise injection or static defense rules; limited adaptability to heterogeneous and dynamic environments; lack explicit integration with distributed optimization.
Data deduplication and structural privacy control | Kandpal et al. [10] | Sample deduplication during pretraining, structural privacy regulation | Focus mainly on pretraining stage; do not address continual privacy risks during fine-tuning and inference; limited coverage of semantic drift issues.
Federated fine-tuning of LLMs | Kuang et al. [15], Ye et al. [16] | LoRA integration, multi-module FL toolkits, cross-device attention synchronization | Communication and storage bottlenecks remain; insufficient modeling of multi-client heterogeneity; lack semantic–structural alignment mechanisms.
Parameter-efficient FL adaptation | Jiang et al. [17], Che et al. [18], Liu et al. [19] | Low-rank adaptation, prompt tuning, adaptive optimization with DP | Performance drop in highly non-IID scenarios; privacy strategies decoupled from model update dynamics; limited representation robustness.
Industrial and real-world FL deployment for LLMs | Fan et al. [20], Yao et al. [21] | Asynchronous multi-task optimization, inference path tracking, heterogeneity adaptation reviews | Frameworks focus on engineering feasibility; lack integrated solutions for privacy leakage, semantic consistency, and optimization constraints in large-scale deployments.
Table 2. Detailed experimental settings.
Setting | Value or Description
Base Models | RoBERTa-base (125 M), DeBERTaV3-base (184 M)
Max Sequence Length | 64 (SemEval, WNUT, PLONER); 256 (20News); 384 (SQuAD); 512 (MRQA)
Batch Size | 16
Local Epoch | 1
Max Communication Rounds | 1000
Client Partitioning | Dirichlet distribution, α = 0.1
Clients per Round | 5 to 15
Optimizer | AdamW
Learning Rate Search Space | {1 × 10^-3, 5 × 10^-4, 1 × 10^-4, 5 × 10^-5, 1 × 10^-5}
Aggregation Algorithm | FedAvg
Platform | Huggingface Transformers
Table 3. Descriptions of federated fine-tuning methods.
Method | Description
FedFT | This method updates all model parameters during fine-tuning. It represents the standard full fine-tuning paradigm.
FedBF | Each client updates only the bias terms in the model. All other parameters remain fixed. This setting minimizes both communication and update costs.
FedAP | This method inserts trainable adapter layers while keeping the original model backbone frozen. Adapters are placed between the attention and feed-forward modules. The design follows [29], and the implementation is based on the Adapters library [30,31].
FedLR | This method integrates the LoRA mechanism into the federated training pipeline. It introduces low-rank updates only to the query and value matrices. The setup follows the same design as FeDeRA [32]. The implementation is based on the PEFT library [33].
Table 4. Performance comparison of parameter-efficient fine-tuning methods on text classification tasks (Accuracy, %).
Backbone | Method | 20NEWS (200) | 20NEWS (500) | 20NEWS (1000) | SEMEVAL (200) | SEMEVAL (500) | SEMEVAL (1000)
RoBERTa | FedFT | 73.4±0.6 | 82.5±0.5 | 83.8±0.3 | 80.1±1.5 | 82.9±0.7 | 83.6±0.8
RoBERTa | FedBF | 65.2±1.0 | 70.1±0.9 | 72.0±0.8 | 72.1±1.2 | 76.9±0.5 | 78.2±0.9
RoBERTa | FedAP | 74.1±0.3 | 78.3±0.4 | 80.5±0.2 | 73.2±1.1 | 79.1±0.6 | 80.2±0.3
RoBERTa | FedLR | 73.0±0.5 | 77.9±0.3 | 80.1±0.7 | 74.0±1.0 | 78.6±0.4 | 80.0±0.6
RoBERTa | FeDeRA | 77.1±1.0 | 80.2±0.6 | 82.3±0.5 | 80.7±0.6 | 83.0±0.9 | 83.8±0.6
RoBERTa | Ours | 78.6±0.5 | 82.8±0.4 | 84.7±0.3 | 81.9±0.4 | 84.2±0.5 | 85.1±0.4
DeBERTaV3 | FedFT | 76.1±1.2 | 82.1±0.6 | 83.9±0.4 | 70.5±2.3 | 83.5±1.0 | 84.2±1.1
DeBERTaV3 | FedBF | 39.1±1.4 | 56.0±0.9 | 62.3±1.2 | 31.2±1.1 | 63.5±0.7 | 73.9±1.0
DeBERTaV3 | FedAP | 58.4±1.1 | 75.9±0.8 | 79.2±0.6 | 60.2±1.5 | 78.5±0.5 | 81.0±0.3
DeBERTaV3 | FedLR | 50.5±1.2 | 73.2±0.9 | 80.0±0.6 | 39.6±2.0 | 71.2±1.1 | 80.1±0.7
DeBERTaV3 | FeDeRA | 76.9±1.1 | 82.5±0.7 | 84.3±0.4 | 72.4±2.2 | 83.0±0.9 | 84.3±0.5
DeBERTaV3 | Ours | 78.2±0.8 | 83.4±0.5 | 85.6±0.3 | 75.3±0.9 | 84.7±0.6 | 85.9±0.4
Table 5. Performance comparison of fine-tuning methods on named entity recognition tasks (F1 Score, %).
Backbone | Method | WNUT (100) | WNUT (200) | WNUT (500) | PLONER (50) | PLONER (100) | PLONER (150)
RoBERTa | FedFT | 50.0±1.5 | 51.5±0.9 | 52.3±1.2 | 86.0±1.6 | 88.0±1.1 | 89.1±1.0
RoBERTa | FedBF | 35.7±2.0 | 45.2±1.3 | 45.9±1.5 | 75.9±1.2 | 79.8±1.1 | 80.7±1.2
RoBERTa | FedAP | 47.8±1.1 | 50.2±1.1 | 51.6±0.9 | 79.2±1.3 | 83.2±1.0 | 85.7±0.8
RoBERTa | FedLR | 46.3±2.2 | 50.1±1.6 | 50.9±1.4 | 81.1±1.3 | 85.2±1.0 | 86.8±0.9
RoBERTa | FeDeRA | 49.1±1.3 | 52.7±1.0 | 52.7±0.8 | 84.7±1.2 | 87.1±1.4 | 88.4±1.1
RoBERTa | Ours | 51.3±1.2 | 54.2±0.9 | 55.5±0.7 | 86.5±0.9 | 88.9±0.8 | 89.9±0.6
DeBERTaV3 | FedFT | 50.2±1.4 | 51.0±1.1 | 51.4±1.0 | 81.4±1.5 | 86.2±1.3 | 87.0±1.3
DeBERTaV3 | FedBF | 42.7±1.6 | 47.5±1.4 | 49.2±1.0 | 70.4±1.2 | 75.2±1.1 | 77.5±1.3
DeBERTaV3 | FedAP | 43.2±1.3 | 47.6±1.2 | 48.8±0.9 | 77.4±1.0 | 82.6±0.9 | 84.8±0.7
DeBERTaV3 | FedLR | 43.7±1.3 | 48.8±1.0 | 50.2±1.1 | 76.6±1.1 | 82.5±1.1 | 83.8±1.0
DeBERTaV3 | FeDeRA | 50.0±1.2 | 51.1±1.2 | 51.4±1.0 | 81.0±1.2 | 84.5±1.1 | 86.1±1.1
DeBERTaV3 | Ours | 51.9±1.1 | 53.4±0.9 | 54.6±0.8 | 83.2±0.9 | 86.8±0.8 | 88.2±0.7
Table 6. Comparison of fine-tuning methods on question answering tasks (Exact Match/F1, %).
Backbone | Method | SQuADv1.1 EM/F1 (50) | SQuADv1.1 EM/F1 (100) | MRQA EM/F1 (100) | MRQA EM/F1 (200)
RoBERTa | FedFT | 63.2±0.6/77.8±0.4 | 65.3±0.5/79.5±0.3 | 51.6±0.7/60.9±0.6 | 52.5±0.6/63.1±0.4
RoBERTa | FedBF | 43.9±2.3/61.1±2.0 | 50.1±1.5/66.5±1.2 | 25.6±2.1/37.5±2.0 | 33.7±1.5/45.4±1.3
RoBERTa | FedAP | 62.1±0.7/76.7±0.5 | 65.2±0.6/79.3±0.4 | 50.9±0.9/61.9±0.7 | 51.0±0.8/62.5±0.5
RoBERTa | FedLR | 57.4±1.1/72.9±1.0 | 60.6±1.0/75.8±0.7 | 46.8±0.8/58.6±0.6 | 49.8±0.7/61.3±0.5
RoBERTa | FeDeRA | 62.4±0.9/77.5±0.6 | 64.9±0.7/79.8±0.5 | 49.3±1.1/61.2±0.9 | 52.0±0.8/63.0±0.6
RoBERTa | Ours | 64.5±0.5/78.9±0.4 | 66.7±0.4/80.6±0.3 | 53.2±0.6/63.8±0.5 | 55.6±0.5/67.2±0.4
DeBERTaV3 | FedFT | 64.8±0.7/81.1±0.6 | 66.4±0.5/81.5±0.5 | 55.1±0.6/66.5±0.6 | 56.2±0.6/67.3±0.5
DeBERTaV3 | FedBF | 53.4±1.5/71.3±1.4 | 57.5±1.2/74.8±1.1 | 23.1±2.3/34.2±2.2 | 30.7±1.4/43.1±1.3
DeBERTaV3 | FedAP | 63.2±0.8/79.2±0.7 | 65.9±0.6/80.1±0.5 | 54.4±0.9/65.4±0.8 | 55.5±0.8/66.5±0.7
DeBERTaV3 | FedLR | 61.0±0.9/77.8±0.8 | 63.7±0.7/79.2±0.6 | 53.1±0.8/65.0±0.6 | 53.9±0.7/66.3±0.5
DeBERTaV3 | FeDeRA | 63.2±0.7/79.4±0.6 | 65.5±0.6/80.5±0.5 | 53.6±0.8/65.9±0.7 | 55.9±0.7/67.1±0.6
DeBERTaV3 | Ours | 65.4±0.5/81.6±0.4 | 67.1±0.4/82.3±0.3 | 56.0±0.5/67.8±0.4 | 57.9±0.5/69.1±0.4
Table 7. Ablation study on text classification tasks over RoBERTa and DeBERTaV3 (Accuracy, %).
Backbone | Variant | 20NEWS (200) | 20NEWS (500) | 20NEWS (1000) | SEMEVAL (200) | SEMEVAL (500) | SEMEVAL (1000)
RoBERTa | Baseline (FedFT) | 73.4±0.6 | 82.5±0.5 | 83.8±0.3 | 80.1±1.5 | 82.9±0.7 | 83.6±0.8
RoBERTa | + Graph Representation Learning | 75.1±0.5 | 82.6±0.4 | 84.2±0.3 | 81.0±1.3 | 83.5±0.6 | 84.1±0.7
RoBERTa | + Structural Segmentation Framework | 77.2±0.4 | 82.5±0.4 | 84.5±0.3 | 81.3±0.8 | 83.9±0.5 | 84.6±0.5
RoBERTa | + All (Ours) | 78.6±0.5 | 82.8±0.4 | 84.7±0.3 | 81.9±0.4 | 84.2±0.5 | 85.1±0.4
DeBERTaV3 | Baseline (FedFT) | 76.1±1.2 | 82.1±0.6 | 83.9±0.4 | 70.5±2.3 | 83.5±1.0 | 84.2±1.1
DeBERTaV3 | + Graph Representation Learning | 77.0±1.0 | 82.6±0.5 | 84.2±0.4 | 73.2±1.8 | 84.1±0.9 | 84.9±0.7
DeBERTaV3 | + Structural Segmentation Framework | 77.6±0.9 | 83.1±0.5 | 85.1±0.3 | 74.3±1.2 | 84.5±0.6 | 85.4±0.5
DeBERTaV3 | + All (Ours) | 78.2±0.8 | 83.4±0.5 | 85.6±0.3 | 75.3±0.9 | 84.7±0.6 | 85.9±0.4
Table 8. Ablation study on named entity recognition tasks over RoBERTa and DeBERTaV3 (F1 Score, %).
Backbone | Variant | WNUT (100) | WNUT (200) | WNUT (500) | PLONER (50) | PLONER (100) | PLONER (150)
RoBERTa | Baseline (FedFT) | 50.0±1.5 | 51.5±0.9 | 52.3±1.2 | 86.0±1.6 | 88.0±1.1 | 89.1±1.0
RoBERTa | + Graph Representation Learning | 50.9±1.3 | 52.6±0.9 | 53.8±1.0 | 86.7±1.2 | 88.3±1.0 | 89.3±0.9
RoBERTa | + Structural Segmentation Framework | 51.0±1.2 | 53.2±0.8 | 54.6±0.9 | 86.2±1.0 | 88.5±0.9 | 89.7±0.7
RoBERTa | + All (Ours) | 51.3±1.2 | 54.2±0.9 | 55.5±0.7 | 86.5±0.9 | 88.9±0.8 | 89.9±0.6
DeBERTaV3 | Baseline (FedFT) | 50.2±1.4 | 51.0±1.1 | 51.4±1.0 | 81.4±1.5 | 86.2±1.3 | 87.0±1.3
DeBERTaV3 | + Graph Representation Learning | 51.0±1.2 | 52.3±1.0 | 52.9±0.9 | 82.2±1.1 | 86.3±1.1 | 87.4±1.0
DeBERTaV3 | + Structural Segmentation Framework | 51.5±1.2 | 52.9±0.9 | 54.0±0.8 | 82.9±1.0 | 86.5±0.9 | 88.0±0.8
DeBERTaV3 | + All (Ours) | 51.9±1.1 | 53.4±0.9 | 54.6±0.8 | 83.2±0.9 | 86.8±0.8 | 88.2±0.7
Table 9. Ablation study on question answering tasks over RoBERTa and DeBERTaV3 (Exact Match/F1, %).
Backbone | Variant | SQuADv1.1 EM/F1 (50) | SQuADv1.1 EM/F1 (100) | MRQA EM/F1 (100) | MRQA EM/F1 (200)
RoBERTa | Baseline (FedFT) | 63.2±0.6/77.8±0.4 | 65.3±0.5/79.5±0.3 | 51.6±0.7/60.9±0.6 | 52.5±0.6/63.1±0.4
RoBERTa | + Graph Representation Learning | 63.9±0.5/78.3±0.4 | 65.9±0.4/79.8±0.3 | 52.1±0.6/61.8±0.5 | 53.3±0.5/64.5±0.4
RoBERTa | + Structural Segmentation Framework | 64.1±0.5/78.6±0.4 | 66.2±0.4/80.1±0.3 | 52.6±0.6/62.7±0.5 | 54.2±0.5/65.8±0.4
RoBERTa | + All (Ours) | 64.5±0.5/78.9±0.4 | 66.7±0.4/80.6±0.3 | 53.2±0.6/63.8±0.5 | 55.6±0.5/67.2±0.4
DeBERTaV3 | Baseline (FedFT) | 64.8±0.7/81.1±0.6 | 66.4±0.5/81.5±0.5 | 55.1±0.6/66.5±0.6 | 56.2±0.6/67.3±0.5
DeBERTaV3 | + Graph Representation Learning | 65.2±0.6/81.4±0.5 | 66.6±0.4/81.7±0.4 | 55.4±0.6/66.9±0.5 | 56.6±0.5/67.9±0.4
DeBERTaV3 | + Structural Segmentation Framework | 65.3±0.6/81.5±0.5 | 66.9±0.4/82.0±0.3 | 55.6±0.5/67.3±0.4 | 57.2±0.5/68.6±0.4
DeBERTaV3 | + All (Ours) | 65.4±0.5/81.6±0.4 | 67.1±0.4/82.3±0.3 | 56.0±0.5/67.8±0.4 | 57.9±0.5/69.1±0.4
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
