1. Introduction
The big data era has transformed machine learning, but traditional centralized approaches require participants to upload raw data to a central server, often compromising privacy. In fields such as finance and healthcare, privacy or legal constraints prevent data sharing, making it difficult to collect large, high-quality datasets for training. This deepens the challenges posed by data silos and privacy protection [1]. In 2016, McMahan et al. introduced federated learning (FL) [2], enabling distributed clients to collaboratively train a global model without sharing their local data. In each round, the server sends the global model to selected clients, who train locally and upload their updates for aggregation [3]. FL has become a key paradigm in edge computing, leveraging distributed resources while preserving data privacy. Despite avoiding raw data exchange, FL faces significant challenges, notably communication overhead [4] and privacy risks [5]. Large data transmissions, frequent rounds, and varying bandwidth make communication costs far exceed those of local training. Solutions like client subsampling [5], local updates [6], and model compression [7] aim to reduce this bottleneck. Privacy concerns persist on both the client and server sides, as local models or gradients can leak sensitive information [8], enabling attacks such as model inversion and membership inference [9,10,11,12]. Additionally, incorrect aggregation can compromise model integrity and increase privacy risks [13].
To reduce communication overhead and enhance FL security, chain-based FL schemes have been proposed [14]. In this scheme, all clients form a single chain, with model parameters transmitted and updated sequentially along it. Only the relay client uploads the aggregated result to the server, limiting server-client communication to a single interaction per round and significantly reducing server-side burden. The chain structure also enhances privacy, as the server cannot directly observe individual client updates. However, this serial process prevents parallelization and increases the wall-clock time per round. In large-scale scenarios with hundreds or thousands of clients, the chain structure introduces a major performance bottleneck. Long chains may also suffer from error accumulation and link instability, limiting applicability in complex or large-scale settings. To address these issues, group-based FL schemes have emerged. In 2022, Zhang et al. introduced G-VCFL (Grouped Verifiable Chained Privacy-Preserving Federated Learning) [15], grouping clients by neighbor lists so each group forms a short chain, with the last client communicating with the server. While this increases communication, groups can operate in parallel. In 2024, Xia et al. proposed SVCA (Secure and Verifiable Chained Aggregation for Privacy-Preserving Federated Learning) [16], which adds grouping atop a single chain, uses secret sharing to prevent client dropout, and introduces a commitment mechanism to verify client honesty. That same year, Cui et al. presented MChain-SFFL (Multi-Chain Aggregation Privacy Preserving for Server-Free FL) [17], which randomly selects multiple chain heads and establishes parallel chains for masked parameter transmission. Wei et al. introduced FedACT (An Adaptive Chained Training Approach for Federated Learning in Computing Power Networks) [18], an adaptive chained training method that uses computation-driven clustering: clients are clustered based on task processing latency to minimize server wait times.
However, these group-based chained FL schemes still have several limitations. For grouping, G-VCFL adopts the popular construction method for cellular communication environments, based on statistical factors such as the number of occurrences, to select reliable neighbors [19]. However, when user distribution is uneven or coverage is poor in blind areas, neighbor identification can be severely distorted, further affecting the grouping results. SVCA divides users from different regions into groups, but its strategy is vague and lacks theoretical grounding. MChain-SFFL, a decentralized chained framework, assumes that it has already established reliable neighbor lists among users and maintained stable communication in a peer-to-peer network. However, in practical deployments, peer-to-peer networks may experience dynamic node changes and network instability, which can affect the construction and maintenance of chains [20]. Regarding server-side security, G-VCFL ensures verifiable privacy-preserving learning via a lightweight pseudorandom generator, while SVCA uses a commitment mechanism for global model verification, though at a high computational cost. MChain-SFFL and FedACT, however, do not address server-side security concerns.
Although existing studies have addressed communication efficiency and security in federated learning from different perspectives, several limitations remain. Verifiable federated learning methods focus on ensuring the correctness of aggregation but introduce additional computational overhead. Group-based approaches improve efficiency by partitioning clients, but they typically lack mechanisms for verifying aggregation results. Chain-based federated learning reduces the number of communications with the server; however, its fully sequential structure leads to increased latency, especially in heterogeneous environments with straggler clients. Therefore, there is still no unified framework that can simultaneously improve communication efficiency, reduce latency, and provide verifiable aggregation. This paper proposes a verifiable chained and grouped federated learning framework, VDCG-FL. It groups clients based on their Euclidean distances to the server because geographically proximate clients often exhibit similar network connectivity characteristics, such as link quality, latency, and available bandwidth. Without introducing excessive communication overhead, VDCG-FL effectively reduces error accumulation in chained federated learning and improves link stability by parallel training across different groups. Meanwhile, a verifiable aggregation mechanism based on Lagrange interpolation on the server side enhances server-side security while reducing the additional computational cost introduced by the verification mechanism. It is worth noting that the grouping in this paper is used to improve communication efficiency, while the verification ensures the correctness of the aggregation results. Together, these two components form a federated learning framework that optimizes communication efficiency and guarantees the correctness of the aggregation results.
The main contributions of this work can be summarized as follows.
(1) Distance-based grouping: VDCG-FL groups clients by their Euclidean distance to the server. This reduces redundant communication and network overhead, while spatially grouping clients helps construct a more stable and efficient communication structure, resulting in more stable intra-group aggregation and avoiding long-distance transmissions. The scheme alleviates system heterogeneity, lowers latency, and improves training efficiency.
(2) Efficient server-side verification: VDCG-FL employs Lagrange interpolation for a server-side aggregation verification strategy, ensuring the correctness of global model parameters. Compared to homomorphic encryption-based schemes, this approach significantly reduces computational overhead, improves efficiency, and enhances the security and performance of aggregation.
(3) Extensive evaluation: We developed a VDCG-FL prototype and conducted experiments on the MNIST and CIFAR-10 datasets under both independent and identically distributed (IID) and non-IID data partitions, measuring accuracy, latency, and grouping performance. VDCG-FL achieves an accuracy of up to 99.36% on IID data and 98.87% on non-IID data. Furthermore, by dividing clients into 10 groups, the scheme reduces the average round latency from 44.33 s in a single-chain structure to 32.13 s, representing a reduction of approximately 27.5%.
The remaining part of this paper is organized as follows.
Section 2 introduces the preliminaries.
Section 3 presents the system design objectives.
Section 4 details the VDCG-FL scheme.
Section 5 analyzes the system in terms of performance, privacy protection, and efficiency.
Section 6 concludes the paper.
3. System Overview
This section briefly introduces the symbols used in VDCG-FL (summarized in Table 1), the threat model, and the overall framework.
3.1. Threat Model
To meet practical application requirements, the VDCG-FL framework aims to design a federated learning scheme that protects users’ uploaded local models and prevents server-side insecure aggregation based on the following assumptions.
First, the aggregation server may be malicious. It may intentionally return incorrect aggregation results or behave lazily by not performing correct computations, resulting in a global model that deviates from the correct result. This assumption holds practical significance. In real-world implementation scenarios, single points of failure exist. Attackers may compromise the aggregation server to fabricate inaccurate aggregated results or violate users’ privacy.
Second, users participating in FL are assumed to be honest but curious. They train their local models correctly but may also attempt to infer private information about other users. This assumption also reflects real-world applications, as users are often curious about others’ information.
3.2. Outline of VDCG-FL
This subsection provides a concise overview of VDCG-FL, with its workflow illustrated in Figure 1. The VDCG-FL framework involves three entities: a trusted authority (TA), clients, and an aggregation server (AS). The roles of these entities are described below.
The trusted authority (TA): TA is a reliable entity responsible for system initialization. It generates auxiliary sequences for interpolation points and random mask vectors to conceal local models. The TA is assumed trustworthy by default, does not participate in federated training, and never discloses private information—its sole role is generating system parameters.
Clients: In the VDCG-FL framework, clients are grouped according to their Euclidean distance from the server, and model updates are passed sequentially within each group. The relay client handles communication with the server, uploads the group’s aggregated model parameters, and verifies the correctness of the global model parameters from the aggregation server. Grouping clients in this way reduces the number of required interpolation points and ensures only relay clients communicate with the server, lowering interpolation costs and enabling efficient verification.
Aggregation server (AS): AS acts as the central hub, collecting models from each group’s relay client and updating the global model. It aggregates these results and broadcasts the updated model to all groups or clients. The AS does not access clients’ raw data or intra-group transmissions; its role is limited to parameter collection and distribution. Because the AS is potentially untrusted in VDCG-FL, clients must verify its aggregation results.
3.3. Workflow of VDCG-FL
Here is a brief overview of the VDCG-FL workflow.
3.3.1. Initialization Phase
In this phase, the system initializes global parameters. The TA generates a constant sequence to construct the Lagrange interpolation function and mask values to protect local models, and broadcasts these parameters to the clients.
3.3.2. Local Model Training Phase
In VDCG-FL, N clients are partitioned into m groups according to their Euclidean distances to the server. Inside every group, clients are arranged in a chain and train local models on their private datasets. According to the chain sequence, clients sequentially transmit their masked models within the group until the final client receives and aggregates them locally.
3.3.3. Secure Model Aggregation Phase
After receiving the model parameters from the relay clients, the server applies FedAvg to compute the global model and broadcasts it to each group’s relay client for verification. Each relay client constructs a specific verification function to check the correctness of the global model for its group. If verification fails, the client rejects the model and halts the current training round. If verification succeeds, the client accepts the model.
This section describes the three phases of VDCG-FL in detail: the initialization phase, the local training phase, and the secure model aggregation phase.
3.4. Initialization Phase
We assume that the system includes N clients, denoted as $\mathcal{C} = \{C_1, C_2, \dots, C_N\}$.
- 1. The TA assigns a random coordinate vector $\mathbf{p}_i$ to each client in the system. Next, the system computes the Euclidean distance from each client to the server, then sorts and organizes the clients by these distances. Each group forms a communication chain. Assume that the server is located at the origin $\mathbf{0}$, and client i has the coordinate $\mathbf{p}_i$. The Euclidean distance is calculated accordingly:
$$d_i = \lVert \mathbf{p}_i - \mathbf{0} \rVert_2 = \sqrt{\sum_{k} p_{i,k}^2}$$
After sorting the clients in ascending order of distance $d_i$, they are divided into m groups: $G_1$, $G_2$, …, $G_m$. Here $C_{i,j}$ denotes the j-th client in the i-th group. The set of all groups is $\mathcal{G} = \{G_1, G_2, \dots, G_m\}$, and each group contains S clients that communicate in a chain. Distance-based grouping has two advantages. First, it reduces intra-group communication latency by organizing geographically close clients together, which helps lower communication costs and avoid long-distance transmissions. Second, it improves overall training efficiency by enabling parallel execution across groups. Compared with a fully sequential chain, the grouping strategy reduces waiting time caused by slow clients and improves training efficiency.
- 2. After system initialization, the TA randomly selects m distinct scalar points and assigns each group a unique interpolation auxiliary sequence, denoted as $a = \{a_1, a_2, \dots, a_m\}$, and a random number $x^*$. Then the TA sends these sequences to the relay client of each group.
- 3. During the initialization phase, the TA sends a random mask $M_i$ to each group $G_i$ to protect the privacy of intra-group aggregation results during upload. Each random mask matches the dimensionality of the global model parameters for subsequent secure aggregation. The mask is sequentially transmitted within the group following the chain until it reaches the relay client. Algorithm 1 provides a more detailed step-by-step procedure.
Algorithm 1 Initialization of VDCG-FL
Input: The number of clients N; auxiliary sequence length m
Output: Initial global model $w^0$; client groups $\mathcal{G}$; auxiliary sequence $a$
1: Compute Euclidean distance for each client:
2:   $d_i = \lVert \mathbf{p}_i \rVert_2$
3: Sort clients according to $d_i$
4: Divide clients into m groups based on distance order
5: Initialize the global model $w^0$
6: Generate auxiliary sequence $a = \{a_1, a_2, \dots, a_m\}$
Return $w^0$, $\mathcal{G}$, $a$
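The grouping step of the initialization phase can be illustrated with a minimal Python sketch. It assumes two-dimensional coordinates, a server at the origin, and a client count that is a multiple of m; the function name `group_clients_by_distance` and the sample coordinates are hypothetical and do not come from the VDCG-FL prototype.

```python
import math


def group_clients_by_distance(coords, m):
    """Sort clients by Euclidean distance to the origin and cut them into m chains.

    coords maps a client id to its coordinate tuple. The server is assumed to
    sit at the origin, and N is assumed to be a positive multiple of m.
    """
    if m <= 0 or len(coords) % m != 0:
        raise ValueError("this sketch assumes N is a positive multiple of m")
    # Distance of every client to the server at the origin.
    dist = {cid: math.sqrt(sum(c * c for c in p)) for cid, p in coords.items()}
    # Ascending order of distance, as in the initialization phase.
    ordered = sorted(coords, key=lambda cid: dist[cid])
    size = len(ordered) // m  # S clients per group
    # Consecutive blocks of S clients form one chain each; the last is the relay.
    return [ordered[g * size:(g + 1) * size] for g in range(m)]


# Hypothetical coordinates, for illustration only.
coords = {"c1": (3.0, 4.0), "c2": (0.5, 0.5), "c3": (6.0, 8.0), "c4": (1.0, 1.0)}
groups = group_clients_by_distance(coords, m=2)
```

In this toy instance the two nearest clients form the first chain and the two farthest form the second, so each chain only spans clients at similar distances from the server.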
3.5. Local Training Phase
At this stage, VDCG-FL initiates local training for clients in each group. Model parameters are passed and updated sequentially along the chain, with the relay client carrying out intra-group aggregation and communicating with the aggregation server. Algorithm 2 provides a more detailed procedure.
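Algorithm 2 VDCG-FL Training and Secure Aggregation
Input: Initial model $w^0$; training rounds T; learning rate $\eta$; groups $\mathcal{G}$; auxiliary sequence $a$; a random number $x^*$; random mask $M_i$; loss function $\mathcal{L}$; verification threshold $\varepsilon$
Output: Final global model $w^T$
1: //Local Training phase:
2: for $t = 0, 1, \dots, T-1$ do
3:   Aggregation server broadcasts $w^t$
4:   TA sends random mask $M_i$ to first and relay clients
5:   for each group $G_i \in \mathcal{G}$ do
6:     for each client $C_{i,j} \in G_i$ do
7:       Update local model:
8:       $w_{i,j}^t = w^t - \eta \nabla \mathcal{L}(w^t; D_{i,j})$
9:       Compute masked update:
10:      $\tilde{w}_{i,j} = \tilde{w}_{i,j-1} + w_{i,j}^t$, with $\tilde{w}_{i,0} = M_i$
11:    end for
12:    Relay client removes mask:
13:    $W_i = \tilde{w}_{i,S} - M_i$
14:    Upload $W_i$ to server
15:  end for
16:  //Secure Aggregation phase:
17:  Compute global aggregation:
18:  $w^{t+1} = \mathrm{FedAvg}(W_1, \dots, W_m)$
19:  Split $w^{t+1}$ into m shares and the j-th share can be defined as:
20:  $w^{t+1}_j = w^{t+1}\left[(j-1)\tfrac{d}{m}+1 : j\tfrac{d}{m}\right]$, where d is the model dimension
21:  Each $C_{j,S}$ receives and holds $w^{t+1}_j$
22:  $C_{i,S}$ splits $W_i$ into m shares and sends the s-th segment to $C_{s,S}$
23:  The s-th share can be defined as: $W_{i,s} = W_i\left[(s-1)\tfrac{d}{m}+1 : s\tfrac{d}{m}\right]$
24:  Each $C_{j,S}$ holds $\{W_{1,j}, \dots, W_{m,j}\}$
25:  Each $C_{j,S}$ locally calculates reference slices as:
26:  $\hat{w}_j = \mathrm{FedAvg}(W_{1,j}, \dots, W_{m,j})$
27:  Each $C_{j,S}$ constructs two interpolation polynomials:
28:  $F(x) = \sum_{j=1}^{m} w^{t+1}_j \, \ell_j(x)$
29:  $G(x) = \sum_{j=1}^{m} \hat{w}_j \, \ell_j(x)$
30:  Calculate the values at the verification point $x^*$:
31:  if $F(x^*) = G(x^*)$ (up to the verification threshold $\varepsilon$) then
32:    Accept $w^{t+1}$
33:  else
34:    Reject the global model and roll back to $w^t$
35:  end if
36: end for
Return $w^T$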
- 1. The aggregation server broadcasts the current global model $w^t$ to all clients when the training starts.
- 2. Each client $C_i$ in the group then starts training its local model parameters $w_i^t$ with its private dataset $D_i$:
$$w_i^t = w^t - \eta \nabla \mathcal{L}(w^t; D_i)$$
where $w_i^t$ represents the local model of the client i in round t; $\eta$ is the learning rate; $w^t$ is the global model in round t; $\mathcal{L}$ is the loss function; and $\nabla \mathcal{L}$ denotes the gradient of the loss function with respect to the model parameters.
- 3. After completing local training in each round, clients within a group perform chain-based communication for intra-group aggregation. Take the first group $G_1$ as an example. Once the first client $C_{1,1}$ finishes local model training, it computes the masked parameter $\tilde{w}_{1,1}$ as follows:
$$\tilde{w}_{1,1} = w_{1,1}^t + M_1$$
which is then forwarded along the chain. When the i-th client receives the masked intermediate value $\tilde{w}_{1,i-1}$, it updates the accumulated result as:
$$\tilde{w}_{1,i} = \tilde{w}_{1,i-1} + w_{1,i}^t$$
and passes it to the next client. This process continues until the relay client $C_{1,S}$ in the group, and the accumulated value reaching the last client is denoted as $\tilde{w}_{1,S}$:
$$\tilde{w}_{1,S} = M_1 + \sum_{i=1}^{S} w_{1,i}^t$$
The relay client then removes the mask by subtracting $M_1$ to obtain the true intra-group aggregation result $W_1 = \tilde{w}_{1,S} - M_1$. It then uploads this result to the aggregation server for global aggregation.
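The masked chain accumulation described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: models are short integer lists rather than tensors, the mask is drawn from a seeded generator, and the helper name `chain_aggregate` is hypothetical.

```python
import random


def chain_aggregate(local_models, mask):
    """Pass a masked running sum along the chain; the relay removes the mask.

    local_models is the ordered list of per-client parameter vectors in one
    group, and mask is the random vector that the TA hands to that group.
    """
    # The first client hides its model behind the mask before forwarding.
    running = [m + w for m, w in zip(mask, local_models[0])]
    # Every later client adds its own model to the masked intermediate value.
    for w in local_models[1:]:
        running = [acc + wi for acc, wi in zip(running, w)]
    # The relay client subtracts the mask to recover the plain group sum.
    return [acc - m for acc, m in zip(running, mask)]


rng = random.Random(0)  # seeded only to make the sketch reproducible
models = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mask = [rng.randint(-1000, 1000) for _ in range(3)]
group_sum = chain_aggregate(models, mask)
```

Every intermediate value forwarded along the chain is offset by the mask, so a client that only sees the running sum cannot read off the plain sum of its predecessors' models, while the relay still recovers the exact group sum.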
3.6. Secure Model Aggregation Phase
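Once the server receives the locally aggregated models from all groups, it applies FedAvg to compute the global model $w^{t+1}$. After aggregating all local models, the server splits the result into m shares, denoted as $w^{t+1}_1, w^{t+1}_2, \dots, w^{t+1}_m$, and assigns the j-th share to the relay client $C_{j,S}$.
At this point, $C_{j,S}$ receives only the segment $w^{t+1}_j$ with its own number; it cannot see other segments or the complete $w^{t+1}$.
At this step, each relay client $C_{i,S}$ slices its $W_i$ according to the receiver number and then sends the slices to the remaining relay clients. For example, $C_{i,S}$ sends $W_{i,j}$ to $C_{j,S}$. After finishing the exchange, $C_{j,S}$ holds $\{W_{1,j}, W_{2,j}, \dots, W_{m,j}\}$, which is used to calculate the reference slice $\hat{w}_j$ from these segments:
$$\hat{w}_j = \mathrm{FedAvg}(W_{1,j}, W_{2,j}, \dots, W_{m,j})$$
which means that if the server aggregates honestly, then the j-th segment of $w^{t+1}$ should be equal to the value of $\hat{w}_j$.
Therefore, assuming the server is honest, the following must be true:
$$w^{t+1}_j = \hat{w}_j, \quad j = 1, \dots, m$$
In the next step, each relay client constructs the interpolation polynomials F(x) and G(x):
$$F(x) = \sum_{j=1}^{m} w^{t+1}_j \, \ell_j(x), \qquad G(x) = \sum_{j=1}^{m} \hat{w}_j \, \ell_j(x), \qquad \ell_j(x) = \prod_{k \neq j} \frac{x - a_k}{a_j - a_k}$$
Then, substitute the verification point $x^*$ into these two equations, calculate $F(x^*)$ and $G(x^*)$, and check if they are equal. If $F(x^*) = G(x^*)$, then the verification passes; if $F(x^*) \neq G(x^*)$, verification fails and the system rolls back to $w^t$.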
In summary, the proposed verification mechanism using Lagrange interpolation ensures the integrity of the global aggregation result. Each relay client independently reconstructs a local reference value from locally available information and compares it against the server-distributed result through polynomial evaluation at a randomly chosen verification point generated by the Trusted Authority. Formally, the mechanism is shown to satisfy both completeness and correctness of the aggregation result.
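To make the slice-and-compare idea concrete, the following Python sketch simulates the check in a single process. It rests on several assumptions that are not stated in the text: FedAvg is taken as the unweighted mean over all N clients, the model dimension is divisible by m, all slices are visible to one function (in the protocol each relay only holds its own portion), and exact rational arithmetic replaces floating point. The names `split`, `lagrange_eval`, and `verify` are hypothetical.

```python
from fractions import Fraction


def split(vec, m):
    """Cut a vector into m equal contiguous slices (assumes len(vec) % m == 0)."""
    size = len(vec) // m
    return [vec[j * size:(j + 1) * size] for j in range(m)]


def lagrange_eval(points, slices, x):
    """Evaluate sum_j slices[j] * l_j(x) coordinate-wise, with exact arithmetic."""
    out = [Fraction(0)] * len(slices[0])
    for j, a_j in enumerate(points):
        basis = Fraction(1)
        for k, a_k in enumerate(points):
            if k != j:
                basis *= Fraction(x - a_k, a_j - a_k)  # Lagrange basis l_j(x)
        out = [o + basis * v for o, v in zip(out, slices[j])]
    return out


def verify(global_model, group_sums, n_clients, points, x_star):
    """Return True when the server's slices match the locally rebuilt reference."""
    m = len(group_sums)
    server_slices = split(global_model, m)
    group_slices = [split(W, m) for W in group_sums]
    reference = []
    for j in range(m):
        width = len(server_slices[j])
        # Assumed FedAvg rule: unweighted mean over all clients.
        reference.append([
            Fraction(sum(gs[j][c] for gs in group_slices), n_clients)
            for c in range(width)
        ])
    f_val = lagrange_eval(points, server_slices, x_star)
    g_val = lagrange_eval(points, reference, x_star)
    return f_val == g_val


group_sums = [[2, 4, 6, 8], [10, 12, 14, 16]]  # two groups, two clients each
honest = [3, 4, 5, 6]                          # element-wise mean over 4 clients
tampered = [3, 4, 5, 7]                        # last coordinate altered
points, x_star = [1, 2], 5                     # hypothetical TA parameters
ok = verify(honest, group_sums, 4, points, x_star)
bad = verify(tampered, group_sums, 4, points, x_star)
```

With floating-point models the equality test would be replaced by a tolerance, which is presumably the role of the verification threshold listed among the inputs of Algorithm 2.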
4. Theoretical Analysis
4.1. Analysis of Verification Correctness
Theorem 1. In the VDCG-FL framework, if AS executes the protocol honestly, clients can obtain a correct global model.
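Proof. If AS executes the protocol honestly, clients can obtain a correct global model only if $F(x^*) = G(x^*)$.
If AS aggregates the global model honestly:
$$w^{t+1} = \mathrm{FedAvg}(W_1, W_2, \dots, W_m)$$
Each relay client constructs the interpolation polynomials:
$$F(x) = \sum_{j=1}^{m} w^{t+1}_j \, \ell_j(x), \qquad G(x) = \sum_{j=1}^{m} \hat{w}_j \, \ell_j(x)$$
where $\ell_j(x) = \prod_{k \neq j} \frac{x - a_k}{a_j - a_k}$ is the Lagrange basis function over the auxiliary points $a_1, \dots, a_m$, $w^{t+1}_j$ is the j-th segment of the global model, and $\hat{w}_j$ is the reference slice computed by the relay clients.
Then the difference polynomial satisfies
$$F(x) - G(x) = \sum_{j=1}^{m} \left(w^{t+1}_j - \hat{w}_j\right) \ell_j(x)$$
If AS honestly executes the protocol, $w^{t+1}_j = \hat{w}_j$ holds true for all j, so $F(x^*) - G(x^*) = 0$ at the verification point.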
From the above equations, we can conclude that if each entity honestly executes the protocol, the client can obtain the correct aggregated gradients to update the model. □
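The converse direction, why a tampered slice is likely to be caught, is not spelled out above. A hedged sketch follows, assuming that the polynomials are built from the m auxiliary points, that the verification point $x^*$ is drawn uniformly from a finite set $\Omega$ unknown to the server, and writing $w_j$ and $\hat{w}_j$ for one coordinate of the server slice and of the reference slice (the symbols $D$ and $\Omega$ are introduced here for illustration only):

```latex
D(x) = F(x) - G(x) = \sum_{j=1}^{m} \left(w_j - \hat{w}_j\right) \ell_j(x),
\qquad \deg D \le m - 1 .

\text{If } w_j \neq \hat{w}_j \text{ for some } j, \text{ then } D(a_j) = w_j - \hat{w}_j \neq 0,
\text{ so } D \not\equiv 0 \text{ and } D \text{ has at most } m - 1 \text{ roots. Hence}

\Pr_{x^* \in \Omega}\left[ D(x^*) = 0 \right] \le \frac{m - 1}{|\Omega|} .
```

Under these assumptions, a dishonest aggregation passes the check only if $x^*$ happens to be a root of $D$, which becomes unlikely as $\Omega$ grows relative to the number of groups.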
4.2. Latency Analysis
(a) Communication Latency Analysis without Verification
To better compare this work to FedAvg and Chain-PPFL, we only consider the functionality of grouping; the actual total overhead will be discussed in the next subsection.
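In federated learning, the total latency of training round t consists of three components: (1) $T^{\mathrm{FedAvg}}_{\mathrm{loc}}$: the latency of local model updates; (2) $T^{\mathrm{FedAvg}}_{\mathrm{agg}}$: the latency for the aggregation server to compute the global model; (3) $T^{\mathrm{FedAvg}}_{\mathrm{up}}$: the communication latency for uploading local models. For ease of analysis, in the FedAvg scheme, we assume that all clients have identical $T^{\mathrm{FedAvg}}_{\mathrm{loc}}$ and $T^{\mathrm{FedAvg}}_{\mathrm{up}}$. Under this assumption, the per-round latency of FedAvg is given by
$$T^{\mathrm{FedAvg}}(t) = T^{\mathrm{FedAvg}}_{\mathrm{loc}} + T^{\mathrm{FedAvg}}_{\mathrm{up}} + T^{\mathrm{FedAvg}}_{\mathrm{agg}}$$
In Chain-PPFL, the total latency of the training round t is composed of three parts: (1) $T^{\mathrm{Chain}}_{\mathrm{loc}}$: the latency of local model updates; (2) $T^{\mathrm{Chain}}_{\mathrm{agg}}$: the latency for the aggregation server to compute the global model; (3) $T^{\mathrm{Chain}}_{\mathrm{comm}}$: the communication latency.
The communication latency $T^{\mathrm{Chain}}_{\mathrm{comm}}$ consists of two parts: (1) the cumulative communication latency among clients along the chain $T^{\mathrm{Chain}}_{\mathrm{c2c}}$; (2) the latency of uploading the aggregated model to the server $T^{\mathrm{Chain}}_{\mathrm{up}}$. For simplicity, we assume that the communication latency between two neighboring clients is the same and is denoted as $\tau$. Thus, the communication latency of Chain-PPFL can be expressed as
$$T^{\mathrm{Chain}}_{\mathrm{comm}} = T^{\mathrm{Chain}}_{\mathrm{c2c}} + T^{\mathrm{Chain}}_{\mathrm{up}} = (K-1)\,\tau + T^{\mathrm{Chain}}_{\mathrm{up}}$$
where K denotes the number of clients participating in each training round. Accordingly, the total latency of the Chain-PPFL scheme is
$$T^{\mathrm{Chain}}(t) = T^{\mathrm{Chain}}_{\mathrm{loc}} + T^{\mathrm{Chain}}_{\mathrm{agg}} + (K-1)\,\tau + T^{\mathrm{Chain}}_{\mathrm{up}}$$
In the VDCG-FL scheme, the latency of training round t also consists of three components: (1) $T^{\mathrm{VDCG}}_{\mathrm{loc}}$: the latency of local model updates; (2) $T^{\mathrm{VDCG}}_{\mathrm{agg}}$: the latency for the aggregation server to compute the global model; (3) $T^{\mathrm{VDCG}}_{\mathrm{comm}}$: the communication latency. All participating clients are divided into multiple groups, and different groups can operate in parallel. We assume that the local training latency in Chain-PPFL and VDCG-FL is identical:
$$T^{\mathrm{Chain}}_{\mathrm{loc}} = T^{\mathrm{VDCG}}_{\mathrm{loc}}$$
Regarding communication latency, we consider a single group as an example. The communication latency of VDCG-FL in round t, denoted as $T^{\mathrm{VDCG}}_{\mathrm{comm}}$, consists of the following: (1) $T^{\mathrm{VDCG}}_{\mathrm{c2c}}$, the communication latency among clients within the group; (2) $T^{\mathrm{VDCG}}_{\mathrm{up}}$, the latency of uploading the aggregated model to the server. We assume identical neighbor communication latency $\tau$, the same as that of Chain-PPFL, so
$$T^{\mathrm{VDCG}}_{\mathrm{comm}} = T^{\mathrm{VDCG}}_{\mathrm{c2c}} + T^{\mathrm{VDCG}}_{\mathrm{up}} = (S-1)\,\tau + T^{\mathrm{VDCG}}_{\mathrm{up}}$$
where S denotes the number of clients in each group. Since all groups work in parallel, the total latency is dominated by the slowest group.
Thus, the total latency of VDCG-FL in one training round is given by
$$T^{\mathrm{VDCG}}(t) = T^{\mathrm{VDCG}}_{\mathrm{loc}} + T^{\mathrm{VDCG}}_{\mathrm{agg}} + (S-1)\,\tau + T^{\mathrm{VDCG}}_{\mathrm{up}}$$
Since both Chain-PPFL and VDCG-FL introduce the masking mechanism, the latency for computing local models is the same in the two schemes, leading to
$$T^{\mathrm{Chain}}_{\mathrm{loc}} = T^{\mathrm{VDCG}}_{\mathrm{loc}}$$
In Chain-PPFL, the aggregation server does not need to compute the sum or average of all individual updates, since aggregation is completed along the chain. In contrast, VDCG-FL requires the server to further aggregate the results from different groups. Therefore, we have
$$T^{\mathrm{VDCG}}_{\mathrm{agg}} \ge T^{\mathrm{Chain}}_{\mathrm{agg}}$$
Regarding communication latency, the main bottleneck of Chain-PPFL lies in the long sequential communication path. By dividing clients into multiple groups and enabling parallel execution, VDCG-FL reduces the number of sequential communication operations. As a result, since $S \le K$, we obtain
$$T^{\mathrm{VDCG}}_{\mathrm{comm}} \le T^{\mathrm{Chain}}_{\mathrm{comm}}$$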
We adopt the same latency analysis model as single-chain federated learning to ensure a fair comparison. In practice, system heterogeneity—stemming from differences in computation, hardware, and network conditions—is inevitable and affects communication latency. Nevertheless, the proposed Euclidean distance-based grouping strategy helps reduce communication overhead. Due to spatial correlation, geographically close clients tend to share similar network characteristics (e.g., latency, link quality, bandwidth). Thus, even under a simplified latency model, our approach demonstrates favorable communication performance. Moreover, an appropriate grouping number can reduce system latency without sacrificing model accuracy, striking a desirable balance between communication efficiency and model performance.
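The simplified latency model of this subsection can be turned into a small calculator. The sketch below assumes that a chain of n clients needs n - 1 neighbor hops and that all component latencies are identical across clients; the function names and the numeric values are hypothetical and are not measurements from the experiments.

```python
def fedavg_round(t_local, t_up, t_agg):
    """Per-round latency when all clients train and upload in parallel."""
    return t_local + t_up + t_agg


def chain_round(t_local, t_hop, t_up, t_agg, k):
    """Single chain of k clients: k - 1 sequential hops, then one upload."""
    return t_local + (k - 1) * t_hop + t_up + t_agg


def grouped_round(t_local, t_hop, t_up, t_agg, group_sizes):
    """Parallel chains: the longest group dominates the round latency."""
    slowest_chain = max((s - 1) * t_hop for s in group_sizes)
    return t_local + slowest_chain + t_up + t_agg


# Hypothetical component latencies in seconds, chosen only for illustration.
t_local, t_hop, t_up, t_agg = 2.0, 0.5, 1.0, 0.25
single_chain = chain_round(t_local, t_hop, t_up, t_agg, k=100)
ten_groups = grouped_round(t_local, t_hop, t_up, t_agg, group_sizes=[10] * 10)
```

The sketch deliberately leaves out the verification cost, which is analyzed separately in part (b), so it only reflects the effect of shortening the sequential path.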
(b) Complexity Analysis of the Lagrange-Based Verification Mechanism
In the previous section, we only discussed the impact of grouping on latency. To discuss the overall framework overhead more comprehensively, we focus on verification overhead in this section. Throughout this analysis, d denotes the total dimension of the global model parameters, m denotes the number of groups (equivalently, the number of relay clients), so that each model slice has dimension $d/m$.
The main source of total overhead for verification mechanisms is the secure segment exchange conducted among all relay clients. Each relay client transmits vectors of dimension $d/m$ to the remaining $m-1$ relay clients, and receives vectors of the same dimension in return. Aggregating across all m relay clients, the total volume of data exchanged is $m(m-1) \cdot d/m = (m-1)d$, so the time complexity is $O(md)$.
Also, each relay client splits its $W_i$ and reconstructs the reference slice $\hat{w}_j$. Since the dimension of each segment is $d/m$, the time complexity is: $O(d)$. Similarly, the time complexity of calculating $F(x^*)$ and $G(x^*)$ is also $O(d)$ because the dimension of $w^{t+1}_j$ is $d/m$. In summary, the total per-relay-client complexity is $O(d)$. Therefore, the verification time complexity is independent of the number of groups m. No matter how many groups participate, the time complexity of each client remains O(d).
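The exchange volume in the segment-swap step can be checked with a short calculation. The sketch below assumes d is divisible by m and counts scalar entries only; the function name `exchange_volume` and the sample sizes are hypothetical.

```python
def exchange_volume(d, m):
    """Count the scalar entries sent during the relay-to-relay segment exchange.

    Each of the m relay clients sends one slice of dimension d / m to each of
    the other m - 1 relay clients (d is assumed to be divisible by m).
    """
    slice_dim = d // m
    per_relay = (m - 1) * slice_dim  # entries sent by one relay client
    total = m * per_relay            # entries sent by all relay clients
    return per_relay, total


per_relay, total = exchange_volume(d=1200, m=4)
```

For d = 1200 and m = 4, each relay sends 900 entries and the system exchanges 3600 in total, which equals (m - 1)d; the per-relay amount stays below d regardless of m.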
5. Comparative Experiment
This section assesses the performance of VDCG-FL by benchmarking it against FedAvg, Chain-PPFL, and G-VCFL. FedAvg is a widely used baseline that relies on centralized aggregation at the server and is suitable for evaluating federated learning in both IID and non-IID settings. Chain-PPFL exemplifies a chain-based strategy in which clients update their models in sequence, with only the final client communicating with the server. G-VCFL enhances scalability and training efficiency by organizing clients into multiple groups as part of a verifiable federated learning framework. We further examine the impact of different group sizes on VDCG-FL’s accuracy and test its robustness across various levels of data heterogeneity.
5.1. Dataset
To evaluate training accuracy, experiments were carried out on two widely used datasets: MNIST and CIFAR-10. MNIST consists of 70,000 grayscale handwritten digit images with a resolution of $28 \times 28$. Among them, 60,000 images are used for training and 10,000 for testing. Each image belongs to one of 10 classes labeled 0–9. Thanks to its simple structure and low noise, MNIST is commonly used to evaluate model convergence and stability. CIFAR-10, by comparison, includes 60,000 RGB images of size $32 \times 32$, with 50,000 for training and 10,000 for testing. Its higher complexity and diverse sample distribution make CIFAR-10 well-suited for assessing model generalization, particularly in non-IID scenarios.
5.2. Experimental Setup
The experiments employed two well-established neural network architectures: Convolutional neural networks (CNNs) and multilayer perceptrons (MLPs). The CNN model comprises three convolutional layers with 32, 64, and 128 channels, each followed by batch normalization, ReLU activation, and max pooling for effective feature extraction and spatial downsampling. The model’s output layer uses a log-softmax function to enhance numerical stability. The MLP architecture consists of four fully connected layers, incorporating ReLU activations and dropout between layers to boost nonlinearity and reduce overfitting.
For dataset partitioning, MNIST samples are first normalized and then divided into IID and non-IID configurations. In the IID scenario, data are shuffled randomly and evenly distributed among clients. In the non-IID scenario, data are sorted by class labels and split into multiple shards, assigning each client a limited set of classes to simulate non-independent distributions. Experiments with both MLP and CNN models are conducted on the MNIST and CIFAR-10 datasets. All experiments were run on a PC equipped with an AMD Ryzen 7 5800H processor with Radeon Graphics (3.20 GHz) and 16 GB of RAM.
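The two partitioning schemes described above can be sketched with the standard library alone. The following is a minimal illustration rather than the prototype's data loader: the helper names are hypothetical, the choice of two shards per client is an assumption (the text only says each client receives a limited set of classes), and the labels are synthetic.

```python
import random


def iid_partition(labels, num_clients, seed=0):
    """Shuffle all sample indices and deal them out evenly to the clients."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    return [idx[c::num_clients] for c in range(num_clients)]


def shard_partition(labels, num_clients, shards_per_client=2, seed=0):
    """Sort indices by label, cut them into shards, give each client a few shards.

    shards_per_client=2 is an assumption made for this sketch.
    """
    idx = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = num_clients * shards_per_client
    size = len(idx) // num_shards
    shards = [idx[s * size:(s + 1) * size] for s in range(num_shards)]
    random.Random(seed).shuffle(shards)
    parts = []
    for c in range(num_clients):
        own = shards[c * shards_per_client:(c + 1) * shards_per_client]
        parts.append([i for shard in own for i in shard])
    return parts


# Synthetic labels: 200 samples, 20 per class, for illustration only.
labels = [i % 10 for i in range(200)]
iid = iid_partition(labels, num_clients=10)
non_iid = shard_partition(labels, num_clients=10)
classes_per_client = [len({labels[i] for i in part}) for part in non_iid]
```

With these synthetic labels every shard contains a single class, so each client in the non-IID split sees at most two classes, whereas the IID split spreads all classes across clients.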
We implemented the VDCG-FL scheme using Python and built neural network models with PyTorch. The federated learning aggregation process follows the FedAvg algorithm. The experimental setup is detailed in Table 2 below.
5.3. Experimental Results
This section examines how the number of groups affects VDCG-FL, compares the accuracy of VDCG-FL with FedAvg, Chain-PPFL, and G-VCFL, and evaluates VDCG-FL’s performance under varying levels of heterogeneity.
5.3.1. Group Size vs. Model Accuracy
To evaluate the effect of the grouped chain structure on model performance, we conducted experiments on the MNIST dataset using both MLP and CNN models, and compared training accuracy across various group counts. Accuracy is assessed under six conditions: MNIST CNN Non-IID, MNIST CNN IID, MNIST MLP Non-IID, MNIST MLP IID, CIFAR-10 CNN IID, and CIFAR-10 MLP IID. Among them, MNIST CNN Non-IID denotes a CNN-based federated learning approach on the MNIST dataset with non-IID data, and the remaining cases follow similar definitions. In all experiments, the client count is fixed at , with group sizes of 5, 10, 20, 25, and 50. All other hyperparameters remained constant to isolate the impact of grouping on convergence speed and final global model accuracy.
a. Under the Non-IID Setting
We analyze how group size affects model accuracy in a non-IID MNIST setting, as shown in Figure 2. The training process and final convergence performance are reported for both CNN and MLP models with group sizes of 5, 10, 20, 25, and 50.
Figure 2 shows that, under the non-IID setting, model accuracy for all group sizes increases gradually with the training rounds and becomes stable after approximately 80 to 100 rounds. All group configurations ultimately reach a high accuracy, demonstrating that the grouping strategy does not hinder global model convergence. From Figure 2a, it can be observed that in the MNIST CNN Non-IID setting, the final test accuracies for the different group sizes are all close to 99%. The zoomed-in view further reveals that using 20 or 25 groups yields slightly better accuracy than 5 or 10 groups after convergence. In the MNIST MLP Non-IID setting, group size has a more pronounced impact on model performance. With fewer groups, such as 5 or 10, the model converges more quickly during the initial training phase, but its final accuracy is somewhat lower. Increasing the group count to 20 or 25 improves test accuracy throughout training and results in more stable convergence.
b. Under the IID Setting
We investigate how group size affects model accuracy on the MNIST and CIFAR-10 datasets in the IID scenario.
Figure 3 presents the test accuracy for various group sizes under IID data partitioning. Experiments are performed on both MNIST and CIFAR-10 using two model architectures, CNN and MLP, for comparison. The group sizes evaluated include 5, 10, 20, 25, and 50.
Figure 3a demonstrates that, in the MNIST IID setting, the CNN model converges quickly across all group size configurations, stabilizing after about 20–30 rounds. All group sizes yield a final test accuracy above 99%. The zoomed-in plot reveals only minor differences in accuracy and small fluctuations between group sizes. Configurations with 20 or 25 groups slightly outperform those with 5 or 10 groups after convergence.
Figure 3b illustrates that, in the MNIST MLP IID setting, the model consistently converges quickly for all group sizes, reaching stability after approximately 50 rounds. Group sizes of 10, 20, and 25 yield marginally higher final test accuracy compared to the 5-group configuration, while results with 50 groups are similar to those with intermediate sizes. The MLP model displays slightly greater sensitivity to group-size variations than the CNN model in IID scenarios, though overall accuracy differences remain small.
Figure 3c,d presents results for the CIFAR-10 CNN IID and CIFAR-10 MLP IID experiment. While overall accuracy is lower than on MNIST due to CIFAR-10’s higher complexity, convergence patterns are comparable across group sizes. Group sizes of 10 and 20 achieve slightly better test accuracy in later training, whereas using 5 or 50 groups leads to modestly reduced performance.
The above experimental results demonstrate that the group-based chain federated learning framework adapts well to different data distributions and model architectures. Different group sizes yield distinct convergence patterns and affect final accuracy. To identify the most effective group configurations for practical applications, the next set of experiments also considers system performance metrics such as communication latency. These findings are discussed in the following sections.
5.3.2. Group Size vs. Latency Analysis
This subsection explores the impact of the number of client groups on average round latency. Results are compared with Chain-PPFL, which operates with a single chain, and Chain-PPFL with verification. Unlike these single-chain baselines, VDCG-FL divides clients into several groups and performs chain-based training within each group in parallel, enhancing overall efficiency. Experiments are conducted on the MNIST dataset under both IID and non-IID conditions, using CNN and MLP architectures. The detailed results are shown in
Table 3.
These results compare the average latency of various methods on the MNIST dataset, evaluated under both IID and Non-IID conditions using MLP and CNN models. Compared to Chain-PPFL with verification, our method significantly reduces latency. This is because the Lagrange interpolation verification method used in the chained structure requires verification by each client, increasing computational overhead. Our grouping scheme reduces the number of verifications, thereby reducing latency. In VDCG-FL, clients are organized into 2, 5, 10, 15, 20, 25, or 50 groups. When the number of groups is relatively small (e.g., 2, 5, and 10), VDCG-FL achieves much lower latency than the single-chain Chain-PPFL. For instance, for the MLP model under the IID MNIST setting, VDCG-FL with 10 groups reduces the average per-round latency from 44.33 s to 32.13 s, a decrease of approximately 27.5%. Similarly, under the non-IID MNIST setting, the two-group configuration achieves the lowest per-round latency, 29.79 s, approximately 29.6% lower than that of the single-chain approach. These results indicate that a moderate number of groups with parallel execution effectively shortens the per-round training time and alleviates the straggler effect inherent in serial single-chain participation. However, as the number of groups increases further (e.g., 15, 20, 25, and 50), system latency increases substantially.
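The percentage quoted for the IID MLP case can be reproduced directly from the two latencies. The helper below is a minimal sketch; the function name `relative_reduction` is hypothetical, and only the two latency values come from the text.

```python
def relative_reduction(baseline_s: float, new_s: float) -> float:
    """Return the latency reduction as a percentage of the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline latency must be positive")
    return (baseline_s - new_s) / baseline_s * 100.0


# Latencies quoted in the text for the MLP model on IID MNIST.
chain_ppfl_s = 44.33      # single-chain Chain-PPFL
vdcg_10_groups_s = 32.13  # VDCG-FL with 10 groups

reduction = relative_reduction(chain_ppfl_s, vdcg_10_groups_s)
print(f"{reduction:.1f}%")  # prints 27.5%
```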
In summary, the experimental results demonstrate that the grouping-based parallel mechanism effectively reduces the training latency of federated learning. However, this benefit depends strongly on the number of groups. A moderate number of groups (e.g., 2 to 10) provides a favorable trade-off between communication efficiency and model performance, whereas a larger number of groups increases latency again. This degradation is mainly due to the fewer clients per group, which results in more frequent synchronization and higher communication overhead, ultimately reducing overall system efficiency.
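The trade-off can be illustrated with a toy latency model. This is not the paper's measured or analytical model: it simply assumes that chains run in parallel, so the longest chain sets the training time, and that every group adds one synchronization cost. The function name and all constants are hypothetical and chosen only to show the shape of the curve.

```python
import math


def round_latency(num_clients, num_groups, hop_s, sync_s):
    """Toy model: the longest chain determines the serial training time,
    and each group contributes one synchronization cost."""
    chain_len = math.ceil(num_clients / num_groups)
    return chain_len * hop_s + num_groups * sync_s


# Hypothetical values, not fitted to any measurement in the paper.
NUM_CLIENTS, HOP_S, SYNC_S = 100, 0.4, 1.0

for g in (1, 2, 5, 10, 20, 50):
    print(g, round(round_latency(NUM_CLIENTS, g, HOP_S, SYNC_S), 2))
```

Under these assumptions the latency first drops as chains get shorter and then rises once the per-group synchronization term dominates, which matches the qualitative behavior reported above.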
5.3.3. Comparison with Previous Schemes
To evaluate the classification accuracy of VDCG-FL, we conduct experiments on two public datasets, MNIST and CIFAR-10, and compare VDCG-FL with three established schemes: FedAvg, Chain-PPFL, and G-VCFL. These experiments are carried out under six settings: MNIST CNN IID, MNIST CNN Non-IID, MNIST MLP IID, MNIST MLP Non-IID, CIFAR-10 CNN IID, and CIFAR-10 MLP IID. The previous experiments indicate that setting the group count to 10 or 20 enables VDCG-FL to achieve superior accuracy and faster convergence compared to other configurations. However, the latency analysis shows that the 10-group setup results in much lower latency than the 20-group alternative. To balance accuracy and latency, we select 10 groups for the comparison experiments with FedAvg, Chain-PPFL, and G-VCFL. In these experiments, the total number of clients is fixed at , with each group consisting of clients.
a. Under the Non-IID Setting
Figure 4 displays the test accuracy of FedAvg, Chain-PPFL, G-VCFL, and the proposed VDCG-FL on the MNIST dataset with Non-IID data, evaluated using both CNN and MLP models. For the CNN model (
Figure 4a), all methods show increasing accuracy as training progresses, but VDCG-FL converges the fastest, stabilizes sooner, and attains the highest final accuracy. This demonstrates VDCG-FL’s strong convergence and robustness. For the MLP model (
Figure 4b), Chain-PPFL and G-VCFL deliver moderate improvements over FedAvg but experience considerable fluctuations early in training. In contrast, VDCG-FL not only converges more quickly but also achieves higher and more stable final accuracy.
b. Under the IID Setting
Figure 5 shows the test accuracy of FedAvg, Chain-PPFL, G-VCFL, and the proposed VDCG-FL on the MNIST and CIFAR-10 datasets under IID conditions, using both CNN and MLP models.
For MNIST (
Figure 5a,b), all four methods achieve rapid convergence and high test accuracy within a few training rounds. VDCG-FL stands out by converging even faster in the early stages and maintaining higher accuracy throughout training. FedAvg and Chain-PPFL display similar trends and final accuracy, while G-VCFL lags initially but eventually catches up.
For CIFAR-10 (
Figure 5c,d), the increased complexity results in lower overall accuracy compared to MNIST, but performance differences among methods are more pronounced. VDCG-FL quickly boosts model accuracy in the early rounds and consistently maintains the highest accuracy in the middle and later stages. In contrast, G-VCFL and Chain-PPFL converge more slowly. These results highlight VDCG-FL’s superior convergence stability and generalization ability, particularly on more challenging datasets.
The above results show that VDCG-FL achieves higher accuracy and more efficient communication than the baseline methods. This indicates that the grouping and verification strategies do not degrade model performance, but instead improve training stability and model accuracy. Grouping clients by distance shortens each chain, which reduces error accumulation during serial model transmission and improves convergence. In addition, the verification mechanism based on Lagrange interpolation does not modify the model parameters; it only verifies the correctness of the aggregation results and therefore does not negatively affect model accuracy. These factors together explain why the proposed method can maintain or even improve model accuracy compared with the baseline methods.
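To make the "verification does not modify parameters" point concrete, the sketch below shows only the generic primitive, Lagrange interpolation over a prime field, used as a read-only consistency check. It is not the paper's verification protocol: the modulus `P`, the function names, and the example polynomial are all hypothetical.

```python
P = 2**61 - 1  # a Mersenne prime, used here as a hypothetical field modulus


def lagrange_eval(points, x, p=P):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (x_i, y_i) pairs, modulo p."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def is_consistent(points, check_point, p=P):
    """Read-only check: does check_point lie on the interpolated polynomial?"""
    x, y = check_point
    return lagrange_eval(points, x, p) == y % p


def f(x):
    """Example polynomial f(x) = 7 + 3x + 2x^2 over the field."""
    return (7 + 3 * x + 2 * x * x) % P


pts = [(1, f(1)), (2, f(2)), (3, f(3))]
print(is_consistent(pts, (5, f(5))))      # True
print(is_consistent(pts, (5, f(5) + 1)))  # False
```

The check only reads the submitted values and returns a Boolean, so whatever is being verified is left unchanged.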
5.3.4. Robustness Analysis Under Different Data Heterogeneity Levels
The VDCG-FL scheme organizes clients according to their Euclidean distance from the server. This approach is grounded in the observation that spatially proximate clients typically possess similar communication capacities. Unlike random grouping, distance-based grouping leverages the physical network topology, enabling spatially close clients to form groups. As a result, clients within a group have similar communication distances, which fosters more stable local aggregation.
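A minimal sketch of distance-based grouping follows. The text states only that clients are organized by their Euclidean distance from the server; the specific rule used here (sort by distance, then cut the sorted list into contiguous, near-equal groups) is an assumption, as are the function name and the example coordinates.

```python
import math


def group_by_distance(client_xy, server_xy, num_groups):
    """Sort clients by Euclidean distance to the server and cut the sorted
    list into num_groups contiguous groups of near-equal size."""
    if not 1 <= num_groups <= len(client_xy):
        raise ValueError("num_groups must be between 1 and the client count")
    order = sorted(client_xy, key=lambda cid: math.dist(client_xy[cid], server_xy))
    base, extra = divmod(len(order), num_groups)
    groups, start = [], 0
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)  # spread the remainder
        groups.append(order[start:start + size])
        start += size
    return groups


# Hypothetical client positions; the server sits at the origin.
clients = {"c0": (1.0, 0.0), "c1": (9.0, 9.0), "c2": (0.5, 0.5),
           "c3": (5.0, 5.0), "c4": (4.0, 6.0), "c5": (10.0, 8.0)}
print(group_by_distance(clients, (0.0, 0.0), 3))
# [['c2', 'c0'], ['c3', 'c4'], ['c1', 'c5']]
```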
In federated learning, data heterogeneity is a fundamental challenge that affects model convergence and performance. Although the proposed grouping strategy is designed from a communication perspective, it is necessary to evaluate its effectiveness under different non-IID settings. Therefore, we adopt the Dirichlet-based partitioning strategy to simulate varying degrees of data heterogeneity and assess the robustness of the proposed method. The Dirichlet parameter controls the level of heterogeneity: smaller values create more imbalanced data distributions and greater heterogeneity, while larger values approach the IID case. In our experiments, values of , , and represent strong, moderate, and weak heterogeneity, respectively. It should be noted that the proposed distance-based grouping strategy is not designed to reduce data heterogeneity; it is primarily intended to reduce communication overhead on the client side. By grouping clients that are physically close together, the impact of network links on communication latency can be effectively minimized. The experiments in this section therefore evaluate only how robust this approach remains under different levels of heterogeneity.
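Dirichlet-based partitioning can be sketched with the standard library alone. This is a common label-skew recipe and not necessarily the exact procedure used in the experiments; the function name is hypothetical, and the concentration values in the usage lines are placeholders rather than the paper's settings.

```python
import random


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Assign sample indices to clients. For every class, the share given
    to each client is drawn from a symmetric Dirichlet(alpha)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    parts = [[] for _ in range(num_clients)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Normalized Gamma(alpha, 1) draws form one Dirichlet(alpha) sample.
        g = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(g) or 1.0  # guard against an all-zero underflow
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += g[c] / total
            end = len(idxs) if c == num_clients - 1 else round(cum * len(idxs))
            parts[c].extend(idxs[start:end])
            start = end
    return parts


# Synthetic labels: 10 classes, 100 samples each.
labels = [i % 10 for i in range(1000)]
for alpha in (0.1, 1.0, 100.0):  # placeholder values, small to large
    sizes = [len(p) for p in dirichlet_partition(labels, 5, alpha)]
    print(alpha, sizes)
```

With a small concentration value, each class tends to concentrate on a few clients; with a large one, the per-class shares become nearly uniform and the split approaches the IID case.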
Building on the previous experimental results, this section aims to balance latency and accuracy by selecting VDCG-FL with 10 groups for further evaluation.
A. Convergence Performance Comparison under Different Heterogeneity Levels
Figure 6a–f illustrates that both VDCG-FL and Chain-PPFL achieve convergence within a limited number of training rounds across various experimental settings. However, notable performance differences emerge under strong data heterogeneity.
For strong data heterogeneity (), Chain-PPFL exhibits significant instability during the first 20 training rounds for both the CNN and MLP models, with accuracy curves showing pronounced fluctuations. In contrast, the grouping mechanism of VDCG-FL helps mitigate the adverse effects of high heterogeneity, resulting in more stable model training.
When , data heterogeneity is less pronounced, and both methods exhibit more stable convergence than in the scenario. Nevertheless, VDCG-FL continues to demonstrate smoother convergence than Chain-PPFL during the early training phase.
With weak heterogeneity (), the performance gap between the two approaches narrows considerably, and their accuracy curves nearly overlap.
B. Impact of Different Heterogeneity Levels on VDCG-FL
Figure 7a,b shows how varying levels of data heterogeneity affect VDCG-FL’s training process. As the Dirichlet parameter
decreases, which signals increased data heterogeneity, the final convergence accuracy of VDCG-FL declines. Nevertheless, VDCG-FL consistently converges more stably and to higher accuracy than the single-chain structure, even when
, maintaining a clear and stable convergence pattern throughout training.
With increased to , both convergence speed and final accuracy improve further. These findings highlight the adaptability and robustness of the distance-based grouping mechanism across varying degrees of data heterogeneity.
In summary, the experimental results show that VDCG-FL delivers greater training stability and improved convergence compared to Chain-PPFL, especially under strong data heterogeneity. Under weaker heterogeneity, VDCG-FL performs at least as well as Chain-PPFL, without any loss in performance. The proposed grouping strategy is an optimization designed from the perspective of communication latency and does not directly address the statistical heterogeneity caused by non-IID data. Even so, it exhibits good convergence performance and high accuracy in the data heterogeneity experiments presented in this subsection.