Article

Elastic Balancing of Communication Efficiency and Performance in Federated Learning with Staged Clustering

Ying Zhou, Fang Cui, Junlin Che, Mao Ni, Zhiyuan Zhang and Jundi Li

1 School of Electronic Information Engineering, Beijing Jiaotong University, Beijing 100044, China
2 China Mobile Communications Group Terminal Co., Ltd., Beijing 100053, China
3 Beijing Remote Sensing Equipment Research Institute, Beijing 100143, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(4), 745; https://doi.org/10.3390/electronics14040745
Submission received: 31 December 2024 / Revised: 2 February 2025 / Accepted: 4 February 2025 / Published: 14 February 2025
(This article belongs to the Special Issue Network Security Management in Heterogeneous Networks)

Abstract

Clustered federated learning has garnered significant attention as an effective strategy for enhancing model performance in non-independent and identically distributed (non-IID) data scenarios. It computes the similarity between users and clusters them into multiple groups so that each group trains on more homogeneous data. However, several challenges arise when implementing this method, particularly in balancing flexibility, communication costs, and model performance. To address these issues, this paper proposes a hierarchical federated learning framework that balances network and model performance. The framework applies principal component analysis (PCA) to device-side image datasets to assess the similarity of private data across devices and, in conjunction with network performance measurements, dynamically adjusts communication strategies to minimize latency while maintaining stable model performance. By weighting similarity and communication metrics, the framework optimizes communication efficiency without significantly compromising model performance. To validate the proposed method's effectiveness, we evaluate it on three publicly available datasets against four baseline methods. The experimental results show that SC-Fed (segmented clustering federated learning) achieves a maximum accuracy improvement of 7.56% over the baselines while reducing the average waiting time by 54.6%. These results indicate that the proposed algorithm significantly enhances the applicability and efficiency of clustered federated learning in practical training scenarios.

1. Introduction

In modern data-intensive fields such as computer vision [1,2], financial quantification [3], and power dispatching [4], federated learning stands out as a pivotal technological framework [5]. Its primary advantage is the ability to facilitate cross-institutional data sharing, enhancing model generalization while preserving data privacy during collaboration [6,7,8]. Nonetheless, practical implementations of federated learning encounter data heterogeneity [9,10,11,12] across clients, which encompasses notable variations in network resources, computing power, and data quality. To address these disparities, federated learning algorithms must strike a balance between resource limitations and performance metrics, including communication overhead, computational efficiency, and model prediction accuracy. Specifically, these algorithms should optimize communication costs and computation time without compromising model performance, thereby accommodating the diverse resource conditions of different clients.
Clustered federated learning [13,14,15,16] has been demonstrated as an effective approach for managing non-independent and identically distributed (non-IID) data [17,18,19,20]. By clustering users based on their characteristics, this method allows devices with similar features to train within the same cluster, effectively adapting to the heterogeneity in computing and communication resources. Most current research focuses on optimizing single performance metrics, utilizing diverse model segmentation schemes and user selection strategies [21,22,23,24]. However, under multifactor conditions, existing federated learning algorithms lack quantitative and theoretical modeling of various metrics, making it challenging to balance model performance, communication efficiency, and privacy security [25,26,27,28].
The clustered federated learning problem in this context can be formulated as an optimization problem, with particular emphasis on balancing computational time and model performance [29,30]. An excessive number of clusters can reduce the flexibility of users within a cluster, while too few clusters may lead to improper partitioning of device data, adversely affecting model performance. This paper presents a theoretical model and designs an efficient and flexible clustered federated learning framework [31]. The framework ensures an elastic trade-off between model performance and training efficiency, enabling the determination of an optimal user grouping strategy within a limited time frame, thereby maintaining effectiveness in real-world federated learning environments.
Building on the federated clustered learning (FCL) framework [32,33,34], this paper introduces an enhanced federated learning approach that balances model performance with network communication capabilities. This approach utilizes a hierarchical clustering method for initial classification based on the data characteristics of user devices, aiming to mitigate the effects of non-IID data. Moreover, it incorporates an active selection strategy to balance communication and computation times, and adjusts batch sizes to ensure the fairness and effectiveness of user data in training [35,36,37,38]. Experimental results indicate that this method reaches the upper performance limit of the model more quickly and significantly reduces overall model training time compared to other federated learning baseline methods [5]. Specifically, the main contributions of our approach are as follows:
  • By employing theoretical modeling that encompasses various factors such as non-IID data, network conditions, and computing resources, we have designed a simulation theoretical framework grounded in FCL.
  • Building on clustered federated learning, this paper introduces a flexible hierarchical clustering-based federated learning method that optimizes the average training time per round without significantly impacting model accuracy. The method dynamically adjusts cluster weights based on an elastic weighting factor, adaptively balancing model performance and training efficiency, thereby ensuring the flexibility required for real-world federated learning applications.
  • Based on the theoretical framework and algorithm, this paper conducts comparative experiments with four baseline algorithms on three datasets. The experimental results demonstrate that our algorithm achieves higher model performance within the same time frame, while significantly reducing the average waiting time per training round.
The structure of the remaining sections of this paper is as follows: Section 2 presents an overview of existing methods. Section 3 provides a mathematical formulation of the hierarchical federated clustering algorithm in the non-IID scenario. Section 4 introduces our proposed solution, which balances communication efficiency and model accuracy. Section 5 validates the effectiveness of the method through experiments and compares it with several baseline algorithms. Section 6 summarizes the main contributions of the paper.

2. Related Work

While federated learning can address the issue of user data leakage during model training, the effectiveness of the training is closely tied to the distribution of user datasets. When user data is highly non-IID, the resulting model accuracy can be significantly compromised. To tackle this issue, researchers have proposed various methods such as FedAvg [39], FedProx [40], and SCAFFOLD [41]. Additionally, clustered federated learning (CFL) has also been shown to be an effective approach for addressing the challenges posed by non-IID data distributions.
Additionally, the computational and communication capabilities of user devices constrain model training, leading to a pronounced lag effect when resources are unevenly distributed. Many studies have addressed this issue by employing model compression techniques to adapt to the computational and communication capacities of individual user devices, thereby mitigating the lag effect. Notable examples include HeteroFL and FLANC, which aim to optimize training efficiency by tailoring models to the specific resource constraints of each device.

2.1. Personalized Federated Learning

Non-IID federated learning (FL) tackles the challenge of statistical heterogeneity across clients. The FedAvg algorithm [39], which employs a three-step protocol for federated training, often encounters issues such as client drift and slow convergence on non-IID data [42]. To address these, algorithms like FedProx [40] and FedDANE [43] have been developed. FedProx incorporates a proximal term in the local loss function to mitigate model heterogeneity, while FedDANE employs a federated Newton-type optimization method, adapting traditional distributed optimization for federated learning [44].
SCAFFOLD [41] and LFD [45] represent further advancements in addressing heterogeneity. SCAFFOLD aims to control variance and correct client update drift, and LFD extends stochastic full gradient descent with periodic averaging to manage non-convex optimization challenges in federated learning environments [46].
CFL [47] introduces a dynamic clustering approach, grouping clients based on the cosine similarity of their updates, which can enhance performance in diverse settings. In contrast, the IFCA algorithm [48] utilizes a fixed number of clusters, requiring clients to download all model options each round and select the best based on local accuracy.

2.2. Device Resources

Mobile and IoT devices are increasingly becoming the main computing resources for billions of users globally. These devices produce significant amounts of data that can be used to improve various applications [49]. With their growing computational power, the volume of locally stored data also rises. To ensure data privacy, FL is crucial for model training. FL [39,50] is a distributed machine learning framework that creates a global model by aggregating locally trained parameters without sharing local data. Addressing the diverse and dynamic computational and communication capabilities of heterogeneous clients is essential in this context.
However, the distribution of computational and communication resources among user devices is often uneven; mobile devices [51], for example, have far fewer resources than desktop computers or servers. Traditional FL sends local models with the same architecture as the global model to users for training and then aggregates these local models on the server into a single global model. In this approach, devices with widely varying resources receive identical local models for training, and the significant differences in computational resources can result in noticeable lag and decreased training efficiency.
To address the heterogeneity of user devices, Diao proposed HeteroFL [52], which adapts to varying computational and communication capacities by splitting the global model into several sub-models of different sizes. These heterogeneous local models enable local clients to contribute adaptively to the training of the global model, effectively addressing system heterogeneity and improving communication efficiency. Building on this, Mei introduced the FLANC algorithm [53], which employs low-rank decomposition to transform the model into two low-rank matrices, further reducing the resources required for model transmission by only transmitting these matrices instead of the entire model architecture.
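To make the resource savings of such low-rank schemes concrete, the following minimal NumPy sketch (our illustration, not FLANC's actual construction) factorizes a single dense layer's weight matrix into two low-rank factors and reports the reduction in upload volume; the matrix shapes and rank are arbitrary choices.

```python
import numpy as np

def low_rank_factors(weight: np.ndarray, rank: int):
    """Approximate a dense weight matrix by two low-rank factors via truncated SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # shape (out_dim, rank)
    b = vt[:rank, :]             # shape (rank, in_dim)
    return a, b

w = np.random.randn(256, 512)                 # weights of one dense layer (illustrative)
a, b = low_rank_factors(w, rank=16)
approx = a @ b                                # receiver reconstructs the layer locally
saving = 1 - (a.size + b.size) / w.size       # fraction of transmitted parameters saved
err = np.linalg.norm(w - approx) / np.linalg.norm(w)
print(f"relative error {err:.3f}, transmission saving {saving:.1%}")
```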

3. Problem Statement

The set of clients is denoted as $C = \{c_1, c_2, \ldots, c_N\}$, managed by a central server. Each client $c_i \in C$ possesses a local dataset $D_i = \{\eta_j\}_{j=1}^{|D_i|}$, where $\eta_j$ denotes a data sample of $D_i$. The loss function $L(\theta; \eta)$ evaluates the performance of the model $\theta$ on a data sample $\eta$. The expected loss over the data distribution $D_i$ of client $c_i$ is therefore defined as Equation (1):

$$F_{D_i}(\theta) = \mathbb{E}_{\eta \sim D_i}\, L(\theta; \eta) \tag{1}$$
For the global model, the loss function is a linear combination of the local loss functions of all $N$ clients. The objective of FL is to train a high-quality model $\theta^{*}$ by minimizing the global loss function, defined as Equation (2):

$$\theta^{*} := \arg\min_{\theta} F(\theta) = \arg\min_{\theta} \frac{1}{N} \sum_{c=1}^{N} F_c(\theta) \tag{2}$$
However, not all users participate in every round of federated learning. Suppose there are $H$ rounds of training in total. In each round $h \in \{1, 2, \ldots, H\}$, the parameter server (PS) randomly selects a subset of clients $K_h \subseteq K$ to participate, where $|K_h| = k$. The PS then sends the current global model $\theta_h$ to the selected clients. Each client $i \in K_h$ updates the global model $\theta_h$ on its local dataset $D_i$ for $T$ iterations, with each update constituting one local iteration and $T$ representing the local update frequency. Let $\theta_{(i,h)}(t)$ denote the local model of client $i$ at iteration $t$ in round $h$. For the mini-batch stochastic gradient descent (SGD) algorithm, a local iteration can be expressed as Equation (3):

$$\theta_h(t+1) = \theta_h(t) - \lambda \nabla F_c\big(\theta_{(i,h)}(t)\big) \tag{3}$$
where $\lambda$ is the learning rate. Finally, the PS collects the locally updated models from the participating clients and aggregates them into the latest global model for further training, as expressed by Equation (4):

$$\theta_{h+1} = \frac{1}{|K_h|} \sum_{i \in K_h} \theta_{(i,h)}(t) \tag{4}$$
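For concreteness, the following minimal NumPy sketch illustrates Equations (3) and (4) on a toy linear-regression task: each selected client runs a few mini-batch SGD steps on its local data and the server averages the resulting models. The gradient function, learning rate, and client data are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def local_update(theta, data, grad_fn, lr=0.01, iters=5, batch=32, rng=None):
    """Run tau local mini-batch SGD iterations on one client (Equation (3))."""
    rng = rng if rng is not None else np.random.default_rng(0)
    for _ in range(iters):
        idx = rng.choice(len(data), size=min(batch, len(data)), replace=False)
        theta = theta - lr * grad_fn(theta, data[idx])
    return theta

def aggregate(local_models):
    """Average the selected clients' models into the new global model (Equation (4))."""
    return np.mean(np.stack(local_models), axis=0)

def grad_fn(theta, batch):
    """Gradient of 0.5 * ||X theta - y||^2 / n for a toy linear-regression objective."""
    X, y = batch[:, :-1], batch[:, -1]
    return X.T @ (X @ theta - y) / len(y)

rng = np.random.default_rng(1)
clients = [np.hstack([rng.normal(size=(100, 3)), rng.normal(size=(100, 1))])
           for _ in range(4)]                         # 4 clients, features + target column
theta_global = np.zeros(3)
for rnd in range(3):                                  # H communication rounds
    local_models = [local_update(theta_global, d, grad_fn, rng=rng) for d in clients]
    theta_global = aggregate(local_models)
print(theta_global)
```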
As shown in Equation (5), to capture the characteristics of user datasets, this paper applies principal component analysis (PCA) for dimensionality reduction to obtain the $r$ principal information vectors $S_i = \{v_1, \ldots, v_r\}$. The experiment uses the cosine distance to measure the similarity between user data:

$$M_{ij} = \arccos \frac{S_i \cdot S_j}{\|S_i\|\,\|S_j\|} \tag{5}$$

Assuming there are $n$ users in cluster $i$, the average cosine distance $M_i$ of this cluster is the mean of the pairwise distances between these $n$ users.
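A minimal sketch of Equation (5) is given below, assuming each client summarizes its private images with the top-r principal components (flattened into a single signature vector) and the server builds the arccos-based distance matrix; the signature construction is our simplification of the paper's feature matrices.

```python
import numpy as np
from sklearn.decomposition import PCA

def client_signature(images: np.ndarray, r: int = 3) -> np.ndarray:
    """Flatten a client's images and keep its top-r principal directions as a signature."""
    flat = images.reshape(len(images), -1).astype(np.float32)
    return PCA(n_components=r).fit(flat).components_.ravel()

def cosine_distance_matrix(signatures) -> np.ndarray:
    """M_ij = arccos of the cosine similarity between client signatures (Equation (5))."""
    S = np.stack(signatures)
    S = S / np.linalg.norm(S, axis=1, keepdims=True)
    return np.arccos(np.clip(S @ S.T, -1.0, 1.0))

rng = np.random.default_rng(0)
signatures = [client_signature(rng.random((200, 28, 28))) for _ in range(5)]  # 5 toy clients
M = cosine_distance_matrix(signatures)
print(np.round(M, 3))
```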
Evidently, an idealized federated learning system is not well suited to real-world environments. The heterogeneity of user devices, including variations in network conditions and computational capabilities, affects the efficiency of FL training. Therefore, it is imperative to incorporate system performance parameters into the model. Assuming that one local training iteration of the model consumes $G(\theta)$ computational resources and that the computational capability of client $i$ is $u_i$, the time required for one local iteration of client $i$ in round $h$ is given by Equation (6):

$$t_i^h = \frac{G(\theta_i^h)}{u_i^h} \tag{6}$$
In practical network environments, the download bandwidth is typically much higher than the upload bandwidth, so the time to download the model is usually negligible compared with the time to upload it. Let $b_i$ denote the upload bandwidth of user $i$, and let $P_i^h$ represent the model size of client $i$ in round $h$. The communication time of client $i$ in round $h$ can then be expressed as Equation (7):

$$y_i^h = \frac{P_i^h}{b_i^h} \tag{7}$$
Let $\tau_i^h$ denote the update frequency of client $i$ in round $h$. A round of parameter updates requires $\tau_i^h$ local training iterations and one model upload. The total time for each user to complete a round of parameter updates is defined as Equation (8):

$$T_i^h = \tau_i^h\, t_i^h + y_i^h \tag{8}$$
In a given training round, assume there are $n$ users in cluster $C_i$ with update frequencies $\{\tau_1^h, \ldots, \tau_n^h\}$. For the users participating in a round of model training, the average waiting time after completing their local training is calculated by Equation (9):

$$W_i = \frac{1}{n} \sum_{j=1}^{n} \left( T_{\max} - \tau_j^h\, t_j^h - y_j^h \right), \qquad T_{\max} = \max_{1 \le j \le n} \left( \tau_j^h\, t_j^h + y_j^h \right) \tag{9}$$
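The timing model of Equations (6)–(9) can be evaluated directly from per-client measurements, as in the small sketch below; all resource numbers are made up for illustration.

```python
import numpy as np

def round_time(tau, t_local, y_upload):
    """Per-client round time T_i^h = tau_i^h * t_i^h + y_i^h (Equation (8))."""
    return tau * t_local + y_upload

def average_waiting_time(tau, t_local, y_upload):
    """Mean idle time while waiting for the slowest client in the cluster (Equation (9))."""
    T = round_time(tau, t_local, y_upload)
    return float(np.mean(T.max() - T))

# toy cluster of 4 clients (all numbers illustrative)
G, P = 2.0e9, 8.0e6                               # FLOPs per local step, model size in bytes
u = np.array([1.0e9, 2.0e9, 0.5e9, 1.5e9])        # compute capability u_i (FLOP/s)
b = np.array([2.0e6, 1.0e6, 4.0e6, 0.5e6])        # upload bandwidth b_i (bytes/s)
tau = np.array([4, 8, 2, 6])                      # local update frequencies tau_i^h
t_local, y_upload = G / u, P / b                  # Equations (6) and (7)
print(f"average waiting time: {average_waiting_time(tau, t_local, y_upload):.2f} s")
```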
Based on Equations (5) and (9), we adopt a method grounded in Pareto optimality theory, with the goal of simultaneously minimizing the average cosine distance $M$ and the average waiting time $W$ among users. As shown in Equation (10), we construct a multi-objective optimization model that seeks the optimal solution by clustering users appropriately:

$$\min_{\theta \in \mathbb{R}} \; \big( w_1 M + w_2 W \big) \quad \text{s.t.} \quad w_1 + w_2 = 1, \;\; T^h \ge 0, \;\; F_D(\theta) \ge 0 \tag{10}$$
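A candidate clustering can then be scored by the weighted objective of Equation (10). The sketch below, assuming both terms have been normalized to comparable scales and that the weights sum to one, simply compares two hypothetical clusterings; it does not enumerate the full Pareto front.

```python
def clustering_score(m_avg: float, w_avg: float, w1: float = 0.5, w2: float = 0.5) -> float:
    """Weighted objective w1*M + w2*W of Equation (10); lower is better.
    Assumes both terms are normalized to comparable scales and w1 + w2 = 1."""
    assert abs(w1 + w2 - 1.0) < 1e-9
    return w1 * m_avg + w2 * w_avg

# two hypothetical clusterings: (normalized average cosine distance, normalized waiting time)
candidates = {"k=3 clusters": (0.20, 0.35), "k=5 clusters": (0.12, 0.55)}
best = min(candidates, key=lambda name: clustering_score(*candidates[name], w1=0.7, w2=0.3))
print("preferred clustering:", best)
```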

4. Overview and Implementation

In this section, we introduce the proposed algorithm in three parts: Section 4.1 provides an overview, Section 4.2 details the algorithm's process, and Section 4.3 presents its pseudocode. Section 4.4 then describes the datasets and experimental settings.

4.1. Overview

The overall process of SC-Fed is illustrated in Figure 1, which consists of two main parts: the model training on the client side and the model aggregation on the server side, as detailed below.
To mitigate the adverse impact of highly non-IID data on model performance, this study employs an innovative approach: utilizing PCA to extract key features from client data and constructing feature matrices. By calculating the cosine similarity between feature matrices, we can quantify the similarity of client data and cluster clients based on these similarities.
Moreover, the experiment must account for the lag effect in model training caused by the heterogeneity of client resources. Clients with ample resources may experience prolonged idle times while waiting for resource-constrained clients, thereby affecting overall training efficiency. To address this issue, this paper proposes a client resource modeling method. By modeling each client’s resource status, the server can allocate appropriate update frequencies, balancing the model training time across clients and thereby reducing the average waiting time.

4.2. Model Training Process

Feature vector extraction based on PCA is a reliable and effective method. PCA performs a coordinate transformation on the matrix data, reducing the dimensionality of image matrices with minimal information loss and identifying the largest set of linearly independent directions. Specifically, for a set of image data arranged row-wise, PCA can extract the r most representative principal component vectors, which encapsulate the distribution and features of the images. By calculating the cosine similarity between these vectors, a similarity matrix is constructed. Based on analysis of this matrix, the optimal partition S = {s_1, …, s_k} for any number of clusters k can be determined, where each s_j is a subset of the clients in C. This partition also supports the secondary clustering on system performance parameters that follows.
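As an illustration of this step, the sketch below feeds a pairwise cosine-distance matrix into SciPy's average-linkage hierarchical clustering to obtain k client groups; the linkage method and the toy distance matrix are our assumptions, since the paper does not fix a specific hierarchical variant here.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_clients(M: np.ndarray, k: int) -> dict:
    """Group clients into k clusters with average-linkage hierarchical clustering
    applied to the pairwise cosine-distance matrix M."""
    condensed = squareform(M, checks=False)          # condensed pairwise distances
    labels = fcluster(linkage(condensed, method="average"), t=k, criterion="maxclust")
    groups: dict = {}
    for client, lab in enumerate(labels):
        groups.setdefault(int(lab), []).append(client)
    return groups

# toy distance matrix: clients 0-2 resemble each other, as do clients 3-5
rng = np.random.default_rng(0)
M = rng.random((6, 6)) * 0.1
M[:3, 3:] += 1.0
M[3:, :3] += 1.0
M = (M + M.T) / 2.0
np.fill_diagonal(M, 0.0)
print(cluster_clients(M, k=2))   # e.g. {1: [0, 1, 2], 2: [3, 4, 5]}
```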
To allocate an appropriate update frequency to each client, the server needs information on the client's computing and transmission resources. First, clients measure their network transmission rate using network measurement methods and determine their computing power by querying local CPU and GPU usage. Second, clients upload these local computing and transmission resource data to the server. The server then calculates the model training frequency of client i for the current round based on the objective function described in Section 3 and determines the estimated completion time m_i for that client. However, the data uploaded by clients may not be entirely accurate, and the clients' computing resources u_i and transmission resources b_i change dynamically, so the server must adjust the clients' recorded resource data accordingly.
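The following sketch shows one simple way the server could translate measured per-client iteration and upload times into update frequencies that keep the estimated completion times m_i close to a common round budget. The floor-and-clip heuristic is an illustrative assumption rather than the paper's exact allocation rule.

```python
import numpy as np

def assign_update_frequencies(t_local: np.ndarray, y_upload: np.ndarray, budget: float):
    """Pick tau_i so that the estimated completion time tau_i * t_i + y_i
    stays within a common per-round budget (every client runs >= 1 local step)."""
    tau = np.floor((budget - y_upload) / t_local).astype(int)
    return np.clip(tau, 1, None)

t_local = np.array([2.0, 0.5, 1.0])     # measured seconds per local iteration
y_upload = np.array([4.0, 8.0, 2.0])    # measured seconds to upload the model
tau = assign_update_frequencies(t_local, y_upload, budget=12.0)
m = tau * t_local + y_upload            # estimated completion times m_i
print("tau:", tau, "estimated completion times:", m)
```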

4.3. Implementation and Algorithm Description

As shown in Algorithm 1, each user applies PCA to extract the top r principal component vectors S_i from its data D_i and transmits them to the server (Step 1). The similarity between user data features S_i is measured with the cosine distance to construct a similarity matrix M (Step 2). Each user uploads its communication resource b_i and computation resource u_i to the server (Step 3). The optimal clustering O_p is determined from the similarity matrix M, the communication resources b_i, and the computation resources u_i (Step 4). In each training round, clients train their models according to their assigned update frequencies (Step 5), and at the end of each round the server collects the client models for aggregation (Step 6).
Algorithm 1 PCA-based clustering algorithm with resource optimization
Require: Initial model parameters θ_0, learning rate λ, communication rounds T, number of principal components r, number of clusters k
1: for each client c_i, i ∈ {1, 2, …, n} do
2:   S_i ← PCA(D_i, r)  // Extract the main features of the client data
3:   Server ← S_i, u_i, b_i  // Upload S_i, u_i, and b_i to the server
4: end for
5: M_ij = (S_i · S_j) / (‖S_i‖ ‖S_j‖)  // Compute the cosine similarity matrix
6: O_p ← M_ij, u_i, b_i  // Obtain the clustering groups O_p
7: parallel for each client clustering group O_p
8:   for e ← 0 to T − 1 do
9:     S_i ← PCA(D_i, r)  // Extract the main features of the client data
10:    Client: θ_h(t+1) = θ_h(t) − λ ∇F_c(θ_(i,h)(t))  // Update the local model parameters
11:    Server: θ_{h+1} = (1/|K_h|) Σ_{i ∈ K_h} θ_(i,h)(t)  // Model aggregation
12:  end for
13: end parallel
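To tie the steps of Algorithm 1 together, the following self-contained Python sketch runs the whole pipeline on synthetic data: PCA signatures, a cosine-distance matrix, hierarchical clustering into groups, and per-cluster FedAvg training of a logistic-regression model. Every modeling choice (synthetic data, logistic regression, five local steps, two clusters) is an illustrative assumption, not the authors' experimental setup.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)

def make_client(shift):
    """Synthetic client: 200 samples whose distribution depends on `shift`."""
    X = rng.normal(size=(200, 10)) + shift
    y = (X.sum(axis=1) + rng.normal(scale=0.5, size=200) > shift.sum()).astype(float)
    return X, y

clients = [make_client(np.zeros(10)) for _ in range(3)] + \
          [make_client(np.full(10, 2.0)) for _ in range(3)]

# Steps 1-2: PCA signatures and arccos cosine-distance matrix
sigs = np.stack([PCA(n_components=3).fit(X).components_.ravel() for X, _ in clients])
S = sigs / np.linalg.norm(sigs, axis=1, keepdims=True)
M = np.arccos(np.clip(S @ S.T, -1.0, 1.0))
np.fill_diagonal(M, 0.0)

# Step 4: hierarchical clustering into k = 2 groups
labels = fcluster(linkage(squareform(M, checks=False), "average"), t=2, criterion="maxclust")

def grad(theta, X, y):
    """Gradient of the logistic loss for the toy per-cluster model."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (p - y) / len(y)

# Steps 5-6: per-cluster FedAvg rounds
for g in np.unique(labels):
    members = [clients[i] for i in np.where(labels == g)[0]]
    theta = np.zeros(10)
    for _ in range(20):                                # communication rounds
        local_models = []
        for X, y in members:
            th = theta.copy()
            for _ in range(5):                         # tau local SGD steps
                idx = rng.choice(len(X), 32, replace=False)
                th -= 0.1 * grad(th, X[idx], y[idx])
            local_models.append(th)
        theta = np.mean(local_models, axis=0)          # server aggregation
    acc = np.mean([(((1.0 / (1.0 + np.exp(-X @ theta))) > 0.5) == y).mean()
                   for X, y in members])
    print(f"cluster {g}: accuracy {acc:.3f}")
```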

4.4. Datasets and Settings

This research focuses on image classification using the Fashion-MNIST [54], CIFAR-10 [55], and EMNIST [56] datasets to evaluate the proposed approach. Non-IID distributions are simulated by varying the Dirichlet distribution parameter (α = 0.1, α = 1, α = 10) across 100 users; smaller α values increase data heterogeneity (more non-IID), while larger α values make the distribution more uniform and closer to IID. PCA is employed to reduce the dimensionality of user datasets and extract key features. After validation, the number of principal components is set to three to balance information retention and computational efficiency.
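The Dirichlet label-skew simulation can be reproduced with a short partitioning routine such as the sketch below, which draws per-class client proportions from Dirichlet(α); the exact partitioning scheme used in the paper is not specified, so this is a common-practice assumption.

```python
import numpy as np

def dirichlet_partition(labels: np.ndarray, n_clients: int, alpha: float, seed: int = 0):
    """Assign sample indices to clients with per-class Dirichlet(alpha) proportions.
    Small alpha -> highly skewed (strongly non-IID) label distributions;
    large alpha -> near-uniform, close to IID."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for i, part in enumerate(np.split(idx, cuts)):
            client_indices[i].extend(part.tolist())
    return client_indices

# toy labels standing in for a 10-class dataset, split across 100 users
labels = np.random.default_rng(0).integers(0, 10, size=60_000)
parts = dirichlet_partition(labels, n_clients=100, alpha=0.1)
print(len(parts), min(map(len, parts)), max(map(len, parts)))
```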
To assess the efficacy of our SC-Fed method against state-of-the-art approaches, we conducted comparisons with various benchmark models. In scenarios involving a unified global model trained across all clients, we chose FedAvg [39] and Solo as our baselines. For more sophisticated personalization in federated learning, we compared our method against Ditto [57] and FedCom [58].

5. Experiment

In our experiments, we configured a setup with 20 users and varied the data distribution skew using the Dirichlet distribution with concentration parameters α = 0.1 , α = 1 , and α = 10 to simulate different levels of non-IID data. For all experiments, we used a gradient batch size of 128 and employed stochastic gradient descent (SGD) as the optimizer for local training.
We evaluated SC-Fed against the aforementioned state-of-the-art benchmarks under the prevalent non-IID conditions characterized by Dirichlet label distributions. These conditions are designed to assess the resilience of the algorithms to heterogeneous data distributions. We report results from five experimental runs, covering both the validation accuracy of the final user models on each dataset and the mean of these accuracies.
As shown in Table 1, the quantitative results demonstrate that, on the CIFAR-10 dataset, the SC-Fed algorithm achieves average performance improvements of 7.53%, 7.09%, and 6.33% over the Solo, FedAvg, Ditto, and FedCom baselines under the three non-IID settings, respectively. On the Fashion-MNIST dataset, SC-Fed exhibits performance gains of 1.25%, 3.84%, and 1.20% under the three non-IID scenarios, and on the EMNIST dataset it achieves improvements of 0%, 1.85%, and 1.60%. These observations indicate that the performance improvement of the SC-Fed algorithm is relatively modest in some settings. This is primarily because the algorithm optimizes communication efficiency while maintaining model performance, thereby providing robust communication performance for federated learning in real-world network scenarios.
Figure 2a presents the experimental results on the CIFAR-10 dataset under the highly non-IID setting (α = 0.1). The proposed SC-Fed method shows steady improvement, surpassing FedAvg, FedCom, and Solo, and achieves a final accuracy of approximately 82%. Ditto performs worse than SC-Fed before 50 communication rounds but surpasses it in accuracy thereafter. The Solo method achieves the best performance initially, but its accuracy then improves very slowly, resulting in the worst final performance, while FedCom performs the worst among the federated baselines throughout training. In Figure 2b, SC-Fed outperforms FedAvg and FedCom throughout the entire training process. Solo achieves the best performance in the early stages of training, but its accuracy shows no significant improvement after 40 rounds, again yielding the worst final performance. SC-Fed achieves results similar to Ditto, although Ditto performs better in the early stages of training. In Figure 2c, SC-Fed performs similarly to FedAvg, while Ditto and FedCom achieve the best results; the Solo method continues to exhibit the worst performance.
Figure 2d–f present the experimental results on the EMNIST dataset under different non-IID settings (α = 0.1, α = 1, and α = 10). Under the highly non-IID setting (α = 0.1), shown in Figure 2d, the proposed SC-Fed method demonstrates steady improvement, achieving a final accuracy of approximately 87%, comparable to Ditto. Solo performs best before 30 communication rounds but improves very slowly afterward, stabilizing at 78%, the lowest among all methods, while FedAvg and FedCom reach around 80% accuracy. In the moderately non-IID setting (α = 1), shown in Figure 2e, SC-Fed reaches a final accuracy of approximately 80%, similar to Ditto, FedAvg, and FedCom; Solo performs best during the first 20 rounds but stabilizes at 75%. Finally, under the nearly IID setting (α = 10), shown in Figure 2f, SC-Fed, Ditto, FedAvg, and FedCom achieve similar final performance, converging at around 78%. Solo is competitive in the early stages, achieving the highest accuracy within the first 20 rounds, but stabilizes at 72%. These results highlight the robustness and effectiveness of SC-Fed and Ditto across varying degrees of data heterogeneity.
Figure 2g presents the experimental results on the FMNIST dataset under the highly non-IID setting (α = 0.1). Solo and Ditto achieve the best performance throughout the entire training process, and SC-Fed also performs well; in contrast, FedCom and FedAvg exhibit the worst performance, with significant fluctuations. Figure 2h presents the results under the moderately non-IID setting (α = 1). SC-Fed and FedCom achieve similar final performance, both converging at approximately 85%. Solo performs best during the first 20 communication rounds, reaching the highest early accuracy, but improves only marginally afterward and stabilizes at around 82%, while FedAvg converges to around 80%. Figure 2i presents the results under the nearly IID setting (α = 10). SC-Fed, FedCom, and FedAvg achieve nearly identical final performance, converging at approximately 83%. Solo is competitive in the early stages but stabilizes at around 80%, while Ditto remains the most effective, stabilizing at 87%.
Figure 3a–c present the experimental results on the three datasets under the non-IID setting (α = 0.1). On CIFAR-10 (Figure 3a), SC-Fed outperforms all baseline algorithms, with the smallest average waiting time and the most stable behavior. On EMNIST (Figure 3b), SC-Fed maintains a consistently lower average waiting time than the fluctuating FedAvg, FedCom, and Ditto. On FMNIST (Figure 3c), SC-Fed not only achieves the lowest waiting time but also exhibits minimal variance, highlighting its robustness and efficiency. These results indicate the significant advantage of the SC-Fed algorithm in reducing communication delays and improving training efficiency in federated learning with heterogeneous data distributions.

6. Conclusions

To address the requirements of real-world federated learning environments, this paper proposes a balanced federated learning framework, SC-Fed, which leverages federated clustering to jointly optimize model performance and training latency, effectively mitigating the non-IID problem in practical federated learning scenarios. Our approach achieves elastic optimization of the average waiting time without significantly compromising model performance. We validated the effectiveness of the algorithm on three publicly available datasets (EMNIST, Fashion-MNIST, and CIFAR-10). The results show that the model trained with our framework outperforms the baseline algorithms by up to 7.56% in accuracy, while reducing the average waiting time by 54.6%. This indicates that the proposed framework is of practical significance in heterogeneous device environments.
Although the method shows promising results, we acknowledge certain unresolved challenges. The federated clustering framework based on PCA has been validated in terms of performance, but in pathological non-IID scenarios, the PCA-based clustering approach for user-device similarity still leaves room for improvement, as it does not capture shared knowledge across models but focuses solely on similarity-based knowledge. To address these challenges, future work will explore advanced PCA techniques such as incremental PCA (IPCA) and robust PCA (RPCA) to better handle data variability and noise, investigate hybrid clustering approaches that combine PCA with other methods like spectral clustering or hierarchical clustering to enhance clustering performance, and explore mechanisms for cross-model knowledge transfer using meta-learning or transfer learning to improve the generalization capability of the federated clustering framework. Furthermore, we will develop adaptive algorithms to dynamically adjust the local training frequency based on device computational load and data quality, thereby reducing the average waiting time.

Author Contributions

Data curation, Y.Z. and Z.Z.; formal analysis, Y.Z. and Z.Z.; investigation, Y.Z.; methodology, Z.Z. and Y.Z.; supervision, Z.Z.; visualization, Y.Z. and Z.Z.; writing—original draft, Y.Z. and Z.Z.; writing—review and editing, Y.Z., F.C., J.C., M.N., J.L. and Z.Z.; All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Research on AI Terminal Computing Power and End-Cloud Framework (R2411B0R), the Research on Terminal Computing Power Key Technology (Terminal Perspective) (R24113AC), and the National Key Research and Development Program of China (2023YFB2904205).

Data Availability Statement

The data can be shared upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
Non-IID   Non-independent and identically distributed
PCA       Principal component analysis
FCL       Federated clustered learning
FL        Federated learning
CFL       Clustered federated learning
PS        Parameter server
SGD       Stochastic gradient descent

References

  1. Chuprov, S.; Bhatt, K.M.; Reznik, L. Federated Learning for Robust Computer Vision in Intelligent Transportation Systems. In Proceedings of the 2023 IEEE Conference on Artificial Intelligence (CAI), Santa Clara, CA, USA, 5–6 June 2023; pp. 26–27. [Google Scholar] [CrossRef]
  2. Ye, M.; Fang, X.; Du, B.; Yuen, P.C.; Tao, D. Heterogeneous federated learning: State-of-the-art and research challenges. ACM Comput. Surv. 2023, 56, 1–44. [Google Scholar] [CrossRef]
  3. Xiong, Z.; Zhu, H.; Liu, D.; Xian, J. Industrial Chain Data Evaluation in Automobile Parts Procurement via Group Multirole Assignment. In Proceedings of the 2023 26th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Rio de Janeiro, Brazil, 24–26 May 2023; pp. 1049–1054. [Google Scholar] [CrossRef]
  4. Lu, W.; Hu, W.; Kong, X.; Shen, Y.; Zhao, X. Multi-Objective Optimal Dispatch of Responsibility-Assignment Market via Federated Learning. In Proceedings of the 2022 25th International Conference on Electrical Machines and Systems (ICEMS), Chiang Mai, Thailand, 29 November–2 December 2022; pp. 1–5. [Google Scholar] [CrossRef]
  5. Huang, W.; Ye, M.; Shi, Z.; Wan, G.; Li, H.; Du, B.; Yang, Q. Federated learning for generalization, robustness, fairness: A survey and benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 9387–9406. [Google Scholar] [CrossRef]
  6. Feng, W.; Liu, H.; Peng, X. Federated Reinforcement Learning for Sharing Experiences Between Multiple Workers. In Proceedings of the 2023 International Conference on Machine Learning and Cybernetics (ICMLC), Adelaide, Australia, 9–11 July 2023; pp. 440–445. [Google Scholar] [CrossRef]
  7. Zhao, Y. Comparison of Federated Learning Algorithms for Image Classification. In Proceedings of the 2023 2nd International Conference on Data Analytics, Computing and Artificial Intelligence (ICDACAI), Zakopane, Poland, 17–19 October 2023; pp. 613–615. [Google Scholar] [CrossRef]
  8. Wijethilaka, S.; Liyanage, M. A Federated Learning Approach for Improving Security in Network Slicing. In Proceedings of the GLOBECOM 2022—2022 IEEE Global Communications Conference, Rio de Janeiro, Brazil, 4–8 December 2022; pp. 915–920. [Google Scholar] [CrossRef]
  9. Novikova, E.S.; Chen, Y.; Meleshko, A.V. Evaluation of Data Heterogeneity in FL Environment. In Proceedings of the 2024 XXVII International Conference on Soft Computing and Measurements (SCM), Saint Petersburg, Russia, 22–24 May 2024; pp. 344–347. [Google Scholar] [CrossRef]
  10. Siriwardana, G.K.; Jayawardhana, H.D.; Bandara, W.U.; Atapattu, S.; Herath, V.R. Federated Learning for Improved Automatic Modulation Classification: Data Heterogeneity and Low SNR Accuracy. In Proceedings of the 2023 Moratuwa Engineering Research Conference (MERCon), Moratuwa, Sri Lanka, 9–11 November 2023; pp. 462–467. [Google Scholar] [CrossRef]
  11. Chen, J.; Yan, H.; Liu, Z.; Zhang, M.; Xiong, H.; Yu, S. When federated learning meets privacy-preserving computation. ACM Comput. Surv. 2024, 56, 1–36. [Google Scholar] [CrossRef]
  12. Wang, A.; Feng, Y.; Yang, M.; Wu, H.; Iwahori, Y.; Chen, H. Cross-Project Software Defect Prediction Using Differential Perception Combined with Inheritance Federated Learning. Electronics 2024, 13, 4893. [Google Scholar] [CrossRef]
  13. Gauthier, F.; Gogineni, V.C.; Werner, S.; Huang, Y.F.; Kuh, A. Clustered Graph Federated Personalized Learning. In Proceedings of the 2022 56th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 31 October–2 November 2022; pp. 744–748. [Google Scholar] [CrossRef]
  14. Cai, L.; Chen, N.; Wei, Y.; Chen, H.; Li, Y. Cluster-based Federated Learning Framework for Intrusion Detection. In Proceedings of the 2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), Beijing, China, 25–27 November 2022; pp. 1–6. [Google Scholar] [CrossRef]
  15. Zhang, J.; Qiao, Z. SoFL: Clustered Federated Learning Based on Dual Clustering for Heterogeneous Data. Electronics 2024, 13, 3682. [Google Scholar] [CrossRef]
  16. Zhang, J.; Shi, Y. A Personalized Federated Learning Method Based on Clustering and Knowledge Distillation. Electronics 2024, 13, 857. [Google Scholar] [CrossRef]
  17. Zhao, Z.; Wang, J.; Hong, W.; Quek, T.Q.S.; Ding, Z.; Peng, M. Ensemble Federated Learning With Non-IID Data in Wireless Networks. IEEE Trans. Wirel. Commun. 2024, 23, 3557–3571. [Google Scholar] [CrossRef]
  18. Chen, Y.A.; Chen, G.L. An Adaptive Clustering Scheme for Client Selections in Communication-Efficient Federated Learning. In Proceedings of the 2023 VTS Asia Pacific Wireless Communications Symposium (APWCS), Tainan City, Taiwan, 23–25 August 2023; pp. 1–3. [Google Scholar] [CrossRef]
  19. Li, Y.; Wang, S.; Chi, C.Y.; Quek, T.Q.S. Differentially Private Federated Clustering Over Non-IID Data. IEEE Internet Things J. 2024, 11, 6705–6721. [Google Scholar] [CrossRef]
  20. Zhou, X.; Ye, X.; Kevin, I.; Wang, K.; Liang, W.; Nair, N.K.C.; Shimizu, S.; Yan, Z.; Jin, Q. Hierarchical federated learning with social context clustering-based participant selection for internet of medical things applications. IEEE Trans. Comput. Soc. Syst. 2023, 10, 1742–1751. [Google Scholar] [CrossRef]
  21. Zuo, Y.; Liu, Y. User Selection Aware Joint Radio-and-Computing Resource Allocation for Federated Edge Learning. In Proceedings of the 2020 International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, 21–23 October 2020; pp. 292–297. [Google Scholar] [CrossRef]
  22. Kim, J.; Song, C.; Paek, J.; Kwon, J.H.; Cho, S. A Review on Research Trends of Optimization for Client Selection in Federated Learning. In Proceedings of the 2024 International Conference on Information Networking (ICOIN), Ho Chi Minh City, Vietnam, 17–19 January 2024; pp. 287–289. [Google Scholar] [CrossRef]
  23. Huang, W.; Ye, M.; Shi, Z.; Li, H.; Du, B. Rethinking federated learning with domain shift: A prototype view. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 16312–16322. [Google Scholar]
  24. Zhao, X.; Xie, P.; Xing, L.; Zhang, G.; Ma, H. Clustered federated learning based on momentum gradient descent for heterogeneous data. Electronics 2023, 12, 1972. [Google Scholar] [CrossRef]
  25. Li, H.; Li, C.; Wang, J.; Yang, A.; Ma, Z.; Zhang, Z.; Hua, D. Review on security of federated learning and its application in healthcare. Future Gener. Comput. Syst. 2023, 144, 271–290. [Google Scholar] [CrossRef]
  26. Ye, R.; Wang, W.; Chai, J.; Li, D.; Li, Z.; Xu, Y.; Du, Y.; Wang, Y.; Chen, S. Openfedllm: Training large language models on decentralized private data via federated learning. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 6137–6147. [Google Scholar]
  27. Beltrán, E.T.M.; Pérez, M.Q.; Sánchez, P.M.S.; Bernal, S.L.; Bovet, G.; Pérez, M.G.; Pérez, G.M.; Celdrán, A.H. Decentralized federated learning: Fundamentals, state of the art, frameworks, trends, and challenges. IEEE Commun. Surv. Tutor. 2023, 25, 2983–3013. [Google Scholar] [CrossRef]
  28. Chen, H.; Chen, X.; Peng, L.; Ma, R. FLRAM: Robust Aggregation Technique for Defense against Byzantine Poisoning Attacks in Federated Learning. Electronics 2023, 12, 4463. [Google Scholar] [CrossRef]
  29. Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated learning: Challenges, methods, and future directions. IEEE Signal Process. Mag. 2020, 37, 50–60. [Google Scholar] [CrossRef]
  30. Kumar, K.N.; Mohan, C.K.; Cenkeramaddi, L.R. The Impact of Adversarial Attacks on Federated Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 2672–2691. [Google Scholar] [CrossRef] [PubMed]
  31. Guduri, M.; Chakraborty, C.; Maheswari, U.; Margala, M. Blockchain-Based Federated Learning Technique for Privacy Preservation and Security of Smart Electronic Health Records. IEEE Trans. Consum. Electron. 2024, 70, 2608–2617. [Google Scholar] [CrossRef]
  32. Yazdinejad, A.; Dehghantanha, A.; Srivastava, G. AP2FL: Auditable privacy-preserving federated learning framework for electronics in healthcare. IEEE Trans. Consum. Electron. 2023, 70, 2527–2535. [Google Scholar] [CrossRef]
  33. Zhou, X.; Liang, W.; Kevin, I.; Wang, K.; Yan, Z.; Yang, L.T.; Wei, W.; Ma, J.; Jin, Q. Decentralized P2P federated learning for privacy-preserving and resilient mobile robotic systems. IEEE Wirel. Commun. 2023, 30, 82–89. [Google Scholar] [CrossRef]
  34. Fu, Y.; Li, C.; Yu, F.R.; Luan, T.H.; Zhao, P. An incentive mechanism of incorporating supervision game for federated learning in autonomous driving. IEEE Trans. Intell. Transp. Syst. 2023, 24, 14800–14812. [Google Scholar] [CrossRef]
  35. Li, Z.; He, S.; Chaturvedi, P.; Hoang, T.H.; Ryu, M.; Huerta, E.A.; Kindratenko, V.; Fuhrman, J.; Giger, M.; Chard, R.; et al. APPFLx: Providing Privacy-Preserving Cross-Silo Federated Learning as a Service. In Proceedings of the 2023 IEEE 19th International Conference on e-Science (e-Science), Limassol, Cyprus, 9–13 October 2023; pp. 1–4. [Google Scholar] [CrossRef]
  36. El Houda, Z.A.; Nabousli, D.; Kaddoum, G. Cost-efficient Federated Reinforcement Learning- Based Network Routing for Wireless Networks. In Proceedings of the 2022 IEEE Future Networks World Forum (FNWF), Montreal, QC, Canada, 10–14 October 2022; pp. 243–248. [Google Scholar] [CrossRef]
  37. Abidin, N.Z.; Ritahani Ismail, A. Federated Deep Learning for Automated Detection of Diabetic Retinopathy. In Proceedings of the 2022 IEEE 8th International Conference on Computing, Engineering and Design (ICCED), Sukabumi, Indonesia, 28–29 July 2022; pp. 1–5. [Google Scholar] [CrossRef]
  38. Ahmed, I.M.; Kashmoola, M.Y. Investigated Insider and Outsider Attacks on the Federated Learning Systems. In Proceedings of the 2022 IEEE International Conference on Communication, Networks and Satellite (COMNETSAT), Solo, Indonesia, 3–5 November 2022; pp. 438–443. [Google Scholar] [CrossRef]
  39. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. [Google Scholar]
  40. Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450. [Google Scholar]
  41. Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.; Stich, S.; Suresh, A.T. Scaffold: Stochastic controlled averaging for federated learning. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 5132–5143. [Google Scholar]
  42. Sabah, F.; Chen, Y.; Yang, Z.; Azam, M.; Ahmad, N.; Sarwar, R. Model optimization techniques in personalized federated learning: A survey. Expert Syst. Appl. 2024, 243, 122874. [Google Scholar] [CrossRef]
  43. Long, G.; Xie, M.; Shen, T.; Zhou, T.; Wang, X.; Jiang, J. Multi-center federated learning: Clients clustering for better personalization. World Wide Web 2023, 26, 481–500. [Google Scholar] [CrossRef]
  44. Zhou, H.; Lan, T.; Venkataramani, G.P.; Ding, W. Every parameter matters: Ensuring the convergence of federated learning with dynamic heterogeneous models reduction. Adv. Neural Inf. Process. Syst. 2024, 36, 25991–26002. [Google Scholar]
  45. Haddadpour, F.; Mahdavi, M. On the convergence of local descent methods in federated learning. arXiv 2019, arXiv:1910.14425. [Google Scholar]
  46. Fang, X.; Ye, M.; Yang, X. Robust heterogeneous federated learning under data corruption. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 5020–5030. [Google Scholar]
  47. Sattler, F.; Müller, K.R.; Wiegand, T.; Samek, W. On the byzantine robustness of clustered federated learning. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 8861–8865. [Google Scholar]
  48. Ghosh, A.; Chung, J.; Yin, D.; Ramchandran, K. An efficient framework for clustered federated learning. Adv. Neural Inf. Process. Syst. 2020, 33, 19586–19597. [Google Scholar] [CrossRef]
  49. Hard, A.; Rao, K.; Mathews, R.; Ramaswamy, S.; Beaufays, F.; Augenstein, S.; Eichner, H.; Kiddon, C.; Ramage, D. Federated learning for mobile keyboard prediction. arXiv 2018, arXiv:1811.03604. [Google Scholar]
  50. Konecnỳ, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated learning: Strategies for improving communication efficiency. arXiv 2016, arXiv:1610.05492. [Google Scholar]
  51. Guo, J.; Liu, Z.; Tian, S.; Huang, F.; Li, J.; Li, X.; Igorevich, K.K.; Ma, J. TFL-DT: A trust evaluation scheme for federated learning in digital twin for mobile networks. IEEE J. Sel. Areas Commun. 2023, 41, 3548–3560. [Google Scholar] [CrossRef]
  52. Diao, E.; Ding, J.; Tarokh, V. Heterofl: Computation and communication efficient federated learning for heterogeneous clients. arXiv 2020, arXiv:2010.01264. [Google Scholar]
  53. Mei, Y.; Guo, P.; Zhou, M.; Patel, V. Resource-adaptive federated learning with all-in-one neural composition. Adv. Neural Inf. Process. Syst. 2022, 35, 4270–4284. [Google Scholar]
  54. Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-mnist: A novel image dataset for benchmarking machine learning algorithms. arXiv 2017, arXiv:1708.07747. [Google Scholar]
  55. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. 2009. Available online: http://www.cs.utoronto.ca/~kriz/learning-features-2009-TR.pdf (accessed on 1 February 2025).
  56. Cohen, G.; Afshar, S.; Tapson, J.; Van Schaik, A. EMNIST: Extending MNIST to handwritten letters. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 2921–2926. [Google Scholar]
  57. Li, T.; Hu, S.; Beirami, A.; Smith, V. Ditto: Fair and robust federated learning through personalization. In Proceedings of the International Conference on Machine Learning, Virtual Event, 18–24 July 2021; pp. 6357–6368. [Google Scholar]
  58. Zhao, B.; Sun, P.; Fang, L.; Wang, T.; Jiang, K. FedCom: A byzantine-robust local model aggregation rule using data commitment for federated learning. arXiv 2021, arXiv:2104.08020. [Google Scholar]
Figure 1. Overview of SC-Fed. By analyzing the client-side data and calculating its characteristics, we can cluster the data based on the similarity of these features. Subsequently, the server will allocate the frequency of model updates for each client, taking into account their communication and computational resources.
Figure 2. Comparison of model performance between the SC-Fed algorithm and baseline algorithms on three different datasets (EMNIST, FashionMNIST, CIFAR10).
Figure 3. Comparison of average waiting time between the SC-Fed algorithm and baseline algorithms on three different datasets (EMNIST, FashionMNIST, CIFAR10).
Table 1. Algorithm performance comparison. The maximum accuracy of the algorithms on datasets (CIFAR-10, FMNIST, EMNIST) with varying degrees ( α = 0.1, 1, 10) of non-IID.
| Algorithm | CIFAR-10 (α = 0.1 / 1 / 10) | FMNIST (α = 0.1 / 1 / 10) | EMNIST (α = 0.1 / 1 / 10) |
|---|---|---|---|
| Solo | 0.66 / 0.67 / 0.60 | 0.94 / 0.83 / 0.81 | 0.76 / 0.76 / 0.73 |
| FedAvg [39] | 0.81 / 0.76 / 0.78 | 0.87 / 0.80 / 0.84 | 0.80 / 0.80 / 0.77 |
| Ditto [57] | 0.83 / 0.80 / 0.78 | 0.98 / 0.95 / 0.86 | 0.88 / 0.82 / 0.79 |
| FedCom [58] | 0.77 / 0.78 / 0.80 | 0.88 / 0.92 / 0.85 | 0.80 / 0.80 / 0.78 |
| SC-Fed (Ours) | 0.83 / 0.81 / 0.79 | 0.93 / 0.91 / 0.83 | 0.81 / 0.81 / 0.78 |
