Resource Allocation for Federated Learning with Heterogeneous Computing Capability in Cloud–Edge–Client IoT Architecture

Zhang, Xubo; Luo, Yang

doi:10.3390/fi17060243

by Xubo Zhang ¹ and Yang Luo ²,*

¹ The 30th Research Institute of China Electronics Technology Group Corporation, Chengdu 610041, China

² School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

* Author to whom correspondence should be addressed.

Future Internet 2025, 17(6), 243; https://doi.org/10.3390/fi17060243

Submission received: 11 April 2025 / Revised: 21 May 2025 / Accepted: 24 May 2025 / Published: 30 May 2025

(This article belongs to the Special Issue Edge Intelligence: Edge Computing for 5G and the Internet of Things, 2nd Edition)

Abstract

A federated learning (FL) framework for cloud–edge–client collaboration performs local aggregation of model parameters through edges, reducing communication overhead from clients to the cloud. This framework is particularly suitable for Internet of Things (IoT)-based secure computing scenarios that require extensive computation and frequent parameter updates, as it leverages the distributed nature of IoT devices to enhance data privacy and reduce latency. To address the issue of high-computation-capability clients waiting due to varying computing capabilities under heterogeneous device conditions, this paper proposes an improved resource allocation scheme based on a three-layer FL framework. This scheme optimizes the communication parameter volume from clients to the edge by implementing a method based on random dropout and parameter completion before and after communication, ensuring that local models can be transmitted to the edge simultaneously, regardless of different computation times. This scheme effectively resolves the problem of high-computation-capability clients experiencing long waiting times. Additionally, it optimizes the similarity pairing method, the Shapley Value (SV) aggregation strategy, and the client selection method to better accommodate heterogeneous computing capabilities found in IoT environments. Experiments demonstrate that this improved scheme is more suitable for heterogeneous IoT client scenarios, reducing system latency and energy consumption while enhancing model performance.

Keywords: federated learning; heterogeneous clients; model dropout

1. Introduction

In wireless deployments of federated learning (FL), the network consists of numerous Internet of Things (IoT) devices, which exhibit significant differences in hardware performance, such as processor speed, memory capacity, and battery life [1]. These differences contribute to the heterogeneity of clients [2], a common characteristic of IoT environments. For example, smartphones typically have more storage space and greater processing capabilities compared to smartwatches and other IoT devices like sensors or wearables. In synchronous FL, IoT clients are responsible for updating local models and uploading these updates to a central server. However, when heterogeneous IoT clients attempt to synchronize model updates, devices with weaker computing capabilities require more time to complete local model updates, causing them to fall behind. This heterogeneity among IoT clients poses challenges to FL [3]: it may lead to inefficiencies in the system, as the varying computation speeds and resources of different clients can create bottlenecks in overall system performance; it also increases uncertainty and instability in the system, as the diverse states of IoT devices can lead to system failures, data loss, or communication disruptions. Therefore, to address the heterogeneity among different IoT clients in large-scale FL scenarios, aggregation strategies need to be adjusted [4].

To address the challenges caused by heterogeneous computing capabilities among clients, this paper proposes an improved three-tier FL algorithm based on enforced synchronization. Specifically, a model dropout–based enforced synchronization strategy is designed to ensure that heterogeneous clients transmit their model parameters to the edge servers simultaneously, thereby mitigating issues related to client-side waiting and model staleness. Building on this, we further optimize the similarity-based client pairing by incorporating computational heterogeneity, introducing a Shapley-value-based reward and penalty mechanism, and jointly considering both contribution and computing capability for client selection and bandwidth resource allocation. Experimental results demonstrate the effectiveness of the proposed approach, achieving superior performance over baseline methods in terms of latency, energy consumption, and final model accuracy.

The remainder of this paper is organized as follows: Section 2 reviews related work; Section 3 formulates the problem; Sections 4–7 present the proposed methodology in detail; Section 8 provides simulation results; and Section 9 concludes the paper.

2. Related Works

In the context of three-tier FL architectures, researchers have primarily explored three categories of methods to address the inefficiency of training caused by resource heterogeneity across clients [5].

The first category involves introducing asynchronous or semi-asynchronous update mechanisms, whereby the cloud or edges are not required to wait for all slow clients to complete their local training before performing aggregation. This approach reduces latency associated with synchronization delays. For example, Wang et al. [6] proposed an asynchronous FL algorithm that allows clients to upload updates without global synchronization, significantly alleviating the bottleneck caused by straggling devices. However, asynchronous updates may introduce the problem of stale gradients, where model updates from slow clients are based on outdated global models, thus degrading convergence accuracy. To address this, some studies propose weighted asynchronous aggregation, dynamically adjusting each client’s contribution to the global model based on its computing capability, thereby mitigating the negative impact of delayed updates from low-resource clients [7,8].

The second category of methods aims to accommodate client heterogeneity by reducing the computational burden on clients, either by shrinking the model size or by updating only a portion of the model parameters. For instance, model dropout mechanisms have been employed to reduce the workload for weak clients. Dun et al. [9] proposed an asynchronous distributed dropout approach, in which only a subset of neurons participates in local updates at each client during training rounds. Specifically, the model is partitioned by layers, and different subsets of neurons in each layer are assigned to different clients. High-capability clients update the less frequently used (and more computationally intensive) parameters, while low-capability clients are responsible for updating the frequently used and critical parts, thereby accelerating convergence. Similarly, other studies allow clients to train sub-models of varying complexity (e.g., shallow or compact models), which are then fused using zero-padding or knowledge distillation during aggregation, balancing participation despite capability differences [3]. However, under the three-tier architecture, the number and computational distribution of clients served by different edge nodes may vary significantly, resulting in imbalanced update frequencies and quality at the edge.

The third category addresses the challenge from a system scheduling perspective by incorporating resource-aware coordination in the cloud–edge–client collaborative aggregation process. For example, some studies propose selecting clients for each training round based on indicators such as computing power, communication bandwidth, and data volume [10]. The earlier FedCS algorithm [11] follows this idea by filtering out clients that do not meet resource requirements, thereby excluding slow devices from aggregation to reduce per-round latency. However, naively eliminating weak clients may compromise model generalization. To overcome this, more refined scheduling approaches have emerged in recent years. Ko et al. [12] proposed a joint optimization of client selection and wireless bandwidth allocation by formulating a Markov decision process (MDP) model, which derives the optimal allocation strategy while ensuring data diversity across devices, thereby simultaneously reducing training latency and preserving model accuracy. Additionally, Chen et al. [13] investigated resource-aware strategies for optimizing both communication and computation in wireless IoT networks, dynamically adjusting the global aggregation frequency and local resource allocation based on network conditions to improve FL efficiency under heterogeneous settings. While these strategies alleviate inefficiencies caused by client heterogeneity at the system level, they often reduce or suppress the participation of low-performance clients [14].

In summary, to address the issue of uneven computing capabilities among clients in cloud–edge–client FL systems, existing studies have proposed various mechanisms including asynchronous updates, partial model updates, and resource-aware scheduling to improve aggregation efficiency and model performance. Nonetheless, challenges remain in ensuring the reliability of asynchronous convergence, balancing aggregation across edges, and effectively leveraging the contributions of weak clients.

3. Problem

In the training process of wireless FL, a significant portion of the system's time consumption comes from communication overhead, while the local computation time of clients generally accounts for only about one-tenth of the communication time. Figure 1 shows the synchronous FL training process for three clients. All clients start training at the same time, but because their computing capabilities differ considerably, clients with higher computing capability (such as Client 1) finish training on same-sized data samples first, while clients with weaker computing capability (such as Clients 2 and 3) lag behind. Since each client exclusively occupies an equal share of the bandwidth, the time needed to transmit the local model to the edge after local training is the same for every client. High-computing-capability clients therefore send their local models to the edge first, but the edge must collect the local models of all clients before aggregating. Hence, the faster clients (such as Clients 1 and 2) experience some waiting time. Although this waiting does not change the overall delay of a single round, accumulated over all global rounds and over a large number of clients sharing limited bandwidth, it inevitably wastes a significant amount of time.
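The waiting effect described above can be illustrated with a small sketch. The function name and the example timings are hypothetical, chosen only to mirror the three-client situation; the sketch assumes, as in the text, that every client has the same transmission time.

```python
def waiting_times(compute_times, comm_time):
    """Idle time of each client's upload before synchronous edge aggregation.

    compute_times: local training time per client (illustrative values).
    comm_time: identical upload time for every client (equal bandwidth shares).
    """
    arrivals = [t_cmp + comm_time for t_cmp in compute_times]
    round_end = max(arrivals)  # the edge aggregates only after the slowest upload
    return [round_end - arrival for arrival in arrivals]


# Hypothetical timings: Client 1 is the fastest, Client 3 the slowest.
waits = waiting_times([1.0, 2.0, 4.0], comm_time=10.0)
print(waits)  # [3.0, 2.0, 0.0]
```

In this example the round length is set entirely by Client 3, so Clients 1 and 2 idle for 3 and 2 time units in every round.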

4. Client–Edge Pairing Based on Similarity and Heterogeneous Computing Capabilities

A method for pairing clients with edges based on model similarity is detailed in [15], but it does not consider client heterogeneity. When client computing capabilities differ significantly, this method has a drawback: although the paired clients under each edge maintain dissimilar data distributions, their computing capabilities may vary greatly. If a synchronous aggregation strategy is maintained, high-computing-capability clients must wait for low-computing-capability clients, resulting in extremely long waiting times [16]. Therefore, this section incorporates heterogeneous computing capabilities into the pairing method, ensuring that the paired clients under each edge maintain dissimilar data distributions and similar computing capabilities. The specific improvement is as follows.

First, we construct a similarity matrix $S$ and group the clients by similarity into $K/N$ groups so that the data distribution characteristics of the clients within each group are approximately similar. Next, we account for heterogeneous computing capabilities: since computing capabilities differ significantly among the clients within each group, we sort the clients in each group by computing capability to obtain the sorted groups. Finally, when the cloud assigns clients to each edge, clients from different groups that have the same computing capability naturally cluster and are paired under one edge. If no clients with exactly the same computing capability exist across groups, or once such partial matching has been completed, the remaining unselected clients are assigned to edges by their rank (row number) in the sorted groups, from high to low computing capability. The specific method is given in Algorithm 1. The paired clients under each edge thus maintain dissimilar data distributions but similar computing capabilities.

Algorithm 1: Client–edge pairing based on similarity and heterogeneous computing capabilities

The proposed client–edge pairing algorithm acts as a pre-processing module in the training process of a three-tier FL architecture. It performs client grouping based on local model similarity and heterogeneous computing capabilities. Since the algorithm does not directly participate in parameter optimization or model updates, it preserves the convergence properties of standard FL algorithms such as FedAvg or FedProx. By assigning clients with similar model structures to the same edge server, the algorithm reduces intra-group heterogeneity and enhances the quality of local aggregations, thereby improving the stability of global convergence. Moreover, the sorting of clients by computing capability ensures that low-resource devices are only assigned lightweight update tasks, enabling timely synchronization and reducing training latency caused by stragglers.
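A minimal sketch of the rank-based part of this pairing follows. It assumes the similarity grouping has already been computed and that every group holds exactly one client per edge. The function name, input layout, and example capabilities are hypothetical, and the step that first matches clients with identical capabilities is omitted, so this is a simplification rather than the authors' Algorithm 1.

```python
def pair_clients_to_edges(groups, capability):
    """Assign one client from each similarity group to every edge.

    groups: list of client-id lists; clients in one group have similar data.
    capability: mapping from client id to computing capability.
    Returns one client list per edge (edge 0 receives the fastest clients).
    """
    n_edges = len(groups[0])
    if any(len(group) != n_edges for group in groups):
        raise ValueError("every group must contain one client per edge")
    # Sort each group from high to low computing capability.
    sorted_groups = [
        sorted(group, key=lambda c: capability[c], reverse=True)
        for group in groups
    ]
    # The clients at the same rank in every group are paired under one edge.
    return [[group[rank] for group in sorted_groups] for rank in range(n_edges)]


# Hypothetical example: 6 clients, 3 edges, 2 similarity groups.
capability = {0: 5, 1: 20, 2: 10, 3: 15, 4: 5, 5: 20}
edges = pair_clients_to_edges([[0, 1, 2], [3, 4, 5]], capability)
print(edges)  # [[1, 5], [2, 3], [0, 4]]
```

Each edge ends up with one client per similarity group (dissimilar data) whose capabilities sit at the same rank (similar speed).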

The computational complexity of Algorithm 1 mainly consists of three parts: singular value decomposition (SVD) of each client's model, pairwise similarity computation, and client grouping with sorting. Let $K$ denote the number of clients and $d$ the dimensionality of the model parameters. Then, the overall time complexity is $O(K d^{3} + K^{2} d + K \log K)$, where $O(K d^{3})$ arises from the SVD operations for the $K$ clients, $O(K^{2} d)$ comes from computing pairwise similarities, and $O(K \log K)$ results from sorting clients by computing capability. In practice, the computational cost can be reduced by employing low-rank approximations or truncated SVD, which significantly speeds up the feature extraction step and enhances scalability for large-scale deployments.

5. Enforced Synchronization Strategy Based on Model Dropout

In the previous section, we re-paired the clients based on model similarity and computing capabilities. Although the computing capabilities of clients under the same edge are similar, they are not identical, resulting in some waiting time. This section proposes an enforced synchronization method based on model dropout for client transmissions. Clients with higher computing capability can transmit the entire model, while clients with lower computing capability perform dropout operations on the model using the algorithm in this section, ensuring that all clients’ local models reach the edge simultaneously, as shown in Figure 2.

As a regularization technique, dropout was first introduced in neural networks to prevent overfitting by deactivating some neurons with a certain probability during forward propagation. This prevents the model from relying too heavily on local features and thus improves model generalization [17]. In this section, a fixed proportion of neuron values is set to zero to reduce the pressure and delay of model transmission. We assume that every edge knows the computing capabilities of its clients in advance. Taking a specific edge as an example, when clients complete local updates, if all clients have equal computing capabilities, the entire model can be transmitted directly. If the computing capabilities are unequal, the client with the highest computing capability is used as the baseline. Since each client is allocated equal bandwidth, the time to transmit the entire model is the same for all clients, denoted as $T_{*}^{\mathrm{com}}$. The dropout proportion $q_{i}^{M}$ for each client's model, i.e., the fraction of the model that client $i$ transmits, can be obtained using the following formula:

$$q_{i}^{M} = \frac{T_{i}^{\mathrm{com}}}{T_{*}^{\mathrm{com}}} = 1 - \frac{T_{i}^{\mathrm{cmp}} - T_{*}^{\mathrm{cmp}}}{T_{*}^{\mathrm{com}}} \tag{1}$$

where $T_{i}^{\mathrm{com}}$ and $T_{i}^{\mathrm{cmp}}$ are the transmission time and local update time of client $i$, and $T_{*}^{\mathrm{com}}$ and $T_{*}^{\mathrm{cmp}}$ represent the time required for the client with the highest computing capability under that edge to perform model transmission and local updates, respectively. To ensure the effectiveness of this method, it is necessary to guarantee that $q_{i}^{M} \in (0, 1]$. Therefore, the number of local updates needs to be controlled within a reasonable range to avoid the situation where $T_{i}^{\mathrm{cmp}} \geq T_{*}^{\mathrm{com}} + T_{*}^{\mathrm{cmp}}$.
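Equation (1) follows from requiring that client $i$ and the fastest client finish uploading at the same instant: $T_{i}^{\mathrm{cmp}} + q_{i}^{M} T_{*}^{\mathrm{com}} = T_{*}^{\mathrm{cmp}} + T_{*}^{\mathrm{com}}$. The sketch below evaluates it for one client; the function name, argument names, and the example timings are hypothetical.

```python
def transmit_fraction(t_cmp_i, t_cmp_best, t_com_best):
    """Fraction q of the model that client i uploads, following Equation (1).

    t_cmp_i: local update time of client i.
    t_cmp_best: local update time of the fastest client under the same edge.
    t_com_best: time to upload the full model (equal for all clients).
    """
    q = 1.0 - (t_cmp_i - t_cmp_best) / t_com_best
    if not 0.0 < q <= 1.0:
        # Client i is too slow: no non-empty upload can arrive in time.
        raise ValueError("q must lie in (0, 1]; reduce the number of local updates")
    return q


# Hypothetical timings: the fastest client trains in 1.0 and uploads in 10.0.
q = transmit_fraction(t_cmp_i=3.0, t_cmp_best=1.0, t_com_best=10.0)
finish_i = 3.0 + q * 10.0     # slower client: training plus partial upload
finish_best = 1.0 + 10.0      # fastest client: training plus full upload
```

With these numbers the slower client uploads about 80% of its parameters, and both uploads complete at time 11.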

When clients transmit model parameters to the edge, they follow the agreed-upon protocol and use the dropout proportion $q_{i}^{M}$ calculated by Equation (1) to apply dropout to the local model parameters before transmitting. This selection of dropout parameters is referred to as the mask $M_{i}^{T,t}$. The mask can be generated pseudo-randomly from some client identifier (such as the version number or timestamp of the sent model). The mask has the same dimensions as the model; entries for parameters to be sent are set to one and entries for parameters not to be sent are set to zero. Instead of explicitly transmitting masks, this work employs lightweight identifiers such as random seeds or hash signatures. Edge nodes can reconstruct the corresponding masks from the client identifiers and use them to complete the missing portions of the uploaded model parameters. Bandwidth estimation indicates that the overhead introduced by this mechanism constitutes only a minimal fraction of the total uplink data volume, making its transmission latency negligible on typical IoT devices. The completed client model parameters $\tilde{W}_{\mathrm{local}_{i}}^{T,t}$ are

$$\tilde{W}_{\mathrm{local}_{i}}^{T,t} \leftarrow W_{\mathrm{local}_{i}}^{T,t} \odot M_{i}^{T,t} + \nu \cdot W_{\mathrm{edge}_{j}}^{T,t-1} \odot \left(M_{i}^{T,t}\right)^{c} \tag{2}$$

where $\left(M_{i}^{T,t}\right)^{c}$ denotes the complement of the mask and $\odot$ denotes the Hadamard product between matrices. The unsent model parameters are replaced with the previous round's edge model $W_{\mathrm{edge}_{j}}^{T,t-1}$, scaled by the hyperparameter $\nu \in [0, 1]$. Once the edge reconstructs all clients' local models, it can proceed with synchronous aggregation.
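The seed-based mask exchange and the completion step of Equation (2) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the model is flattened to a plain list, the shared identifier is an integer seed, and the helper names `make_mask` and `complete_model` are hypothetical.

```python
import random


def make_mask(seed, size, keep_ratio):
    """Rebuild a 0/1 mask from a shared seed; 1 marks a transmitted parameter."""
    rng = random.Random(seed)
    n_keep = max(1, round(keep_ratio * size))
    kept = set(rng.sample(range(size), n_keep))
    return [1 if k in kept else 0 for k in range(size)]


def complete_model(sent_values, mask, prev_edge_model, nu):
    """Fill unsent entries with nu times the previous edge model (Equation (2))."""
    sent = iter(sent_values)
    return [next(sent) if m else nu * e for m, e in zip(mask, prev_edge_model)]


# Client side: drop parameters, then upload only the kept values and the seed.
local_model = [1.0, 2.0, 3.0, 4.0]
seed, keep_ratio = 42, 0.5
client_mask = make_mask(seed, len(local_model), keep_ratio)
payload = [w for w, m in zip(local_model, client_mask) if m]

# Edge side: regenerate the same mask from the seed and complete the model.
edge_mask = make_mask(seed, len(local_model), keep_ratio)
completed = complete_model(payload, edge_mask, prev_edge_model=[10.0] * 4, nu=0.7)
```

Because both sides seed the same generator, only the kept values and the seed cross the uplink; the mask itself is never transmitted.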

The proposed enforced synchronization strategy based on model dropout, as described in Algorithm 2, is designed to mitigate the impact of heterogeneous computing capabilities among terminal clients in a three-tier FL architecture. By selectively transmitting masked sub-models and enforcing synchronous local updates, the algorithm ensures that all clients—regardless of computing capacity—complete their updates within a fixed number of iterations, thereby aligning their upload timings to the edge servers. This structured synchronization, together with dropout-based model reconstruction at the edge level, enhances the consistency of aggregation and reduces the staleness of updates. Consequently, the method improves convergence stability and ensures the contribution of weak clients without degrading the global model quality. When built upon convergent federated optimization algorithms such as FedAvg, this strategy preserves theoretical convergence while empirically accelerating the training process by reducing idle waiting time and minimizing gradient variance during aggregation.

The computational complexity of Algorithm 2 can be broken down as follows. Each client performs $\tau_{1}$ local updates on a partial model of size $\alpha d$, where $d$ is the dimensionality of the global model and $\alpha \in (0, 1]$ is the dropout ratio. The local update cost is thus $O(\tau_{1} \alpha d)$. Dropout masking and identifier encoding incur negligible overhead. Each edge server reconstructs $K/N$ masked models per round, with a reconstruction cost of $O(\alpha d)$ per model. Aggregation and weighting involve an additional $O(K d)$ across all edges. The total complexity per round is $O(K \tau_{1} \alpha d + K \alpha d + N d)$, which is significantly lower than full-model synchronous updates. Moreover, the model compression via dropout reduces communication cost and training latency, making the algorithm well-suited for resource-constrained edge environments.

Algorithm 2: Enforced Synchronization Strategy Based on Model Dropout [18]

6. Contribution-Based Aggregation Strategy Optimization Based on Heterogeneous Computing Capabilities

A high-contribution edge model essentially reflects the characteristics of the local dataset more effectively. In homogeneous client scenarios [19], since each client performs the same number of computations, the SV can directly reflect the importance of the dataset to the global model. However, when clients have different computing capabilities and can perform different numbers of local training iterations, directly using the SV as a measure of contribution is inappropriate, as it violates the principle of collaborative fairness in FL. Therefore, this section aims to design a contribution-based aggregation optimization strategy.

In the previous section, we grouped clients with similar computing capabilities into clusters, ensuring that the clients paired with each edge have similar computing capability. Through the dropout method, we also maintained consistent local iteration counts within each cluster, although the iteration counts differ between clusters. If we directly use the SV as the consideration for contribution weight, it may result in a loss of global model accuracy due to insufficient computing capability from originally high-contributing clients. Therefore, this section introduces a reward–punishment mechanism into the weights. Figure 3 presents a quadrant chart, where the horizontal axis represents client contribution and the vertical axis represents client computing resources. By calculating the contribution within each cluster (i.e., SV) and the average computing capability of each cluster, we can influence the weights. The specific explanation is as follows.

Clients within clusters with lower average computing capability have performed fewer local iterations compared to those with higher computing capability; thus, their local models should be less effective at feature extraction. Similarly, clients within clusters with higher average computing capability have performed more iterations and should have better feature extraction capabilities. Therefore, in the upper left and lower right quadrants of the quadrant chart, the SV can be directly used as a weight reference without any rewards or penalties. However, if a cluster with lower average computing capability has an SV higher than the average SV, it indicates that the local data within that cluster are of higher quality and better represent the global model. Thus, a reward factor $\chi_{i}$ (corresponding to the upper right quadrant in the chart) can be introduced to increase the proportion of this edge model in the subsequent weighting, thereby tilting the global model towards this edge model. Conversely, if a cluster with higher average computing capability has an SV lower than the average SV, it is the least desirable situation. It implies that the local data under high-computing-capability clients are of lower quality, offering limited features for the model to learn from. Thus, a penalty factor $\xi_{i}$ (corresponding to the lower left quadrant in the chart) can be introduced to decrease the proportion of this edge model in the subsequent weighting, causing the global model to slightly diverge from this edge model.

Note that this method is limited to appropriate adjustments of SV. If the reward and penalty factors are too large, it may lead to overcorrection, making it difficult for the global model to converge. After analyzing the rewards and penalties for each edge model, we can start optimizing the aggregation weights with the following formula:

$$\omega_{i}^{T} = \begin{cases} \dfrac{\chi_{i} \xi_{i}\, \mathit{asv}_{i}}{\sum_{i = 0,\ \mathit{asv}_{i} \geq 0}^{K} \chi_{i} \xi_{i}\, \mathit{asv}_{i}}, & \mathit{asv}_{i} \geq 0, \\ 0, & \mathit{asv}_{i} < 0, \end{cases} \tag{3}$$

where $\mathit{asv}_{i}$ is the accumulated SV of client $i$. If no rewards or penalties are applied to the model, then $\chi_{i}$ and $\xi_{i}$ are both 1.
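A sketch of the reward and penalty rule and the weight normalization of Equation (3) is given below. Several details are assumptions rather than the authors' specification: "lower" and "higher" computing capability are judged against the mean over clusters, the product $\chi_{i} \xi_{i}$ is collapsed into a single factor (one of the two is always 1), and the default factor values are illustrative. All function and argument names are hypothetical.

```python
def adjustment_factor(sv, compute, mean_sv, mean_compute,
                      reward=1.01, penalty=0.99):
    """Return chi_i * xi_i for one edge model (1.0 when neither applies)."""
    if compute < mean_compute and sv > mean_sv:
        return reward   # weak hardware but high contribution: reward
    if compute > mean_compute and sv < mean_sv:
        return penalty  # strong hardware but low contribution: penalize
    return 1.0


def aggregation_weights(asv, factors):
    """Normalized aggregation weights; negative accumulated SVs get weight 0."""
    scores = [f * a if a >= 0 else 0.0 for a, f in zip(asv, factors)]
    total = sum(scores)
    if total == 0:
        raise ValueError("no edge model has a positive accumulated SV")
    return [s / total for s in scores]


# Hypothetical accumulated SVs and average capabilities for three clusters.
asv = [2.0, -1.0, 2.0]
compute = [5.0, 20.0, 20.0]
mean_sv = sum(asv) / len(asv)
mean_compute = sum(compute) / len(compute)
factors = [adjustment_factor(s, c, mean_sv, mean_compute)
           for s, c in zip(asv, compute)]
weights = aggregation_weights(asv, factors)
```

Here the first cluster is rewarded (low capability, above-average SV), the second receives weight 0 because its accumulated SV is negative, and the weights sum to 1.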

7. IoT Wireless Resource Allocation Scheme

When clients are homogeneous, it is sufficient to reduce the participation of low-contribution clients in aggregation, thereby allocating more bandwidth resources to high-contribution clients. This allows for more local training to reduce the edge’s local loss function, thereby influencing the global average loss function. However, when clients are heterogeneous, it is necessary to consider that high-contribution clients may consume significant energy and take longer computation times during local calculations. This means they cannot perform multiple iterations within the same energy consumption or time frame. Therefore, this section comprehensively considers contribution and heterogeneous computing capability, selecting clients and allocating bandwidth resources based on the principle of fairness.

The main idea is to optimize client selection within a given time frame during the asynchronous aggregation phase. Specifically, this involves adjusting whether each client $i$ participates in the asynchronous aggregation during the $r$-th round of global aggregation, denoted as $\alpha_{r,i}^{a}$. From a fairness perspective, the contribution of edges and clients (obtained by calculating the Shapley Value (SV) or model similarity) within the same number of training and aggregation cycles can directly reflect data quality [20]. According to the approach from the previous section, edges or clients with higher computing capability should inherently possess stronger data feature extraction capabilities, while those with lower computing capability should be given some tolerance in terms of model performance.

In the specific algorithm, this is reflected as follows: when it is necessary to exclude an edge with a low SV, the maximum computing resource $f_{j}^{\max}$ of the clients under that edge must be considered (since the number of local training iterations after dropout depends on the maximum computing capability under the edge). If $f_{j}^{\max}$ ranks low among all computing capabilities, the edge is considered tolerable and is not excluded; however, if $f_{j}^{\max}$ ranks high, the edge is considered intolerable and can be excluded. Similarly, when using model similarity to assess client contribution and selecting the top clients, the computing capability $f_{i}$ of the clients under each edge needs to be sorted from high to low. If a client has low contribution and ranks low in computing capability, it is considered tolerable, as its data have not been reasonably utilized due to limited computing resources. By allocating more bandwidth resources and training opportunities to it, its model contribution can be improved. Conversely, if a client has low contribution but ranks high in computing capability, its local data quality is low. Even with high computing capability, its contribution to the global model is minimal, so it is considered intolerable and is excluded by the client selection strategy. Algorithm 3 provides a detailed explanation using the rapid convergence phase as an example, with the same logic applying to the performance enhancement phase.
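The tolerance rule for clients can be sketched as below. The thresholds are assumptions made for illustration only: "low contribution" is taken to mean the bottom `1 - frac` of clients by contribution, and "ranks high in computing capability" is taken to mean above the median. The function name and example values are hypothetical, and this is not the authors' Algorithm 3.

```python
import statistics


def tolerant_selection(contribution, capability, frac=0.75):
    """Select clients, excluding only low-contribution, high-capability ones.

    contribution: per-client contribution score (e.g., SV or model similarity).
    capability: per-client computing capability f_i.
    frac: share of clients regarded as sufficiently contributing.
    """
    n = len(contribution)
    n_low = n - round(frac * n)
    by_contribution = sorted(range(n), key=lambda c: contribution[c])
    low_contribution = set(by_contribution[:n_low])
    median_capability = statistics.median(capability)
    # Low contribution despite strong hardware points to poor data: exclude.
    excluded = {c for c in low_contribution
                if capability[c] > median_capability}
    # Low contribution with weak hardware is tolerated and stays selected.
    return [c for c in range(n) if c not in excluded]


# Hypothetical example: clients 1 and 2 both contribute little.
selected = tolerant_selection(
    contribution=[0.9, 0.1, 0.2, 0.8],
    capability=[10, 20, 5, 15],
    frac=0.5,
)
print(selected)  # [0, 2, 3]
```

Client 1 is dropped because it contributes little despite being the fastest device, whereas the equally weak contributor Client 2 is kept because its hardware is the slowest.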

In each round, client computation (local updates) for selected participants is $O(K_{s} \tau_{1} d)$, where $K_{s}$ is the number of participating clients. Contribution ranking and participation decision per edge costs $O(K \log K)$, assuming sorted ranking. Bandwidth reallocation and edge-level aggregation per edge is $O(K_{j} d)$, with $K_{j}$ clients under edge $j$. Thus, the overall complexity per round is $O(K_{s} \tau_{1} d + N K \log K + N K_{j} d)$, which scales efficiently due to selective update and avoids full participation overhead. Additionally, the ranking mechanism filters out high-delay or low-impact clients, reducing unnecessary gradient computation and improving resource utilization.

Although Algorithm 3 involves several decision layers to jointly optimize fairness, efficiency, and resource utilization, its modular structure allows for adaptation in practice. For instance, the prioritization step could be replaced by a heuristic rule or sorted by lightweight metrics such as historical reliability or compute availability. While the current design aims to demonstrate the theoretical upper-bound of achievable performance, further simplification tailored to specific deployment contexts is a promising direction for future research.

Algorithm 3: IoT Wireless Resource Allocation Algorithm [21]

8. Simulations

8.1. Simulation Settings

The FL architecture consists of one cloud server, six edge servers, and thirty clients. The system latency primarily comprises three components: first, the computation delay incurred by local training on the client devices and the transmission delay associated with uploading local models; second, the latency at the edge servers, including the time required for edge aggregation, as well as the transmission delay for distributing the aggregated edge models to clients and uploading them to the cloud server; and finally, the latency at the cloud server for performing global aggregation and distributing the global model back to the edge servers. The specific calculation method follows the approach in [22], and the wireless network parameters used in our experiments are listed in Table 1.

The experiments are conducted on the MNIST [23], CIFAR10 [24], and FashionMNIST [25] datasets, with the target accuracy set to 90% for MNIST and 80% for the other two datasets. Each client is allocated a different computing capability, specifically a CPU frequency $f_{i}$ randomly drawn from the set $\{5, 10, 15, 20\} \times 10^{8}$. Due to the heterogeneity of clients, both synchronous and asynchronous aggregation strategies need to be adjusted. The number of synchronous aggregations for each edge and client is fixed at 2, and the number of asynchronous aggregations at 8. The number of local training iterations for clients paired under each edge is no longer fixed but is determined by the client with the highest computing capability, calculated based on the fixed number of aggregations. To avoid introducing additional delays, the synchronous aggregation time is kept below a baseline, namely the time taken by a client with computing capability $f = 1 \times 10^{9}$ to perform 10 local training iterations. The time for asynchronous aggregation should not exceed the communication time between the edge and the cloud. The reward and penalty factors for the aggregation weights, $\chi_{i}$ and $\xi_{i}$, are set to 1.01 and 0.99, respectively. The hyperparameter $\nu$ in Equation (2) is set to 0.7. The corresponding sensitivity analysis is provided in Appendix A.

The simulation tests the performance of different resource allocation schemes. The considered metrics are divided into energy consumption targets and accuracy targets: the energy consumption target is the system delay and total client energy consumption when the global model reaches the target accuracy, while the accuracy target is the highest accuracy the global model can achieve after 100 global rounds.

The comparison is primarily made with the following methods:

(1) Method 1: Full aggregation. All clients participate in the aggregation, corresponding to the experimental results from the previous section.

(2) Method 2: Random selection for aggregation. All clients randomly participate in the aggregation according to a fixed proportion frac.

(3) Method 3: Selection of only high-computing capability clients. Clients are selected for aggregation based on their computing capability, according to a fixed proportion frac.

(4) Method 4: Selection of only clients with high local loss [26]. Clients with higher local loss are preferentially selected for aggregation according to the fixed proportion frac.

(5) Method 5: Selection of only high-contribution clients [27].

(6) Method 6: Selection of clients based on a comprehensive evaluation of location, historical information, data characteristics, and current status [22].

The proportion frac is uniformly set to 0.75.
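The fixed-proportion baselines can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the helper names are hypothetical, and rounding the number of selected clients down is an assumption, since the text only gives frac = 0.75.

```python
import random

FRAC = 0.75  # selection proportion shared by the baselines


def num_selected(num_clients, frac=FRAC):
    """Number of clients to select (assumption: round down, at least one)."""
    return max(1, int(num_clients * frac))


def select_random(client_ids, frac=FRAC, seed=0):
    """Method 2 style: uniform random selection of a fixed proportion."""
    rng = random.Random(seed)
    return sorted(rng.sample(client_ids, num_selected(len(client_ids), frac)))


def select_top(scores, frac=FRAC):
    """Method 3 / Method 4 style: keep the clients with the highest score.

    `scores` maps client id -> score, e.g. CPU frequency (Method 3) or
    local loss (Method 4). Ties are broken by client id.
    """
    k = num_selected(len(scores), frac)
    ranked = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return sorted(ranked[:k])
```

With thirty clients and frac = 0.75, this rounding rule selects 22 clients per round.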

8.2. Simulation Results

Figure 4 presents the impact curves of different resource allocation methods on the global model across three datasets in a heterogeneous client scenario. Among these methods, Method 3 performs the worst. This method fixes the selected clients during the asynchronous aggregation phase, preventing other clients from participating. As a result, only updates from high-computing-capability clients are received during each aggregation, which harms collaboration fairness and hinders global model updates. Method 2, which employs random selection for aggregation, performs moderately as it can gather information from different clients during the asynchronous aggregation phase. Method 1, which uses a full aggregation strategy, shows better results. Although it gathers more client information, it fails to distinguish the contribution levels of each client, and the number of training iterations per client is lower compared to strategies that involve client selection. Consequently, while Method 1 leads in accuracy, the total energy consumption of the clients remains high. Method 4 measures contribution based on each client’s loss value. This method performs well in later stages, achieving higher global model accuracy. Method 5, designed for client selection in homogeneous client scenarios, saves bandwidth resources, allowing some clients more training opportunities. It also achieves high global model accuracy in later stages. However, since it does not account for heterogeneous computing capability and only relies on SV and model similarity for selection, it loses some truly high-quality clients, leading to delays and higher energy consumption in the early stages, failing to outperform other algorithms. The algorithm proposed in this paper combines the advantages of the above baselines, achieving convergence to a high-performance global model while also ensuring low delay and energy consumption for short-term goals. 
We also compared it with another federated learning strategy, namely Method 6. Although this approach adopts a multi-criteria client selection scheme, its bandwidth allocation is static, the aggregation strategy relies on the standard FedAvg algorithm, and the fault-tolerance mechanism is based on a fixed participation threshold. As a result, its performance is inferior to the method proposed in this paper. Specific experimental data can be found in Table 2.

The IoT wireless resource allocation scheme proposed in this paper comprehensively considers the heterogeneous computing capabilities of clients and the contribution of their models, avoiding the neglect of truly high-contribution clients in simple selection methods. From the perspective of energy consumption, the system delay and energy consumption required to achieve the specified accuracy are significantly reduced, with up to 70% of the time saved in reaching the target and up to 60% of client energy consumption conserved. In terms of accuracy, under the same 100 rounds of global communication, this algorithm can further improve the maximum accuracy of the global model by approximately 0.06% to 0.39% compared to not performing client selection (i.e., Method 1).
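The relative savings quoted above are ratios of entries in Table 2. As a small worked example, the sketch below computes the saving of the proposed scheme over Method 1 for a single cell of that table (MNIST, target accuracy 98%); the function name is hypothetical, and the numbers are copied from Table 2.

```python
def relative_saving(baseline, proposed):
    """Fraction of the baseline cost saved by the proposed scheme."""
    return (baseline - proposed) / baseline


# Values copied from Table 2 (MNIST, target accuracy 98%).
method1 = {"TD": 748, "EC": 6435}
proposed = {"TD": 627, "EC": 5085}

savings = {key: relative_saving(method1[key], proposed[key]) for key in method1}
```

For this cell, the delay saving is 121/748 (about 16%) and the energy saving is 1350/6435 (about 21%); the "up to" figures in the text refer to the largest such ratio over all methods, datasets, and targets.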

We continue to conduct ablation experiments on two aspects of the proposed scheme: pairing optimization and contribution weight reward–penalty factors. The experiments focus on the performance of the global model in FL, specifically the maximum accuracy within 100 communication rounds and the number of global rounds required to reach the specified accuracy. First, we validate the experimental effect of pairing optimization by comparing it with the following edge-client pairing methods on the MNIST dataset, ensuring that all other conditions remain the same:

(1) Random pairing;

(2) Pairing based on similarity only;

(3) Pairing based on computing capability only, where all clients are sorted by computing capability from highest to lowest, and each edge only considers clients with the same or similar computing capabilities.
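The capability-only baseline in item (3) can be sketched as a sort followed by a split into contiguous groups. This is an illustration under stated assumptions: the text does not say how clients are divided among edges, so the near-equal group sizes and the function name are hypothetical.

```python
def pair_by_capability(cpu_freq, num_edges):
    """Group clients so that each edge serves clients of similar capability.

    `cpu_freq` maps client id -> CPU frequency. Clients are sorted from
    highest to lowest frequency and cut into `num_edges` contiguous groups
    of near-equal size (assumption: the split rule is not given in the text).
    """
    ranked = sorted(cpu_freq, key=lambda cid: (-cpu_freq[cid], cid))
    size, extra = divmod(len(ranked), num_edges)
    groups, start = [], 0
    for edge in range(num_edges):
        end = start + size + (1 if edge < extra else 0)
        groups.append(ranked[start:end])
        start = end
    return groups
```

With thirty clients and six edges, each edge receives five clients, and no client in a later group is faster than any client in an earlier one.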

Figure 5 shows the impact of different pairing methods on global accuracy. It can be observed that although the differences among the four methods are not significant, there are still subtle variations: pairing based solely on computing capability and random pairing yield the worst results, further confirming the importance of similarity-based pairing within this framework. It also indicates that there is no need to pair clients based solely on computing capability similarity, as the dropout method can approximate this effect. Pairing based solely on similarity can result in significant differences in computing capability within each edge, leading to more model parameter loss when using dropout for enforced synchronization, making it unsuitable for heterogeneous client scenarios. The proposed method outperforms the other three pairing methods.
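To make the link between capability gaps and parameter loss concrete, the sketch below models enforced synchronization in a simplified way: a client that finishes training late has less time left before the common deadline, so it uploads a smaller random subset of its parameters, and the receiver fills the dropped positions from a reference model. The keep-ratio rule, the use of a reference model for completion, and all names here are assumptions for illustration; the scheme actually used is the one defined in Section 5.

```python
import random


def keep_ratio(compute_time, full_upload_time, deadline):
    """Fraction of parameters that fits in the time left after training.

    Hypothetical rule: upload time is proportional to the number of
    parameters sent, and all uploads must finish by `deadline`.
    """
    remaining = max(0.0, deadline - compute_time)
    return min(1.0, remaining / full_upload_time)


def random_dropout(params, ratio, seed=0):
    """Keep a random subset of parameters; return {index: value}."""
    rng = random.Random(seed)
    kept = rng.sample(range(len(params)), int(len(params) * ratio))
    return {i: params[i] for i in kept}


def complete(sparse, reference):
    """Fill dropped positions with the reference model's values (assumption)."""
    return [sparse.get(i, reference[i]) for i in range(len(reference))]
```

In this model, two clients under the same edge with very different training times receive very different keep ratios, which is the parameter loss the paragraph above attributes to similarity-only pairing.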

Finally, an ablation experiment is conducted on the reward–penalty factor strategy, comparing the proposed setting against two variants: not using reward–penalty factors, and using them in reverse (i.e., swapping $\chi_{i}$ and $\xi_{i}$). The results on the MNIST dataset are shown in Figure 6. Using the reward–penalty factors as proposed gives the best overall result, while omitting them or using them in reverse performs slightly worse. The variant without reward–penalty factors performs slightly worse because it fails to fully integrate heterogeneous computing capabilities into the contribution evaluation, which diminishes performance in scenarios with heterogeneous clients.
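The three ablation variants differ only in the multiplicative factor applied to each client's aggregation weight. The sketch below illustrates that mechanism with the values 1.01 and 0.99 from the simulation settings. The criterion that decides which clients are rewarded is defined in Section 6 and is not reproduced here, so it appears as a hypothetical boolean flag; the renormalization step is likewise an assumption.

```python
CHI = 1.01  # reward factor
XI = 0.99   # penalty factor


def adjust_weights(weights, rewarded, chi=CHI, xi=XI):
    """Scale aggregation weights by a reward or penalty factor.

    `weights` maps client id -> aggregation weight; `rewarded` maps
    client id -> bool (hypothetical flag standing in for the paper's
    criterion). Weights are renormalized to sum to 1 (assumption).
    """
    scaled = {cid: w * (chi if rewarded[cid] else xi) for cid, w in weights.items()}
    total = sum(scaled.values())
    return {cid: w / total for cid, w in scaled.items()}


weights = {"a": 0.5, "b": 0.5}
flags = {"a": True, "b": False}
normal = adjust_weights(weights, flags)                    # proposed setting
reverse = adjust_weights(weights, flags, chi=XI, xi=CHI)   # factors swapped
none = adjust_weights(weights, flags, chi=1.0, xi=1.0)     # no factors
```

Passing `chi=1.0, xi=1.0` reproduces the variant without factors, and swapping the two arguments reproduces the reversed variant.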

9. Conclusions

This paper studies communication resource allocation in cloud–edge–client FL under heterogeneous IoT client computing capabilities. We designed an enforced synchronization update strategy based on model dropout, allowing each client to simultaneously transmit model parameters to the edge during each aggregation, combining the advantages of both synchronous and asynchronous methods. We incorporated heterogeneous computing capabilities into the pairing consideration, ensuring that the data samples covered by each edge are dissimilar while maintaining similar computing capability. Additionally, we designed reward–penalty factors to make the proposed method more targeted for scenarios with heterogeneous computing capabilities. Furthermore, the factor of heterogeneous computing capability was also integrated into the client selection and bandwidth allocation algorithms, which, compared to strategies that consider only contribution or computing capability, can further reduce IoT system delay and energy consumption while improving the performance of the final model.

Author Contributions

Methodology, Y.L.; Validation, X.Z.; Writing—original draft, X.Z.; Writing—review & editing, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Sensitivity Analysis of Parameter Settings

To verify the rationality of the settings for the aggregation reward and penalty factors $\chi_{i}$ and $\xi_{i}$ as well as the hyperparameter $\nu$, we conduct a sensitivity analysis. The experimental configuration follows the settings described in Section 8. In this analysis, the accuracy is measured as the average accuracy across the three datasets, and the convergence rounds refer to the average number of global rounds required to reach the target accuracy defined in Section 8.

As shown in Figure A1, for the aggregation reward and penalty factors $\chi_{i}$ and $\xi_{i}$, the highest average accuracy of 0.918 is achieved when $\chi_{i} = 1.01$ and $\xi_{i} = 0.99$.

Figure A1. Sensitivity analysis of reward and penalty factors $\chi$ and $\xi$.

As shown in Figure A2, for hyperparameter $\nu$ in Equation (2), the setting $\nu = 0.7$ yields the highest average accuracy and requires the fewest training rounds to reach the target accuracy.

Figure A2. Sensitivity analysis of hyperparameter $\nu$.

References

1. Nguyen, D.C.; Ding, M.; Pathirana, P.N.; Seneviratne, A.; Li, J.; Poor, H.V. Federated Learning for Internet of Things: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2021, 23, 1622–1658.
2. Fu, L.; Zhang, H.; Gao, G.; Zhang, M.; Liu, X. Client Selection in Federated Learning: Principles, Challenges, and Opportunities. IEEE Internet Things J. 2023, 10, 21811–21819.
3. Shen, J.; Cheng, N.; Wang, X.; Lyu, F.; Xu, W.; Liu, Z.; Aldubaikhy, K.; Shen, X. RingSFL: An Adaptive Split Federated Learning Towards Taming Client Heterogeneity. IEEE Trans. Mob. Comput. 2024, 23, 5462–5478.
4. Pene, P.; Liao, W.; Yu, W. Incentive Design for Heterogeneous Client Selection: A Robust Federated Learning Approach. IEEE Internet Things J. 2024, 11, 5939–5950.
5. Martínez Beltrán, E.T.; Pérez, M.Q.; Sánchez, P.M.S.; Bernal, S.L.; Bovet, G.; Pérez, M.G.; Pérez, G.M.; Celdrán, A.H. Decentralized Federated Learning: Fundamentals, State of the Art, Frameworks, Trends, and Challenges. IEEE Commun. Surv. Tutor. 2023, 25, 2983–3013.
6. Wang, Z.; Zhang, Z.; Tian, Y.; Yang, Q.; Shan, H.; Wang, W.; Quek, T.Q.S. Asynchronous Federated Learning over Wireless Communication Networks. IEEE Trans. Wirel. Commun. 2022, 21, 6961–6978.
7. Zhu, H.; Zhou, Y.; Qian, H.; Shi, Y.; Chen, X.; Yang, Y. Online Client Selection for Asynchronous Federated Learning with Fairness Consideration. IEEE Trans. Wirel. Commun. 2023, 22, 2493–2506.
8. Wu, W.; He, L.; Lin, W.; Mao, R.; Maple, C.; Jarvis, S. SAFA: A Semi-Asynchronous Protocol for Fast Federated Learning with Low Overhead. IEEE Trans. Comput. 2021, 70, 655–668.
9. Dun, C.; Hipolito, M.; Jermaine, C.; Dimitriadis, D.; Kyrillidis, A. Efficient and Light-Weight Federated Learning via Asynchronous Distributed Dropout. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, Valencia, Spain, 25–27 April 2023; pp. 6630–6660.
10. Yang, Z.; Chen, M.; Saad, W.; Hong, C.S.; Shikh-Bahaei, M. Energy Efficient Federated Learning Over Wireless Communication Networks. IEEE Trans. Wirel. Commun. 2021, 20, 1935–1949.
11. Nishio, T.; Yonetani, R. Client Selection for Federated Learning with Heterogeneous Resources in Mobile Edge. In Proceedings of the ICC 2019—2019 IEEE International Conference on Communications (ICC), Shanghai, China, 20–24 May 2019; pp. 1–7.
12. Ko, H.; Lee, J.; Seo, S.; Pack, S.; Leung, V.C.M. Joint Client Selection and Bandwidth Allocation Algorithm for Federated Learning. IEEE Trans. Mob. Comput. 2023, 22, 3380–3390.
13. Chen, H.; Huang, S.; Zhang, D.; Xiao, M.; Skoglund, M.; Poor, H.V. Federated Learning over Wireless IoT Networks with Optimized Communication and Resources. IEEE Internet Things J. 2022, 9, 16592–16605.
14. Nguyen, V.-D.; Sharma, S.K.; Vu, T.X.; Chatzinotas, S.; Ottersten, B. Efficient Federated Learning Algorithm for Resource Allocation in Wireless IoT Networks. IEEE Internet Things J. 2021, 8, 3394–3409.
15. Shen, J.; Wang, X.; Cheng, N.; Ma, L.; Zhou, C.; Zhang, Y. Effectively Heterogeneous Federated Learning: A Pairing and Split Learning Based Approach. In Proceedings of the GLOBECOM 2023—2023 IEEE Global Communications Conference, Kuala Lumpur, Malaysia, 4–8 December 2023; pp. 5847–5852.
16. Luo, L.; Zhang, C.; Yu, H.; Sun, G.; Luo, S.; Dustdar, S. Communication-Efficient Federated Learning with Adaptive Aggregation for Heterogeneous Client-Edge-Cloud Network. IEEE Trans. Serv. Comput. 2024, 17, 3241–3255.
17. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
18. Lin, W.; Xu, Y.; Liu, B.; Li, D.; Huang, T.; Shi, F. Contribution-based Federated Learning Client Selection. Int. J. Intell. Syst. 2022, 37, 7235–7260.
19. Fang, X.; Ye, M. Robust Federated Learning with Noisy and Heterogeneous Clients. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10062–10071.
20. Zhu, H.; Li, Z.; Zhong, D.; Li, C.; Yuan, Y. Shapley-Value-Based Contribution Evaluation in Federated Learning: A Survey. In Proceedings of the 2023 IEEE 3rd International Conference on Digital Twins and Parallel Intelligence (DTPI), Orlando, FL, USA, 7–9 November 2023; pp. 1–5.
21. El-Niss, A.; Alzu’Bi, A.; Abuarqoub, A.; Hammoudeh, M.; Muthanna, A. SimProx: A Similarity-Based Aggregation in Federated Learning With Client Weight Optimization. IEEE Open J. Commun. Soc. 2024, 5, 7806–7817.
22. Zhang, Z.; Gao, Z.; Guo, Y.; Gong, Y. Scalable and Low-Latency Federated Learning With Cooperative Mobile Edge Networking. IEEE Trans. Mob. Comput. 2024, 23, 812–822.
23. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324.
24. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. CIFAR-10 Dataset. Available online: https://www.cs.toronto.edu/~kriz/cifar.html (accessed on 20 May 2025).
25. Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv 2017, arXiv:1708.07747.
26. Wu, H.; Wang, P. Node Selection Toward Faster Convergence for Federated Learning on Non-IID Data. IEEE Trans. Netw. Sci. Eng. 2022, 9, 3099–3111.
27. Hu, F.; Zhou, W.; Liao, K.; Li, H. Contribution- and Participation-Based Federated Learning on Non-IID Data. IEEE Intell. Syst. 2022, 37, 35–43.

Figure 1. Synchronous FL training process.

Figure 2. Enforced synchronization strategy based on model dropout.

Figure 3. Contribution-based aggregation strategy optimization.

Figure 4. The accuracy curves of the global model across different training rounds under various resource allocation schemes in a heterogeneous client scenario based on the following datasets: (a) MNIST; (b) CIFAR10; (c) FashionMNIST.

Figure 5. Comparison of the impact of different pairing methods on the global model accuracy using the MNIST dataset.

Figure 6. Comparison of the impact of weight reward–penalty factors on the global model accuracy using the MNIST dataset.

Table 1. Wireless Network Scenario Parameter Settings.

Parameter	Value
Computation cycles per bit $c_{i}$	30 cycles/bit
Capacitance coefficient $a_{i} / 2$	$2 \times 10^{- 26}$
Bandwidth B	$1 \times 10^{6}$ Hz
Client transmit power $p_{i}$	0.5 W
Channel gain $h_{i}$	$1 \times 10^{- 8}$
Power spectral density of noise $N_{0}$	$1 \times 10^{- 10}$ W
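To show how the first two parameters of Table 1 enter the client-side cost, the sketch below uses a CPU model that is common in the wireless FL literature: computation delay $c_{i} D_{i} / f_{i}$ and computation energy $(a_{i}/2)\, c_{i} D_{i} f_{i}^{2}$, where $D_{i}$ is the amount of data processed in bits. This model is an assumption for illustration; the calculation actually used in the experiments follows [22], and $D_{i}$ and the function names are hypothetical.

```python
CYCLES_PER_BIT = 30        # c_i from Table 1
CAPACITANCE_HALF = 2e-26   # a_i / 2 from Table 1


def computation_delay(data_bits, cpu_freq, cycles_per_bit=CYCLES_PER_BIT):
    """Seconds needed to process `data_bits` at `cpu_freq` cycles per second."""
    return cycles_per_bit * data_bits / cpu_freq


def computation_energy(data_bits, cpu_freq,
                       cycles_per_bit=CYCLES_PER_BIT,
                       coeff=CAPACITANCE_HALF):
    """Joules consumed under the assumed model: coeff * c * D * f^2."""
    return coeff * cycles_per_bit * data_bits * cpu_freq ** 2
```

Under this model, halving the CPU frequency doubles the delay but cuts the energy to a quarter, which is the trade-off that client selection on heterogeneous devices has to balance.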

Table 2. System delay and total client energy consumption to achieve the target accuracy under different IoT resource allocation schemes. A “-” indicates that the target accuracy was not reached. TD is time delay (ms); EC is energy consumption (mJ). The highest accuracy is the maximum accuracy the global model achieves within 100 rounds of global communication.

MNIST
	Target Accuracy—95%		Target Accuracy—96%		Target Accuracy—97%		Target Accuracy—98%		Highest Accuracy
	TD	EC	TD	EC	TD	EC	TD	EC	Highest Accuracy
Method 1	164	1343	211	1819	444	3462	748	6435	0.9859
Method 2	182	1239	276	1962	453	3099	981	6954	0.9828
Method 3	296	1632	371	2138	569	3423	-	-	0.9808
Method 4	255	1514	291	1738	509	2991	1034	6279	0.9827
Method 5	212	1427	315	2182	518	3549	979	7028	0.9839
Method 6	177	1122	256	1629	493	3418	831	6315	0.9857
Proposed	139	1046	174	1311	444	3307	627	5085	0.9865
CIFAR10
	Target Accuracy—80%		Target Accuracy—85%		Target Accuracy—87%		Target Accuracy—88%		Highest Accuracy
	TD	EC	TD	EC	TD	EC	TD	EC	Highest Accuracy
Method 1	33,003	302,417	77,411	720,087	132,883	1,233,245	179,835	1,668,598	0.8808
Method 2	56,067	425,742	101,316	771,132	178,139	1,363,374	-	-	0.8707
Method 3	50,954	340,572	96,168	660,553	-	-	-	-	0.8635
Method 4	50,973	340,210	134,439	787,462	-	-	-	-	0.8613
Method 5	42,411	286,778	101,314	674,537	159,341	1,078,677	-	-	0.8710
Method 6	38,169	271,664	95,652	643,017	1,612,855	1,065,241	153,328	1,459,732	0.8804
Proposed	27,911	252,493	66,315	601,482	108,145	972,479	131,176	1,123,961	0.8843
FashionMNIST
	Target Accuracy—80%		Target Accuracy—85%		Target Accuracy—87%		Target Accuracy—88%		Highest Accuracy
	TD	EC	TD	EC	TD	EC	TD	EC	Highest Accuracy
Method 1	2769	25,256	6001	54,454	10,464	96,068	14,934	137,331	0.8865
Method 2	3486	25,879	6626	49,244	11,516	86,231	17,402	130,443	0.8811
Method 3	4628	30,206	14,933	97,586	-	-	-	-	0.8643
Method 4	2733	16,555	8287	53,243	10,661	70,713	17,425	121,775	0.8866
Method 5	3622	22,117	8082	52,604	12,336	87,948	21,874	123,523	0.8805
Method 6	3018	20,414	6910	44,146	10,115	80,912	19,989	122,671	0.8857
Proposed	2665	18,220	4763	33,071	8295	58,221	17,431	120,782	0.8882


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

