
Age-Aware Scheduling for Federated Learning with Caching in Wireless Computing Power Networks

by Xiaochong Zhuang 1, Chuanbai Luo 2, Zhenghao Xie 2, Yu Li 3 and Li Jiang 4,*
1 Guangdong–Hong Kong–Macao Joint Laboratory for Smart Discrete Manufacturing, The Key Laboratory of Intelligent Detection and Internet of Manufacturing Things, Ministry of Education, Guangdong University of Technology, Guangzhou 510006, China
2 Guangdong Provincial Key Laboratory of Intelligent Systems and Optimization Integration, Guangdong University of Technology, Guangzhou 510006, China
3 Chongqing Key Laboratory of Intelligent Perception and BlockChain Technology, Chongqing Technology and Business University, Chongqing 400067, China
4 111 Center for Intelligent Batch Manufacturing Based on IoT Technology, The Key Laboratory of Intelligent Information Processing and System Integration of IoT, Ministry of Education, Guangdong University of Technology, Guangzhou 510006, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(4), 663; https://doi.org/10.3390/electronics14040663
Submission received: 9 January 2025 / Revised: 1 February 2025 / Accepted: 6 February 2025 / Published: 8 February 2025

Abstract:
With the rapid development of Wireless Computing Power Networks (WCPNs), the urgent need for data privacy protection and communication efficiency has led to the emergence of the federated learning (FL) framework. However, time delays give rise to straggler problems that degrade the convergence performance of FL during training. In this article, we propose an FL resource scheduling strategy based on information age perception in WCPNs, which effectively reduces time delays and enhances the convergence performance of FL. Moreover, a data cache buffer and a model cache buffer are set up at the user end and the central server, respectively. Next, we formulate the parametric age-aware problem to simultaneously minimize the global parameter age, energy consumption, and FL service delays. Considering the dynamic WCPN environment, the optimization target is modeled as a Markov decision process (MDP), and the Proximal Policy Optimization (PPO) algorithm is used to obtain the optimal solution. Numerical simulation results demonstrate that the proposed method significantly outperforms baseline schemes across critical metrics. Specifically, the proposed approach reduces FL service delays by 25.2%. It also decreases the global parameter age by 45.5% through the joint optimization of the data collection frequency, computation frequency, and bandwidth allocation. The method attains a reward value of 65 at convergence, 18.2% higher than the WithoutAnyCache scheme and 8.3% higher than the OnlyLocalCache scheme. FL accuracy improves to 98.2% with a final loss of 0.08. These results confirm the superiority of the proposed method.

1. Introduction

With the rapid development of science and technology, wireless services, such as cloud-based extended reality, holographic communication, and intelligent interaction, are becoming more prevalent. To provide users with a real-time immersive experience, the Wireless Computing Power Network (WCPN), a highly integrated intelligent framework, is capable of delivering dense and adaptive computing services to consumers through the deep integration and utilization of distributed reachable computing resources [1,2,3]. Moreover, WCPNs can effectively cope with the challenge of load imbalance by decomposing computing tasks and coordinating between computing nodes [2,3], fully utilizing available computing resources and providing fast and accurate computing services. However, WCPNs rely heavily on data, and the correctness of their decisions depends on the diversity of the data used for training. Relying only on a device's local data for training is therefore insufficient; data from other devices must be combined to provide a sufficient amount and diversity of data. However, local data on devices constitute users' private information, and sharing such data may lead to malicious misuse, thus deviating from the original intention of improving the user experience. In response to these risks of privacy leakage, a distributed machine learning paradigm, federated learning (FL), has emerged [4,5,6,7]. Unlike traditional machine learning methods that run in data centers, FL only needs to send machine learning models to the various intelligent devices for training. After training, the smart devices upload their trained models to a central node, which then aggregates the models uploaded by the various smart devices. The aggregated model is a comprehensive model that combines data from all smart devices. No raw device data are exchanged during this process, only the devices' machine learning models, ensuring user data privacy while enabling collaborative machine learning across multiple devices. Therefore, FL has been applied to enable the distributed intelligence of WCPNs, guaranteeing the security of computing nodes and the privacy of computing devices [8].
In FL, the central node needs to connect with a large number of user devices and receive the distributed machine learning models in each global aggregation [5,9]. With the rapid development of WCPNs, the number of intelligent devices is already enormous and continues to increase. The machine learning performance requirements of these devices will certainly widen the coverage of FL and enlarge the scale of machine learning models, thereby exerting tremendous pressure on the computing architecture. However, due to limited spectrum resources, there are always some user devices that do not have enough spectrum resources to transmit their machine learning models, resulting in slow or even failed transmission tasks. These users are called draggers (stragglers) or laggards, and their transmission latency has a decisive impact on the efficiency of FL. Therefore, the dragger problem is unavoidable in FL research [6,7,10]. To improve the efficiency of FL, studies based on different design principles have proposed various device scheduling strategies, including reducing draggers and minimizing convergence time [11,12]. In addition, due to the heterogeneity of smart devices' computing capabilities and communication conditions and their imbalanced service demands, relying on a single node to provide FL services may lead to device overloading and communication congestion [13,14]. To address this massive computational demand, W. Sun et al. proposed an asynchronous FL WCPN framework, jointly optimizing the computation strategy of individual computing nodes and their collaborative learning strategies to minimize the total energy consumption of all computing nodes [15]. This algorithm performs outstandingly in terms of learning accuracy, convergence speed, and energy conservation.
The architecture of WCPNs enables users to receive dense and adaptive computing services without knowing the exact service deployment; they only need to provide their own service demands. Service delays caused by insufficient computing power, poor communication conditions, or other factors are highly detrimental for time-sensitive tasks, especially those requiring real-time monitoring and control. The interval between data generation and invocation is a key factor affecting task performance and is referred to as the age of information (AoI) [16]. The AoI has become an important metric for measuring data freshness. Several studies on the impact of the AoI on different optimization objectives have sought to enhance data utilization efficiency and reliability by minimizing the AoI defined in each specific scenario [17,18,19,20,21,22,23,24]. To minimize the average age of critical information (AoCI) of mobile clients, X. Wang et al. built a system model based on request-response communication [25]. They proposed an information sensing heuristic algorithm based on Dynamic Programming (DP), which showed advantages in terms of the average AoCI under various network parameters and had a short convergence time. Furthermore, W. Dai et al. studied the convergence behavior of extensive distributed machine learning models and algorithms with delayed updates [26]. Through numerous experiments, it was demonstrated that delayed updates significantly reduce the convergence speed of FL global models due to the influence of gradient coherence. In other words, the staleness of each local model in FL affects the convergence speed of the global model, and outdated local models may negatively impact global model convergence when they are updated in directions opposite to the current global model.
Caching originates from computer systems, where copies of frequently used commands and data are stored in memory [27]. To enhance the communication efficiency of FL, several researchers have proposed caching schemes to improve FL, effectively reducing the training time of FL by minimizing the per-iteration training time [28,29,30]. However, existing research on FL with caching lacks an in-depth exploration of the AoI. Furthermore, although an excellent AoI can significantly improve the performance of time-sensitive machine learning models, the cost of achieving a low AoI in the FL framework often leads to significant service delays. Therefore, to improve the quality of models in WCPNs, this paper proposes an FL strategy with caching-based resource scheduling and focuses on optimizing resource coordination, thereby improving service quality and reducing the parameter age related to the model.

1.1. Related Works

In many time-sensitive FL tasks, both the quality of the data used for local training and that of the uploaded models are closely related to their age. Outdated parameters can result in slower convergence rates in FL and a reduction in the performance of the global model. To tackle these issues, several recent studies have focused on optimizing the data parameter age during the device scheduling process.
In [31], the parameter age was incorporated into the device scheduling process in FL, and an age-aware wireless network FL communication strategy was proposed to prevent devices from exiting training by using energy harvesting technology. Fast and accurate model training can be achieved by considering both the aging of parameters and the heterogeneous capabilities of terminal devices. In [32], the authors proposed a theoretical framework for statistical AoI provisioning in mobile edge computing (MEC) systems based on Stochastic Network Calculus (SNC) to support the tail distribution analysis of the AoI. Additionally, they designed a dynamic joint optimization algorithm based on block coordinate descent to solve the energy minimization problem. In [33], the authors focused on the timeliness of MEC systems, emphasizing the importance of data and computing task freshness, and established an age-sensitive MEC model in which an AoI minimization problem is defined. A new hybrid policy-based multimodal deep reinforcement learning (RL) framework with an edge FL mode was proposed, which not only outperformed the baseline system in terms of average system age but also improved the stability of the training process. In [34], considering a time-sensitive IoT system with UAV assistance, a study on information freshness in the system was conducted, and a node data acquisition algorithm based on DDQN was proposed to effectively reduce the average information age of the system by optimizing the flight trajectories and transmission sequences of sensors. In [35], the authors proposed FedAoI, an AoI-based client selection policy. FedAoI ensures fairness by allowing all clients, including stragglers, to submit their model updates while maintaining high training efficiency by keeping round completion times short. In [36], the authors formulated the problem of multi-UAV trajectory planning and data collection as a mixed-integer nonlinear programming (MINLP) problem, aiming to minimize the AoI and energy consumption of IoT devices. The formulation takes into account the flight constraints of UAVs, the limitations of device data collection, and the conditions for interference coordination. In [37], considering the trade-off between the data sampling frequency and long-term data transmission energy consumption while maintaining information freshness, the authors proposed a two-stage iterative optimization framework based on FL to minimize the global mean and variance of transmission energy, significantly reducing execution energy consumption while incurring only a relatively small performance loss.
In addition to considering the impact of the parameter age in federated machine learning, these works have yielded good optimization results in terms of service delays or energy consumption. However, they have not conducted in-depth research on time-sensitive data and have not fully considered the potential reference value of lagging user models, focusing only on resource scheduling optimization strategies to shorten service delays. As clearly shown in Table 1, our work has significant advantages in multi-objective optimization and methodological innovations. Unlike other studies, our work does not focus on a single metric but comprehensively considers the global parameter age, energy consumption, and federated learning service delays, achieving multi-objective optimization. In terms of methods, the use of the double-cache mechanism and the Proximal Policy Optimization algorithm not only takes full advantage of local data and the central server model but also improves system performance through effective algorithm optimization. In terms of experimental results, our work has yielded better results in reducing service delays, minimizing parameter age, and lowering energy consumption, clearly demonstrating the effectiveness and superiority of these innovations in practical applications. Based on these considerations, this paper designs a WCPN FL framework with dual caches that comprehensively considers parameter age, energy consumption, and service delays while optimizing the iteration strategy. Additionally, a WCPN FL resource scheduling strategy based on information age perception is proposed, which incorporates a dynamic scheduling strategy to enable as many users as possible to participate in FL in a timely manner by jointly considering parameter freshness and terminal device heterogeneity.

1.2. Contributions and Paper Organization

The main contributions of this work can be summarized as follows:
  • To address the trade-off between parameter age and service delays, we propose an FL resource scheduling strategy based on information age perception in WCPNs, which takes advantage of data caching mechanisms. This strategy optimizes the device’s data collection frequency, computation frequency, and spectrum resource allocation to improve global model performance in FL.
  • We comprehensively consider the aging of parameters, time-varying channels, random FL request arrivals, and heterogeneous computing capabilities among participating devices. We model the high-dimensional, dynamic, multi-user centralized FL framework system as an MDP and employ the Proximal Policy Optimization (PPO) algorithm to minimize global parameter age, energy consumption, and FL service delays.
  • Extensive comparative simulation experiments are conducted to verify the correctness and superiority of our scheme, with an in-depth analysis of the results.
The remainder of this article is organized as follows. Section 2 provides a list of the notations used in this paper. Section 3 proposes a dual-cache FL framework with local data caching and server model caching and presents the problem of the WCPN FL resource scheduling strategy based on information age perception. Section 4 formulates the MDP for the target problem and constructs an age-aware PPO double-cache FL scheduling algorithm to obtain the corresponding optimal solution. Section 5 demonstrates the performance of our proposed approach through extensive simulation studies, providing simulation results and theoretical analysis. Finally, we summarize this paper in Section 6.

2. Notations

This section provides a list of the notations used in this paper for easy reference, as shown in Table 2.

3. System Model and Problem Formulation

In related research on conventional FL, it is usually assumed that intelligent terminals collect data in advance for model training. However, this can result in a certain degree of data aging. A higher AoI is particularly problematic for tasks that are sensitive to it. In addition, collecting data only when receiving FL requests may ensure sufficient freshness, but the service delays during data collection may be unbearable. Therefore, it is necessary to balance service delays and information age while coordinating their requirements effectively.
Our system model, as shown in Figure 1, consists of a central server and K smart devices that have data collection and local computing capabilities. These devices share limited spectrum resources to communicate with the central server for local model uploads. The model owner initiates an FL task involving a set of smart devices $\mathcal{K} = \{1, \ldots, k, \ldots, K\}$. During the FL task, the model owner may initiate multiple model training requests, and each participating intelligent device uses its latest local data to collaboratively train the model and maintain the global model. This paper assumes that the request arrivals follow a Poisson process [38]. First, FL-based model training is initiated upon request from the model owner, with each training session spanning N iterations, $n \in \mathcal{N} = \{1, \ldots, N\}$, to minimize the global loss $F(w^N)$, where N is specified by the model owner. Each iteration is composed of three steps in sequence:
Step 1: Local Training. Smart device k uses its locally collected and processed data to train the received global model $w^n$ and obtain the local model $w_k^n$.
Step 2: Model Uploading. The smart device transmits the model parameters $w_k^n$ to the central server.
Step 3: Global Model Update. All model parameters received from the K smart devices are aggregated to obtain a new global model $w^{n+1}$, which is then sent to each intelligent device for the $(n+1)$th iteration.
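To make the three steps concrete, the following minimal sketch (our illustration rather than the authors' implementation; a least-squares model stands in for the CNN used later in the experiments) runs one synchronous FL iteration with sample-count-weighted FedAvg aggregation.

```python
import numpy as np

def local_train(w, X, y, lr=0.01, epochs=1):
    # Step 1: local update on the device's own (cached) data
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

def fl_iteration(w_global, device_data):
    # Step 1: every device trains locally on its own samples
    local_models = [local_train(w_global, X, y) for X, y in device_data]
    # Steps 2-3: devices upload w_k^n; the server aggregates them,
    # weighting by local sample counts (FedAvg)
    sizes = np.array([len(y) for _, y in device_data], dtype=float)
    alpha = sizes / sizes.sum()
    return sum(a * wk for a, wk in zip(alpha, local_models))

# Usage: K = 3 devices, 4 features, N = 5 global iterations
rng = np.random.default_rng(0)
data = [(rng.normal(size=(20, 4)), rng.normal(size=20)) for _ in range(3)]
w = np.zeros(4)
for n in range(5):
    w = fl_iteration(w, data)
```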
In the synchronous FL scheme, the time taken by each iteration is limited by the slowest device. The central server needs to wait for all smart devices to complete local training before aggregating their models into the global model, and the smart devices need to obtain the new global model to carry out the next round of iterations. To effectively reduce the service delays in FL (i.e., the delay from responding to an FL request to completing the model upload), this paper designs an age-aware dual-cache FL framework. Each smart device has a data cache buffer to pre-collect local training data, while the central server sets up a cache buffer to save models uploaded by lagging devices.

3.1. AoI and Service Latency Model

Since the diversity of uploaded parameters is crucial for training performance on non-independent and identically distributed (non-IID) data, we utilize a dynamic bandwidth scheduling strategy to enable as many users as possible to participate in FL in a timely manner. In this paper, we propose a wireless network FL resource scheduling strategy based on information age awareness, which enables rapid and accurate model training by jointly considering parameter freshness and the heterogeneous capabilities of end devices. We jointly optimize the AoI and resource allocation strategies to achieve a trade-off between FL accuracy, training time, and user energy consumption. Considering the potential impact of the current scheduling decision on subsequent training and available resources, we describe the optimization problem as an MDP.

3.1.1. Local Delay and Age of Local Data

In Figure 2, the time period during which device k waits for an FL training request and completes local training includes the idle time, the data collection and processing time, and the local model training time. In traditional FL, users only start collecting training data and performing local training after receiving FL training requests, as shown in Figure 2a. The delay for user k to perform local model training is denoted by $T_k^{tra}(t)$, while the delay for device k to collect the required data samples is denoted by $T_k^{col}(t)$. Starting data collection upon request and completing it before local training ensures data freshness but results in significant delays, with a corresponding local delay of $T_k^{tra}(t) + T_k^{col}(t)$.
Suppose the total number of floating-point operations (FLOPs) needed to process one data sample is denoted by $G_k^{per}$, and the number of FLOPs per CPU cycle of device k is denoted by $C_k$. Then, the time required for device k to process one data sample is given by

$$T_k^{per} = \frac{G_k^{per}}{C_k f_k^{com}} \quad (1)$$
In this paper, we design a data buffer mechanism for the FL process. Devices adjust the data collection frequency $f_k^{col}$ based on intelligent algorithms and collect data at intervals of $f_k^{col} T_k^{per}$ during idle periods. As shown in Figure 2b,c, this approach can bypass the data collection process and quickly respond to FL requests while ensuring data freshness, with a local delay of $T_k^{tra}(t)$.

Assuming that device k needs to collect $N_k^{per}$ units of data samples for local training, data collection involves continuously collecting data and sending each group of data sequentially to the data buffer. If device k has completed data collection and has still not received an FL training request after waiting for $f_k^{col} T_k^{per}$ time, new data will be collected to maintain the freshness of the data in the buffer. According to the First-In-First-Out (FIFO) principle, the latest collected unit of data is placed at the tail of the buffer queue, and the unit of data at the head of the buffer queue is discarded. As shown in Figure 2b, during the new data collection process, there are always enough data samples available in the local system to quickly respond to FL requests.
To quantify the freshness of each unit of data, device k ages its data every time a period $T_k^{per}$ elapses during idle time, which means that

$$A_{k,n}^{loc}(t+1) = A_{k,n}^{loc}(t) + 1 \quad (2)$$
where $A_{k,n}^{loc}(t)$ represents the age of device k's nth group of data at time t. After receiving an FL request and completing local model training, device k uploads the latest local model parameters $\omega_k$ to the central server. During the upload, the parameter upload rate may not remain efficient and stable due to dynamic changes in device k's geographical location and communication environment. Moreover, owing to communication instability, the probability of devices falling behind during parameter upload increases significantly. Therefore, it is necessary to set a receive waiting time threshold $T_a$ to limit the time the central server waits to receive device k's model. Let $T_k^{ser}(t) = T_k^{tra}(t) + T_k^{up}(t)$. Device k's model can only be successfully aggregated in the current round if $T_k^{ser}(t) \le T_a$. Assuming that device k receives an FL request at time $t_k^{fl}$, the sum of all of device k's local data ages at this moment is $\sum_{n \in N_k^{per}} A_{k,n}^{loc}(t_k^{fl})$, and the model age is calculated on this basis, i.e.,

$$A_k^{mdl}(t_k^{fl}) = \sum_{n \in N_k^{per}} A_{k,n}^{loc}(t_k^{fl}) \quad (3)$$
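To illustrate the buffer mechanism above, the sketch below (our own, with hypothetical class and method names) keeps a FIFO data cache of capacity $N_k^{per}$, ages every cached unit by one per elapsed $T_k^{per}$ as in Equation (2), and reports the initial model age of Equation (3) when an FL request arrives.

```python
from collections import deque

class DataCacheBuffer:
    """FIFO cache of N_k^per data units, each tagged with its age."""
    def __init__(self, capacity):
        self.capacity = capacity            # N_k^per
        self.ages = deque(maxlen=capacity)  # oldest unit at the head

    def tick(self):
        # One idle period T_k^per elapses: every cached unit ages by 1 (Eq. 2)
        self.ages = deque((a + 1 for a in self.ages), maxlen=self.capacity)

    def collect(self):
        # A freshly collected unit (age 0) enters at the tail; if the
        # buffer is full, the head (oldest) unit is discarded (FIFO)
        self.ages.append(0)

    def model_age_on_request(self):
        # Sum of local data ages when the FL request arrives (Eq. 3)
        return sum(self.ages)

# Usage: fill the buffer, then keep it fresh while idle
buf = DataCacheBuffer(capacity=5)
for _ in range(5):
    buf.collect()
for _ in range(3):      # three idle periods before a request arrives
    buf.tick()
    buf.collect()       # replace the stalest unit with a fresh one
print(buf.model_age_on_request())
```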

3.1.2. Transmission Delay and Age of Model

Given that the diversity of data samples is beneficial for improving model performance, the freshness of device k's local model also affects the convergence speed of the FL global model. Therefore, it is necessary to make reasonable use of limited bandwidth resources to enable as many models as possible that meet the age requirements to participate in global aggregation. Each device that responds to an FL request uploads its trained local model to the central server for the aggregation of the new global model. Due to dynamic changes in devices' available computing resources and communication conditions, there is uncertainty in the delay between responding to an FL request and successfully uploading the local model to the central server. Consequently, owing to local training and model upload delays, devices may risk missing multiple rounds of global aggregation. The age of user k's local model $A_k^{mdl}(t)$ increases with the number of global iterations until the model is used for global aggregation or discarded for being too old, which is expressed as

$$A_k^{mdl}(t + T^{fl}) = A_k^{mdl}(t) + 1 \quad (4)$$
where $T^{fl}$ is the time interval between two global iterations. Considering the impact of model utilization on its age, this paper sets a model freshness threshold $A_a^{mdl}$ to exclude models that exceed this threshold from global model aggregation. Combining this with the constraint on service latency mentioned above, $C_k(t) = 1$ is set only when the conditions $T_k^{ser}(t) \le T_a$ and $A_k^{mdl}(t) \le A_a^{mdl}$ are satisfied at the same time; otherwise, $C_k(t) = 0$. This can be expressed as

$$C_k(t) = \begin{cases} 1, & t_k^{loc} + t_k^{up} \le T_a \ \text{and} \ A_k^{mdl}(t) \le A_a^{mdl}, \\ 0, & \text{otherwise}. \end{cases} \quad (5)$$
Based on the above analysis, the latest global model age for global aggregation is

$$A^g(t) = \frac{1}{K} \sum_{k \in \mathcal{K}} C_k(t) A_k^{mdl}(t) \quad (6)$$
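The eligibility rule of Equation (5) and the global age of Equation (6) reduce to a few lines; the sketch below is our illustration (hypothetical function name, NumPy assumed), not the authors' implementation.

```python
import numpy as np

def global_model_age(t_ser, model_age, T_a, A_a_mdl):
    """Eqs. (5)-(6): average model age over devices eligible this round.

    t_ser     -- per-device service delay T_k^ser(t) = T_k^tra + T_k^up
    model_age -- per-device local model age A_k^mdl(t)
    """
    t_ser = np.asarray(t_ser, dtype=float)
    model_age = np.asarray(model_age, dtype=float)
    # C_k(t) = 1 only if both the latency and freshness thresholds hold
    C = (t_ser <= T_a) & (model_age <= A_a_mdl)
    K = len(model_age)
    return (C * model_age).sum() / K   # A^g(t), Eq. (6)

# Usage: device 3 misses the latency threshold, device 4 is too stale
print(global_model_age(t_ser=[1.2, 0.8, 3.5, 1.0],
                       model_age=[2, 1, 1, 9],
                       T_a=2.0, A_a_mdl=5))
```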

3.2. Energy Consumption Model

Considering that the central server has sufficient resources, we ignore energy consumption during the global model’s downlink process. In our model, energy consumption is mainly incurred during the local computation and model upload processes of smart devices. Given the heterogeneous computing resources and communication conditions of different smart devices, the time taken by each device for the processes of local computation and model upload varies, resulting in different energy consumption. In the following, we model the energy consumption of these two processes separately.

3.2.1. Energy Consumption for Local Training

The computation frequency of device k can be approximately represented as

$$f_k^{com}(t) = \frac{1}{\varepsilon_{1,k}} \cdot v_k(t) \quad (7)$$

where $\varepsilon_{1,k}$ is a coefficient determined by the chip structure of device k, and $v_k$ is the instantaneous voltage applied to the chip of device k, with threshold protection applied to prevent damage to the chip from high voltages and preserve its lifetime, i.e., $v_k \le v_m$. The power consumed by this device at the computation frequency $f_k^{com}$ can be calculated as follows:

$$P_k^{loc}(t) = \varepsilon_{2,k} v_k^2(t) f_k^{com}(t) = \varepsilon_k \left( f_k^{com}(t) \right)^3 \quad (8)$$
where $\varepsilon_{2,k}$ is a constant determined by physical factors such as the process and architecture of the chip, and $\varepsilon_k = \varepsilon_{2,k} \varepsilon_{1,k}^2$ is the energy coefficient. The number of FLOPs for local training of device k during one iteration is denoted by $G_k^{tra}(t)$. The latency of local training during one FL iteration when device k participates with the data buffering mechanism can be expressed as follows:

$$T_k^{tra}(t) = \frac{G_k^{tra}(t)}{C_k(t) f_k^{com}(t)} \quad (9)$$
In addition, the computation frequency $f_k^{com}(t)$ of device k during data collection is consistent with that during local training. Let $n_k^{fl}$ be the total number of data samples collected by device k within the interval $T^{fl}$ between two global iterations. Then, the local-phase energy consumption (covering the data collection and model training processes) of device k can be expressed as follows:

$$E_k^{col}(t) = \left( n_k^{fl} T_k^{per} + T_k^{tra}(t) \right) P_k^{loc}(t) \quad (10)$$
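Chaining Equations (8)-(10) gives a one-function energy model; the sketch below is ours, with illustrative (not paper-calibrated) constants, and C_k here denotes the FLOPs-per-cycle constant from Equation (1).

```python
def local_energy(eps_k, f_com, G_tra, C_k, n_fl, T_per):
    """Local-phase energy of one device, following Eqs. (8)-(10).

    eps_k -- energy coefficient eps_{2,k} * eps_{1,k}^2
    f_com -- computation frequency f_k^com (cycles/s)
    G_tra -- FLOPs of one local training pass
    C_k   -- FLOPs per CPU cycle (Eq. 1)
    n_fl  -- samples collected between two global iterations
    T_per -- time to process one data sample, Eq. (1)
    """
    P_loc = eps_k * f_com ** 3             # dynamic CPU power, Eq. (8)
    T_tra = G_tra / (C_k * f_com)          # local training latency, Eq. (9)
    return (n_fl * T_per + T_tra) * P_loc  # collection + training energy, Eq. (10)

# Usage with illustrative numbers: ~1 GHz CPU, 1 GFLOP training pass
print(local_energy(eps_k=1e-27, f_com=1e9, G_tra=1e9,
                   C_k=8.0, n_fl=50, T_per=0.01))
```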

3.2.2. Energy Consumption for Model Uploads

Assume that the maximum available bandwidth within the coverage area of the central server is $B^m$. If device k is assigned a bandwidth $B_k(t)$, its data upload rate can be expressed as follows:

$$R_k(t) = B_k(t) \log_2 \left( 1 + \tau_k(t) \right) \quad (11)$$
where $\tau_k(t)$ is the signal-to-noise ratio from device k to the central server. Let $D_k(t)$ be the size of the local model of device k. The latency of uploading the model for device k can be expressed as follows:

$$T_k^{up}(t) = \frac{D_k(t)}{R_k(t)} \quad (12)$$

Let the transmission power during the upload process be $p_k^{up}(t)$. Then, the energy consumption of device k for uploading its local model can be expressed as follows:

$$E_k^{up}(t) = p_k^{up}(t) \frac{D_k(t)}{R_k(t)} \quad (13)$$
The global iteration of the current round only aggregates models that meet the requirements. Since the aggregated model depends only on the eligible devices, the energy consumption generated during this iteration can be calculated by summing the energy consumption of the corresponding devices, represented as

$$E^g(t) = \frac{1}{K} \sum_{k \in \mathcal{K}} C_k(t) \left( E_k^{col}(t) + E_k^{up}(t) \right) \quad (14)$$
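A companion sketch (ours, with illustrative units) for Equations (11)-(14): the Shannon-style upload rate, the upload delay and energy, and the per-round global energy averaged over the eligible devices.

```python
import numpy as np

def upload_energy(B_k, snr, D_k, p_up):
    R_k = B_k * np.log2(1.0 + snr)   # upload rate, Eq. (11)
    T_up = D_k / R_k                 # upload latency, Eq. (12)
    return p_up * T_up               # upload energy, Eq. (13)

def global_energy(C, E_col, E_up):
    # Eq. (14): energy of eligible devices (C_k = 1), averaged over K
    C = np.asarray(C, dtype=float)
    return float((C * (np.asarray(E_col) + np.asarray(E_up))).sum() / len(C))

# Usage: 1 MHz bandwidth, an SNR of 15, a 1 Mbit model, 0.1 W transmit power
print(upload_energy(B_k=1e6, snr=15.0, D_k=1e6, p_up=0.1))
```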

3.3. Problem Formulation

We set up a data buffer and a model buffer on the device and server sides, respectively. To ensure model performance, we keep the data in each buffer sufficiently fresh while reducing the FL service delays [39]. In addition to the parameter age and FL service delays, system energy consumption is also an important factor that requires consideration. The overall system cost can be expressed as

$$U(t) = A^g(t) + E^g(t) + T^{ser}(t) \quad (15)$$
The goal is to design adjustment strategies for the devices' data collection frequency, local computation frequency, and bandwidth allocation while coordinating the global parameter age, energy consumption, and FL service delays. Therefore, the optimization problem based on Equation (15) is formulated as follows:

$$\begin{aligned} \min_{f_k^{col},\, f_k^{com},\, B_k} \ & U(t) \\ \text{s.t.} \ C1: \ & B_k(t) > 0, \ \forall k \in \mathcal{K} \\ C2: \ & \sum_{k=1}^{K} B_k(t) \le B^m \\ C3: \ & f_k^{com}(t) \le f_k^{m}(t), \ \forall k \in \mathcal{K} \end{aligned} \quad (16)$$

where C1 ensures that each device has available spectrum resources for uploading its local model, C2 requires that the total spectrum resources allocated to the devices do not exceed the maximum globally available value, and C3 requires that the adjusted computation resources of each device at the current time do not exceed its maximum available computation resources.
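One practical way (our suggestion, not specified in the paper) to keep a learning agent's continuous actions feasible for C1-C3 is to squash the raw network outputs: a softmax shares the bandwidth budget $B^m$ so that C1 and C2 hold by construction, and a sigmoid scales each computation frequency below its cap $f_k^m$.

```python
import numpy as np

def project_action(raw_bw, raw_freq, B_m, f_max):
    """Map unconstrained actor outputs to a feasible action for Eq. (16)."""
    e = np.exp(raw_bw - raw_bw.max())
    B = B_m * e / e.sum()                  # softmax: B_k > 0, sum B_k = B^m (C1, C2)
    f = f_max / (1.0 + np.exp(-raw_freq))  # sigmoid: 0 < f_k^com <= f_k^m (C3)
    return B, f

# Usage: 4 devices, a 10 MHz budget, per-device frequency caps
B, f = project_action(np.random.randn(4), np.random.randn(4),
                      B_m=1e7, f_max=np.array([1e9, 2e9, 1e9, 3e9]))
```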

4. Algorithm

Considering the dynamic and high-dimensional nature of many scenarios, it is very difficult to solve problem (16) using traditional methods. In this section, the optimization problem (16) is modeled as an MDP, and deep reinforcement learning models are introduced to solve it. Q-learning and DQN are popular value-based RL methods that learn the action value function Q(s, a) related to the system’s rewards or penalties. However, as the action space grows, finding the best action becomes increasingly difficult. To overcome the complexity of the action space, a policy-based approach is introduced.
PPO is an on-policy RL algorithm that improves on the policy gradient (PG) algorithm while retaining its strong performance in continuous state and action spaces. The PG algorithm is updated by calculating a policy gradient estimate and maximizing the policy value using gradient ascent [40]. PPO parameterizes the policy with a set of parameters, replacing the deterministic policies of value-based reinforcement learning with probability distributions; actions are sampled from the returned probability distribution and optimized using neural networks. Developed in 2017, PPO builds on the Asynchronous Advantage Actor-Critic (A3C) algorithm. Its core idea is to limit the policy update range within a certain offset to avoid the instability caused by overly large or small updates, effectively balancing simplicity, sample complexity, and tuning difficulty.
Considering the complexity and high dynamics of the scenario in this article, the PPO algorithm is used to solve the MDP problem and optimize the objective. The algorithm flow, shown in Figure 3, schedules and optimizes the FL resources in the WCPN as follows:
  • Step 1 : The WCPN sends the environment state to the PPO agent.
  • Step 2 : The PPO agent outputs the optimal decision and evaluation value based on the current state and then returns the decision to the WCPN environment for execution.
  • Step 3 : The WCPN environment returns a reward after executing the decision.
  • Step 4 : The PPO agent stores relevant data in the experience replay buffer.
  • Step 5 : Once the experience replay buffer contains a sufficient amount of data, the parameters of the Actor and Critic networks in the PPO agent are updated.

4.1. MDP Modeling for Optimization Problems

An MDP is a commonly used model for designing environmental interaction systems and is widely applied in reinforcement learning and dynamic decision making. In an MDP, an agent learns the optimal strategy through interaction with the environment to maximize its expected return.
The core idea of an MDP is to model the problem as a dynamic process where an agent takes actions in a constantly changing environment and adjusts its behavior based on feedback from the environment. However, the environment in an MDP can be uncertain, meaning that an agent may need to take different actions in response to different situations.

4.1.1. State

The state $S(t)$ includes the amount of data each device still needs to collect upon receiving an FL request, the amount of data each device has collected since its last response to an FL request, the signal-to-noise ratio at the current moment, and the time interval between the last two FL requests. Therefore, the state space at decision epoch t is given by

$$S(t) = \{ w_k^n, w^n, \tau_k(t), T^{fl} \} \quad (17)$$

4.1.2. Action

Based on the current environmental state $S(t)$, the action $A(t)$ adjusts the data collection frequency, the local computation frequency, and the spectrum resources allocated for uploading the local model for all devices $k \in \mathcal{K}$, i.e., $f_k^{col}(t)$, $f_k^{com}(t)$, and $B_k(t)$. The action at decision epoch t is given by

$$A(t) = \{ f_k^{col}(t), f_k^{com}(t), B_k(t) \} \quad (18)$$

4.1.3. Reward

Considering the age of data samples and model parameters during the FL iteration process, along with the trade-off between computational energy consumption, transmission energy consumption, and service latency, we aim to minimize the age while ensuring that energy consumption and latency remain within acceptable ranges. Large amounts of computing and spectrum resources should not be sacrificed solely to reduce the parameter age of data samples and model parameters. In combination with our optimization objective (16), the goal of the reinforcement learning agent is to maximize the expected reward $r(t)$. Therefore, the reward of the agent can be expressed as the negative of (15), which is

$$r(t) = -U(t) \quad (19)$$
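Putting the MDP pieces together, the following sketch (entirely our illustration; the internal dynamics are toy stand-ins with illustrative constants, not the paper's simulator) shows a Gym-style step in which the action sets $f_k^{col}$, $f_k^{com}$, and $B_k$, and the environment returns the next observation together with the reward $r(t) = -U(t)$.

```python
import numpy as np

class ToyWcpnFlEnv:
    """Structural sketch of the MDP; the dynamics are toy stand-ins."""
    def __init__(self, K=6, seed=0):
        self.K = K
        self.rng = np.random.default_rng(seed)
        self.age = np.ones(K)                       # stand-in for A_k^mdl

    def step(self, f_col, f_com, B):
        # Action (Eq. 18): per-device collection frequency, computation
        # frequency, and bandwidth share, each an array of length K
        snr = self.rng.uniform(5.0, 20.0, self.K)   # tau_k(t)
        T_up = 1e6 / (B * np.log2(1.0 + snr))       # Eq. (12), 1 Mbit model
        T_tra = 1e9 / (8.0 * f_com)                 # Eq. (9), C_k = 8
        T_ser = float((T_tra + T_up).max())         # synchronous round delay
        E = float(np.mean(1e-27 * f_com**3 * T_tra + 0.1 * T_up))
        # Fresher caches (higher f_col) lower the age; stragglers raise it
        self.age = np.where(T_tra + T_up <= 2.0, 1.0 / f_col, self.age + 1.0)
        U = self.age.mean() + E + T_ser             # cost, Eq. (15)
        state = np.concatenate([self.age, snr])     # observation, cf. Eq. (17)
        return state, -U                            # reward r(t) = -U(t), Eq. (19)

# Usage: one decision epoch with uniform resource shares across K = 6 devices
env = ToyWcpnFlEnv()
state, reward = env.step(f_col=np.full(6, 2.0),
                         f_com=np.full(6, 1e9),
                         B=np.full(6, 1e7 / 6))
```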

4.2. Double-Cache FL Scheduling Algorithm for Parameter Age Based on PPO

The PPO algorithm is a policy gradient-based reinforcement learning algorithm that aims to find the optimal parameter set $\theta^*$ and make the best decisions to maximize the expected return. The PPO network consists of an Actor network and a Critic network: the Critic network is a value function network that estimates the value of each state, while the Actor network outputs the probability distribution of actions in different states.
To prevent overshooting, where the policy falls into a worse-performing region and fails to provide sufficient effective information in the future, the PPO algorithm also constrains each policy optimization step to ensure that every update has some degree of effectiveness. By updating the policy parameters using the advantage function, the new policy can be improved. According to the advantage estimation method proposed by Mnih et al. [41], the Actor is run within a minibatch of logical decision slots, and samples are collected for advantage estimation, which can be denoted as

$$\hat{A}(t) = \delta(t) + \gamma \delta(t+1) + \cdots + \gamma^{T-t+1} \delta(T-1) \quad (20)$$

where $\gamma$ is the discount factor, $V(t) = Q(S(t); \theta_c)$ is the state value function, and $\delta(t) = R(t) + \gamma V(t+1) - V(t)$ is the advantage term. To make up for the shortcomings of the PG algorithm, the PPO algorithm incorporates off-policy updates so that a set of data obtained through sampling can be reused. At the same time, a clipped surrogate objective ensures that the change in magnitude between the new and old policies remains controlled. Let $r_t(\theta) = \frac{\pi_\theta(A(t) \mid S(t))}{\pi_{\theta_{old}}(A(t) \mid S(t))}$ represent the ratio of the policy $\pi_\theta(A(t) \mid S(t))$ before and after updating. The objective function of PPO can be expressed as

$$L(\theta_a) = \hat{\mathbb{E}}_t \left[ \min \left( r_t(\theta_a) \hat{A}(t), \ \mathrm{clip}\left( r_t(\theta_a), 1 - \varepsilon, 1 + \varepsilon \right) \hat{A}(t) \right) \right] \quad (21)$$
where $\varepsilon$ is a hyperparameter used to limit the magnitude of policy updates: the clip function, which clips $r_t(\theta_a)$ to the range $[1 - \varepsilon, 1 + \varepsilon]$, restricts how far the policy can move in a single update. Additionally, the optimization objective of the Critic network is to minimize the difference between the estimated value and the actual return, with its loss function given by

$$L(\theta_c) = \hat{\mathbb{E}}_t \left[ \left\| V(t) - R(t) \right\|^2 \right] \quad (22)$$
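Equations (21) and (22) translate directly into a few TensorFlow operations. The sketch below is our reading of them (log-probabilities, advantages, and returns are assumed to be precomputed tensors), not the authors' released code.

```python
import tensorflow as tf

def ppo_losses(new_logp, old_logp, advantages, values, returns, eps=0.2):
    # r_t(theta) as a ratio of action probabilities, computed in log space
    ratio = tf.exp(new_logp - old_logp)
    # Clipped surrogate objective, Eq. (21); negated so that minimizing
    # the loss maximizes the objective
    unclipped = ratio * advantages
    clipped = tf.clip_by_value(ratio, 1.0 - eps, 1.0 + eps) * advantages
    actor_loss = -tf.reduce_mean(tf.minimum(unclipped, clipped))
    # Critic regression to the observed returns, Eq. (22)
    critic_loss = tf.reduce_mean(tf.square(values - returns))
    return actor_loss, critic_loss
```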
During the training of the PPO algorithm, the agent obtains the initial state $S(0)$ from the environment. Based on the current state, it generates a stochastic policy distribution, samples an action to apply to the FL process in the WCPN, and obtains the new state and action transition sequence. At each minibatch step, the surrogate loss is constructed using the collected data, and the Actor and Critic network parameters $\theta_a$ and $\theta_c$ are updated using the Adam optimizer to obtain the optimal policy parameters $\theta^*$. Algorithm 1 shows the pseudocode for the parameter age-aware PPO double-buffer FL scheduling algorithm.
Algorithm 1: Parameter age-aware PPO double-buffer FL scheduling algorithm
(The pseudocode is presented as a figure in the original article.)
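Since the pseudocode appears only as an image, the following self-contained miniature (ours; a one-dimensional stand-in environment replaces the WCPN simulator, and the ppo_losses helper from the previous sketch is reused) traces the five-step loop with the hyperparameters reported later in Section 5.1: collect a 64-step minibatch, estimate advantages, and update the Actor and Critic networks with Adam.

```python
import numpy as np
import tensorflow as tf

# Toy stand-in environment: reward peaks when the action matches the state
def env_step(state, action):
    return np.random.uniform(-1, 1), -float((action - state) ** 2)

actor = tf.keras.Sequential([tf.keras.layers.Dense(32, activation="tanh"),
                             tf.keras.layers.Dense(1)])    # Gaussian mean
critic = tf.keras.Sequential([tf.keras.layers.Dense(32, activation="tanh"),
                              tf.keras.layers.Dense(1)])   # state value V(s)
log_std = tf.Variable(0.0)                                 # global log-std
opt_a = tf.keras.optimizers.Adam(1e-4)                     # lr_actor
opt_c = tf.keras.optimizers.Adam(2e-4)                     # lr_critic

def log_prob(mean, a):
    # Log-density of the Gaussian policy
    return -0.5 * (((a - mean) / tf.exp(log_std)) ** 2
                   + 2.0 * log_std + np.log(2.0 * np.pi))

for episode in range(200):                 # MAX_EPISODE = 3000 in the paper
    s, S, A, R = np.random.uniform(-1, 1), [], [], []
    for t in range(64):                    # Steps 1-4: interact and store
        mean = float(actor(np.array([[s]], np.float32))[0, 0])
        a = mean + float(tf.exp(log_std)) * np.random.randn()
        S.append([s]); A.append([a])
        s, r = env_step(s, a)
        R.append(r)
    S, A = np.array(S, np.float32), np.array(A, np.float32)
    returns = np.array([sum(0.99**j * r for j, r in enumerate(R[i:]))
                        for i in range(64)], np.float32)[:, None]
    adv = returns - critic(S).numpy()      # advantage estimate, cf. Eq. (20)
    old_logp = tf.stop_gradient(log_prob(actor(S), A))
    for _ in range(4):                     # Step 5: clipped PPO updates
        with tf.GradientTape(persistent=True) as tape:
            la, lc = ppo_losses(log_prob(actor(S), A), old_logp,
                                adv, critic(S), returns, eps=0.2)
        va = actor.trainable_variables + [log_std]
        opt_a.apply_gradients(zip(tape.gradient(la, va), va))
        vc = critic.trainable_variables
        opt_c.apply_gradients(zip(tape.gradient(lc, vc), vc))
```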

5. Experiment

This section evaluates the performance of the proposed parameter age-aware PPO double-buffer FL scheduling algorithm through experimental testing and establishes multiple comparison schemes to verify the correctness and superiority of the proposed scheme.

5.1. Experimental Settings

In this paper, the PPO algorithm is implemented using the TensorFlow neural network framework. The Actor network consists of an input layer, three hidden layers, and one output layer, while the Critic network consists of an input layer, two hidden layers, and one output layer. The PPO algorithm is set with a clipping parameter of $\epsilon = 0.2$, a learning rate for the Actor network of $lr\_actor = 1 \times 10^{-4}$, a learning rate for the Critic network of $lr\_critic = 2 \times 10^{-4}$, and a discount factor of $\gamma = 0.99$. The maximum number of training rounds in RL is $MAX\_EPISODE = 3000$, each round consists of $epoch = 64$ iterations, and the minibatch size in the PPO algorithm is 64.
In addition, we validate the FL performance of the age-aware PPO double-buffer scheduling algorithm using a real dataset: MNIST. The MNIST dataset consists of handwritten digit images with labels from 0 to 9 and includes 60,000 training images and 10,000 testing images. During simulation validation, the devices use convolutional neural networks as the machine learning model for FL; they randomly collect data samples from the MNIST dataset and conduct local training. While the MNIST dataset is a simplified benchmark, its heterogeneity mimics the non-IID data distributions of real-world edge devices. Future work will validate the framework on more complex datasets (e.g., CIFAR-10) and industrial IoT streams. Following the system model in Section 3, this paper assumes that FL request arrivals follow a Poisson distribution. Considering a scenario where a central server and K = 6 devices jointly complete the FL task, the main environmental parameters are defined in Table 3.
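For reference, the layer counts above map to the following Keras builders (our sketch; the hidden width of 64 and the tanh activations are assumptions, as the paper does not state them).

```python
import tensorflow as tf

def build_actor(state_dim, action_dim, hidden=64):
    # Input layer + three hidden layers + output layer, as stated above;
    # the output parameterizes the action distribution
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(state_dim,)),
        tf.keras.layers.Dense(hidden, activation="tanh"),
        tf.keras.layers.Dense(hidden, activation="tanh"),
        tf.keras.layers.Dense(hidden, activation="tanh"),
        tf.keras.layers.Dense(action_dim),
    ])

def build_critic(state_dim, hidden=64):
    # Input layer + two hidden layers + scalar state-value output
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(state_dim,)),
        tf.keras.layers.Dense(hidden, activation="tanh"),
        tf.keras.layers.Dense(hidden, activation="tanh"),
        tf.keras.layers.Dense(1),
    ])
```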

5.2. Numerical Results

We establish multiple comparison schemes to verify the correctness and superiority of the proposed solution from different perspectives through simulations. There are a total of six different comparison schemes, as follows: (i) OnlyModelCache: Only set the model cache buffer in the central server to store models for dropped devices; (ii) OnlyLocalCache: Only set the local data cache buffer for each device to pre-cache data samples needed for local training; (iii) WithoutAnyCache: Do not set any cache buffers, which corresponds to FL in a general scenario; (iv) FixFcol: Fix the device’s data collection frequency and optimize only its computation frequency and allocated bandwidth; (v) FixFcom: Fix the ratio of the device’s computation frequency allocated to data collection and local computing with respect to the maximum available computation frequency, and optimize only its data collection frequency and allocated bandwidth; and (vi) FixBandwidth: Fix the device’s allocated bandwidth and optimize only its data collection and computation frequencies.
As shown in Figure 4, as the number of iterations increases, all four schemes can effectively reduce FL service delays. The proposed scheme reduces FL service latency to 33 s after 150 iterations, achieving a 36.4% improvement over the no-cache baseline (55 s) and a 17.5% improvement over the OnlyLocalCache scheme (40.5 s). From reward Equation (19), it can be seen that the FL latency is also an important objective in optimization. As the PPO agent optimizes for maximum reward, it simultaneously minimizes FL latency, aligning with this paper’s goal of reducing FL latency. The two schemes with local data caching buffers have the lowest FL service delays. The performance of the OnlyModelCache and WithoutAnyCache schemes is almost the same, indicating that model caching does not have a negative impact on FL service delays. The simulation results demonstrate the effectiveness of data caching buffers in reducing FL service delays. Although adding a model caching buffer does not significantly affect FL service latency, it provides more device-trained machine learning models for the global model, enhancing its performance and generalization ability. Additionally, local data caching eliminates data collection delays during FL requests, directly reducing latency. Model caching indirectly mitigates delays by allowing the server to reuse stale models from lagging devices.
Figure 5 shows the relationship between the agent's reward and the number of iterations for the four schemes. The proposed scheme achieves a reward value of 67 at convergence, outperforming the OnlyLocalCache (64, 8.3% improvement) and WithoutAnyCache (55, 18.2% improvement) schemes. Since the proposed scheme and the local data caching scheme significantly reduce FL latency, these two schemes have obvious advantages in terms of rewards. Because the OnlyModelCache and WithoutAnyCache schemes collect data only after receiving FL requests, their data are the freshest, and the age of the parameters involved in local training is very low. However, these two schemes perform worse in terms of FL service delays, and fresh data cannot compensate for the negative impact of high service delays, resulting in lower rewards. The reward metric balances parameter freshness, energy consumption, and latency. The OnlyModelCache scheme underperforms because it cannot reduce data collection delays, highlighting the necessity of local data caching.
As shown in Figure 6, as the number of iterations increases, the rewards of all four optimization schemes gradually increase and eventually converge. According to the definition in Equation (19), the reward is the negative of the optimization objective defined by Equation (15), which means that the maximum agent reward corresponds to the minimization of Equation (15), aligning with this paper's goal of jointly minimizing the parameter age, energy consumption, and FL latency. Compared with the other three schemes, the proposed scheme achieves the highest reward. As the proposed scheme includes more optimization variables, its convergence speed is slightly slower, but its reward value is significantly higher than that of the other schemes. The other schemes lack the optimization of essential variables; although they also converge to relatively high rewards, their final results are significantly worse than those of the proposed scheme. In conclusion, the joint optimization of the three variables proposed in this paper is reasonable, valid, and clearly superior.
As shown in Figure 7, as the iteration number increases, all four optimization schemes gradually reduce the global parameter age until convergence. The global parameter age (Figure 7) decreases from 160 to 110 in the proposed scheme. In contrast, for the FixFcol scheme, it starts at around 200 and drops to about 150; the FixFcom scheme begins at roughly 275 and decreases to around 180; and the FixBandwidth scheme starts at about 225 and remains relatively stable at around 200. This highlights the superior performance of the proposed scheme in reducing the global parameter age compared to the non-optimized schemes. The parameter age is an important indicator of the agent's convergence performance, serving as an optimization objective in Equation (15). As with the reward, the greater number of optimization variables in the proposed scheme makes its parameter age decrease at a slower convergence rate, but it still reduces the global parameter age to a very small value. A collection frequency fixed at a higher value can maintain a lower local data parameter age, but this scheme cannot adapt to the randomness of FL requests, so its parameter age is slightly larger than that of the proposed scheme. In addition, the high fixed frequency results in higher energy consumption, which explains why its parameter age is lower than that of the fixed computation frequency scheme while its reward remains similar. Fixing a device's computation frequency makes the time spent collecting each unit of data relatively constant, but since FL requests arrive randomly and overall system energy consumption must be evaluated comprehensively, the adaptive adjustment of the data collection frequency is limited, restricting it to a lower collection frequency and resulting in a higher data sample age. Overall, adjusting either the data collection frequency or the computation frequency alone cannot effectively reduce the local sample parameter age; both need to be adjusted dynamically together to achieve optimal results. Furthermore, fixing the bandwidth allocated to each device makes the system vulnerable to stragglers under highly dynamic communication conditions, causing the central server to wait for the straggler to complete its FL service before aggregation, during which all devices' parameter ages increase simultaneously, leading to extremely poor overall parameter age performance.
Figure 8 shows box plots of the global parameter age under four schemes, including the proposed scheme, the FixFcol scheme, the FixFcom scheme, and the FixBandwidth scheme. In the figure, it can be seen that the proposed scheme performs the best in terms of global parameter age minimization and exhibits relatively good overall stability, with a smaller range for the upper and lower bounds of the global parameter age. The proposed scheme achieves a median global parameter age of 114, which is notably lower than that of the FixBandwidth scheme with a median of 205, the FixFcom scheme with a median of 170, and the FixFcol scheme with a median of 150. This demonstrates that the proposed scheme not only achieves the best performance in global parameter age optimization but also exhibits good stability, further validating its correctness and superiority. The reason is that adaptive resource scheduling ensures stable parameter freshness, whereas fixed strategies suffer from outdated models due to stragglers.
Figure 9 and Figure 10 compare the FL accuracy and loss under four schemes, including the proposed scheme, the OnlyModelCache scheme, the OnlyLocalCache scheme, and the WithoutAnyCache scheme. In the figures, it can be seen that these four schemes gradually increase FL accuracy and converge as the number of global iterations increases. The proposed scheme achieves a 98.2% accuracy at 100 iterations (vs. 96.5% for the WithoutAnyCache scheme) and converges faster (97% accuracy at 50 iterations). The final loss is 0.08 (vs. 0.12 for the OnlyModelCache scheme), demonstrating that a low parameter age enhances gradient coherence and model convergence. The convergence trends of these schemes vary slightly due to outdated data reducing the global model’s convergence speed. As analyzed in [26], due to gradient coherence, if the current gradient aligns with the gradient direction of the recent period, the update is valid; however, if the current update direction is opposite to the historical direction, one of the updates (either the current or historical) is incorrect, hindering model convergence. The proposed scheme converges the fastest because it achieves the lowest global parameter age, which facilitates global model convergence. The other three comparison schemes converge relatively slowly, but with increasing iterations, these schemes all converge to roughly the same level.

6. Conclusions

In this paper, we consider the scenario of random FL tasks with high computational demands and service quality requirements and propose a parametric age-aware WCPN FL resource scheduling strategy. Considering the various service demands of FL, we design a data caching mechanism to improve global model performance by optimizing devices’ data collection frequency, computation frequency, and spectrum resources. The dual-cache mechanism is practical in time-sensitive applications like autonomous vehicles and smart factories. For example, local data caching enables sensors to pre-process high-frequency telemetry during idle periods, while server model caching allows delayed updates from mobile devices without blocking global aggregation. We comprehensively consider parameter staleness, time-varying channels, the random arrival of FL requests, and heterogeneous computing capabilities among participating devices, and use the PPO algorithm to solve the optimization objective. Numerical results show that our proposed scheme effectively balances system benefits in terms of the global parameter age, energy consumption, and service latency, resulting in maximum system efficiency. Despite the achievements of our proposed federated learning scheduling strategy, its experimental validation has limitations. Using only the simple MNIST dataset does not fully assess the algorithm’s performance. In the future, we will employ more diverse datasets like CIFAR-10/100 and text datasets. Additionally, exploring more realistic request arrival patterns will improve the algorithm’s practicality and effectiveness in real-world wireless computing networks.

Author Contributions

Conceptualization, X.Z.; Software, X.Z.; Formal analysis, C.L. and Z.X.; Investigation, X.Z. and C.L.; Resources, Y.L. and L.J.; Writing—original draft, X.Z.; Writing—review & editing, L.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the NSFC Programs under grant 62371142 and grant 62273107, and in part by the Guangdong Basic and Applied Basic Research Foundation under grant 2024A1515010404.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lu, Y.; Zheng, X. 6G: A survey on technologies, scenarios, challenges, and the related issues. J. Ind. Inf. Integr. 2020, 19, 100158. [Google Scholar] [CrossRef]
  2. Sun, W.; Lei, S.; Wang, L.; Liu, Z.; Zhang, Y. Adaptive federated learning and digital twin for industrial internet of things. IEEE Trans. Ind. Inform. 2020, 17, 5605–5614. [Google Scholar] [CrossRef]
  3. Qu, Y.; Dong, C.; Zheng, J.; Dai, H.; Wu, F.; Guo, S.; Anpalagan, A. Empowering edge intelligence by air-ground integrated federated learning. IEEE Netw. 2021, 35, 34–41. [Google Scholar] [CrossRef]
  4. McMahan, H.B.; Yu, F.X.; Richtarik, P.; Suresh, A.T.; Bacon, D.; Konečný, J. Federated learning: Strategies for improving communication efficiency. arXiv 2016, arXiv:1610.05492. [Google Scholar]
  5. McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. arXiv 2016, arXiv:1602.05629. [Google Scholar]
  6. Lin, Y.; Han, S.; Mao, H.; Wang, Y.; Dally, W.J. Deep gradient compression: Reducing the communication bandwidth for distributed training. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  7. Letaief, K.B.; Chen, W.; Shi, Y.; Zhang, J.; Zhang, Y.J.A. The roadmap to 6G: AI empowered wireless networks. IEEE Commun. Mag. 2019, 57, 84–90. [Google Scholar] [CrossRef]
  8. Wang, P.; Sun, W.; Zhang, H.; Ma, W.; Zhang, Y. Distributed and secure federated learning for wireless computing power networks. IEEE Trans. Veh. Technol. 2023, 72, 9381–9393. [Google Scholar] [CrossRef]
  9. Mao, Y.; You, C.; Zhang, J.; Huang, K.; Letaief, K.B. A survey on mobile edge computing: The communication perspective. IEEE Commun. Surv. Tutor. 2017, 19, 2322–2358. [Google Scholar] [CrossRef]
  10. Ha, S.; Zhang, J.; Simeone, O.; Kang, J. Coded federated computing in wireless networks with straggling devices and imperfect CSI. arXiv 2019, arXiv:1901.05239. [Google Scholar]
  11. Reisizadeh, A.; Tziotis, I.; Hassani, H.; Mokhtari, A.; Pedarsani, R. Straggler-resilient federated learning: Leveraging the interplay between statistical accuracy and system heterogeneity. arXiv 2020, arXiv:2012.14453. [Google Scholar] [CrossRef]
  12. Chen, M.; Poor, H.V.; Saad, W.; Cui, S. Convergence Time Minimization of Federated Learning over Wireless Networks. In Proceedings of the ICC 2020—2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–6. [Google Scholar]
  13. Huang, X.; Leng, S.; Maharjan, S.; Zhang, Y. Multi-agent deep reinforcement learning for computation offloading and interference coordination in small cell networks. IEEE Trans. Veh. Technol. 2021, 70, 9282–9293. [Google Scholar] [CrossRef]
  14. Balasubramanian, V.; Aloqaily, M.; Reisslein, M. FedCo: A federated learning controller for content management in multi-party edge systems. In Proceedings of the 2021 International Conference on Computer Communications and Networks (ICCCN), Athens, Greece, 19–22 July 2021; pp. 1–9. [Google Scholar]
  15. Sun, W.; Li, Z.; Wang, Q.; Zhang, Y. FedTAR: Task and Resource-Aware Federated Learning for Wireless Computing Power Networks. IEEE Internet Things J. 2023, 10, 4257–4270. [Google Scholar] [CrossRef]
  16. Liu, Y.; Chang, Z.; Min, G.; Mao, S. Average age of information in wireless powered mobile edge computing system. IEEE Wirel. Commun. Lett. 2022, 11, 1585–1589. [Google Scholar] [CrossRef]
  17. Zhang, G.; Zheng, Y.; Liu, Y.; Hu, J.; Yang, K. Resource Scheduling for Timely Wireless Powered Crowdsensing with the Aid of Average Age of Information. In Proceedings of the ICC 2024—IEEE International Conference on Communications, Denver, CO, USA, 9–13 June 2024; pp. 4161–4166. [Google Scholar] [CrossRef]
  18. Zhu, J.; Gong, J. Optimizing Peak Age of Information in MEC Systems: Computing Preemption and Non-Preemption. IEEE/ACM Trans. Netw. 2024, 32, 3285–3300. [Google Scholar] [CrossRef]
  19. Vineeth, B.S.; Thomas, R.C. On the Average Age-of-Information for Hybrid Multiple Access Protocols. IEEE Netw. Lett. 2022, 4, 87–91. [Google Scholar] [CrossRef]
  20. Moltafet, M.; Leinonen, M.; Codreanu, M. Average Age of Information for a Multi-Source M/M/1 Queueing Model with Packet Management and Self-Preemption in Service. In Proceedings of the 2020 18th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOPT), Volos, Greece, 15–19 June 2020; pp. 1–5. [Google Scholar]
  21. Zheng, Y.; Hu, J.; Yang, K. Average Age of Information in Wireless Powered Relay Aided Communication Network. IEEE Internet Things J. 2022, 9, 11311–11323. [Google Scholar] [CrossRef]
  22. Hui, E.T.Z.; Madhukumar, A.S. Mean Peak Age of Information Analysis of Energy-Aware Computation Offloading in IIoT Networks. In Proceedings of the 2024 IEEE 99th Vehicular Technology Conference (VTC2024-Spring), Singapore, 24–27 June 2024; pp. 1–6. [Google Scholar] [CrossRef]
  23. Zhang, Y.; Jiang, Y.; Zhu, X.; Cao, J.; Sun, S. Optimized Age of Information for Relay Systems with Resource Allocation. In Proceedings of the 2024 IEEE 99th Vehicular Technology Conference (VTC2024-Spring), Singapore, 24–27 June 2024; pp. 1–5. [Google Scholar] [CrossRef]
  24. Zhu, J.; Gong, J. Optimizing Peak Age of Information in Mobile Edge Computing. In Proceedings of the 2023 35th International Teletraffic Congress (ITC-35), Turin, Italy, 3–5 October 2023; pp. 1–9. [Google Scholar] [CrossRef]
  25. Wang, X.; Ning, Z.; Guo, S.; Wen, M.; Poor, H.V. Minimizing the Age-of-Critical-Information: An imitation learning-based scheduling approach under partial observations. IEEE Trans. Mob. Comput. 2021, early access. [Google Scholar] [CrossRef]
  26. Dai, W.; Zhou, Y.; Dong, N.; Zhang, H.; Xing, E.P. Toward understanding the impact of staleness in distributed machine learning. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019; pp. 1–6. [Google Scholar]
  27. Buck, D.; Singhal, M. An analytic study of caching in computer systems. J. Parallel Distrib. Comput. 1996, 32, 205–214. [Google Scholar] [CrossRef]
  28. Fu, F.; Miao, X.; Jiang, J.; Xue, H.; Cui, B. Towards communication-efficient vertical federated learning training via cache-enabled local updates. arXiv 2022, arXiv:2207.14628. [Google Scholar] [CrossRef]
  29. Wu, Z.; Sun, S.; Wang, Y.; Liu, M.; Xu, K.; Wang, W.; Jiang, X.; Gao, B.; Lu, J. Fedcache: A knowledge cache-driven federated learning architecture for personalized edge intelligence. IEEE Trans. Mob. Comput. 2024, 23, 9368–9382. [Google Scholar] [CrossRef]
  30. Liu, Y.; Su, L.; Joe-Wong, C.; Ioannidis, S.; Yeh, E.; Siew, M. Cache-Enabled Federated Learning Systems. In Proceedings of the Twenty-Fourth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, Washington, DC, USA, 23–26 October 2023; pp. 1–11. [Google Scholar]
  31. Liu, X.; Qin, X.; Chen, H.; Liu, Y.; Liu, B.; Zhang, P. Age-aware Communication Strategy in Federated Learning with Energy Harvesting Devices. In Proceedings of the 2021 IEEE/CIC International Conference on Communications in China (ICCC), Xiamen, China, 28–30 July 2021; pp. 358–363. [Google Scholar] [CrossRef]
  32. Meng, Q.; Lu, H.; Qin, L. Energy Optimization in Statistical AoI-Aware MEC Systems. IEEE Commun. Lett. 2024, 28, 2263–2267. [Google Scholar] [CrossRef]
  33. Zhu, Z.; Wan, S.; Fan, P.; Letaief, K.B. Federated Multiagent Actor-Critic Learning for Age Sensitive Mobile-Edge Computing. IEEE Internet Things J. 2022, 9, 1053–1067. [Google Scholar] [CrossRef]
  34. Xu, J.; Jia, X.; Hao, Z. Research on Information Freshness of UAV-assisted IoT Networks Based on DDQN. In Proceedings of the 2022 IEEE 5th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Chongqing, China, 16–18 December 2022; pp. 427–433. [Google Scholar] [CrossRef]
  35. Dong, L.; Zhou, Y.; Liu, L.; Qi, Y.; Zhang, Y. Age of Information Based Client Selection for Wireless Federated Learning With Diversified Learning Capabilities. IEEE Trans. Mob. Comput. 2024, 23, 14934–14945. [Google Scholar] [CrossRef]
  36. Xiao, X.; Wang, X.; Lin, W. Joint AoI-Aware UAVs Trajectory Planning and Data Collection in UAV-Based IoT Systems: A Deep Reinforcement Learning Approach. IEEE Trans. Consum. Electron. 2024, 70, 6484–6495. [Google Scholar] [CrossRef]
  37. Hsu, Y.-L.; Liu, C.-F.; Wei, H.-Y.; Bennis, M. Optimized Data Sampling and Energy Consumption in IIoT: A Federated Learning Approach. IEEE Trans. Commun. 2022, 70, 7915–7931. [Google Scholar] [CrossRef]
  38. Zhang, S.; Li, J.; Luo, H.; Gao, J.; Zhao, L.; Shen, X.S. Towards fresh and low-latency content delivery in vehicular networks: An edge caching aspect. In Proceedings of the 2018 10th International Conference on Wireless Communications and Signal Processing (WCSP), Hangzhou, China, 18–20 October 2018; pp. 1–6. [Google Scholar]
  39. Dai, W.; Zhou, Y.; Dong, N.; Zhang, H.; Xing, E.P. Toward Understanding the Impact of Staleness in Distributed Machine Learning. arXiv 2018, arXiv:1810.03264. [Google Scholar]
  40. Bie, T.; Zhu, X.Q.; Fu, Y.; Li, X.; Ruan, X.; Wang, Q. Safety priority path planning method based on Safe-PPO algorithm. J. Beijing Univ. Aeronaut. Astronaut. 2021, 49, 1–15. [Google Scholar] [CrossRef]
  41. Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016. [Google Scholar]
Figure 1. Decentralized edge federated modeling framework in a WCPN.
Figure 2. FL service delays in different cases: (a) Traditional FL without caching; (b) Local data caching; (c) Server model caching.
Figure 3. Algorithm flow in the system.
Figure 4. Comparison of FL service latency across different cache schemes, demonstrating the impact of local data caching and model caching on reducing service delays.
Figure 5. Relationship between RL rewards and iterations across different cache schemes, highlighting the advantage of reducing service delays in improving rewards.
Figure 6. Comparison of RL rewards across different optimization variables, demonstrating the superiority of the joint optimization scheme in achieving higher rewards.
Figure 7. Variation in global AoI over iterations for different optimization variables: analyzing the impact of various optimization strategies on reducing the global parameter age.
Figure 8. Box plots of AoI under different optimization variables, illustrating the optimal performance and stability of the proposed scheme in minimizing the global parameter age.
Figure 9. Comparison of FL accuracy across different optimization variables: investigating the influence of optimization strategies on model convergence and accuracy improvement.
Figure 10. Comparison of federated learning loss across different optimization variables: analyzing the convergence trends of different schemes and the advantage of the proposed scheme in achieving faster convergence.
Table 1. General comparison between related studies and our work.

| Previous Works | Optimization-Target Focus | Method | Performance in Service Delays | Performance in Parameter Age | Performance in Energy Consumption |
| --- | --- | --- | --- | --- | --- |
| [31] | Mainly on preventing device dropout for parameter-age optimization; less on delays and energy | Energy-harvesting technology | Limited optimization | Good at preventing dropout-induced aging, but not comprehensive | Limited consideration |
| [32] | Mainly on AoI tail-distribution analysis and energy minimization; less on FL service delays | Stochastic network calculus (SNC) | Less attention | Focuses on AoI analysis | Focuses on energy minimization |
| [33] | AoI minimization | Hybrid strategy-based multi-modal DRL framework | Does not fully consider FL system dynamics | Focuses on AoI | Does not fully consider energy |
| [37] | AoI-related optimization with DRL | DRL-based approach for AoI | Focuses on AoI, not on FL service delays | Focuses on AoI | Not energy-centric |
| Our Work | Multi-objective optimization of global parameter age, energy consumption, and FL service delays | MDP formulation solved with the PPO algorithm, plus a double-cache mechanism | Significant reduction | Lower final global parameter age | Effective energy reduction |
Table 2. Notations.

| Symbol | Definition |
| --- | --- |
| $K$ | Number of smart devices |
| $N$ | Number of iterations for model training or local data buffer length (context-dependent) |
| $A_a^{mdl}$ | Threshold age of the model |
| $T_a$ | Threshold delay for the server to receive a model |
| $\varepsilon_{1,k}$ | Coefficient determined by the chip structure of device $k$ |
| $v_k(t)$ | Instantaneous voltage applied to the chip by device $k$ at time $t$ |
| $v_m$ | Voltage threshold to protect the chip |
| $\varepsilon_{2,k}$ | Constant determined by physical factors of the chip |
| $\varepsilon_k$ | $\varepsilon_k = \varepsilon_{2,k}\varepsilon_{1,k}^2$ |
| $G_k^{per}$ | Total floating-point operations (FLOPs) needed to process one data sample for device $k$ |
| $C_k$ | FLOPs per CPU cycle of device $k$ |
| $f_k^{com}(t)$ | Computation frequency of device $k$ at time $t$ |
| $T_k^{per}$ | Time for device $k$ to process one data sample, $T_k^{per} = \frac{G_k^{per}}{C_k f_k^{com}}$ |
| $f_k^{col}$ | Data collection frequency of device $k$ |
| $n_k^{fl}$ | Total number of data samples collected by device $k$ within the interval $T^{fl}$ between two global iterations |
| $T_k^{col}(t)$ | Time delay for device $k$ to collect the required data samples at time $t$ |
| $T_k^{tra}(t)$ | Time delay for device $k$ to perform local model training at time $t$ |
| $T_k^{ser}(t)$ | Total service time of device $k$ at time $t$, including local training and model upload, $T_k^{ser}(t) = T_k^{tra}(t) + T_k^{up}(t)$ |
| $T_k^{up}(t)$ | Time delay for device $k$ to upload its local model at time $t$, $T_k^{up}(t) = \frac{D_k(t)}{R_k(t)}$ |
| $D_k(t)$ | Size of the local model of device $k$ at time $t$ |
| $B_m$ | Global maximum available bandwidth |
| $B_k(t)$ | Bandwidth assigned to device $k$ at time $t$ |
| $R_k(t)$ | Data upload rate of device $k$ at time $t$, $R_k(t) = B_k(t)\log_2\left(1+\tau_k(t)\right)$ |
| $\tau_k(t)$ | Signal-to-noise ratio from device $k$ to the central server at time $t$ |
| $P_k^{loc}(t)$ | Power consumed by device $k$ during local training at time $t$, $P_k^{loc}(t) = \varepsilon_k \left(f_k^{com}(t)\right)^3$ |
| $E_k^{col}(t)$ | Local-phase energy consumption (data collection and model training) of device $k$ at time $t$ |
| $E_k^{up}(t)$ | Energy consumed by device $k$ to upload its local model at time $t$, $E_k^{up}(t) = p_k^{up}(t)\frac{D_k(t)}{R_k(t)}$ |
| $E^g(t)$ | Energy consumption generated during the current global iteration |
| $A_{k,n}^{loc}(t)$ | Freshness of device $k$'s $n$th group of data at time $t$ |
| $A_k^{mdl}(t)$ | Age of device $k$'s local model at time $t$ |
| $A^g(t)$ | Latest global model age for global aggregation at time $t$, $A^g(t) = \frac{1}{K}\sum_{k \in \mathcal{K}} C_k(t) A_k^{mdl}(t)$ |
| $C_k(t)$ | Indicator variable: $C_k(t) = 1$ if $t_k^{loc} + t_k^{up} \le T_a$ and $A_k^{mdl}(t) \le A_a^{mdl}$; otherwise $C_k(t) = 0$ |
| $U(t)$ | Optimization objective function, $U(t) = A^g(t) + E^g(t) + T^{ser}(t)$ |
| $S(t)$ | State of the environment at decision epoch $t$, $S(t) = \{w_k^n, w^n, \tau_k(t), T^{fl}\}$ |
| $A(t)$ | Action at decision epoch $t$, $A(t) = \{f_k^{col}(t), f_k^{com}(t), B_k(t)\}$ |
| $r(t)$ | Reward at time $t$, $r(t) = -U(t)$ |
| $\theta$ | Set of parameters for the PPO agent's policy |
| $\theta_a$ | Actor network parameters in the PPO agent |
| $\theta_c$ | Critic network parameters in the PPO agent |
| $\gamma$ | Discount factor in the PPO algorithm |
| $\hat{A}(t)$ | Advantage function in the PPO algorithm |
| $L(\theta_a)$ | Actor network's objective function in the PPO algorithm |
| $L(\theta_c)$ | Critic network's loss function in the PPO algorithm |
| $\varepsilon$ | Clipping hyperparameter limiting the magnitude of policy updates in the PPO algorithm |
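To make the notation concrete, the following minimal Python/NumPy sketch evaluates the delay, power, age, and reward quantities exactly as defined in Table 2, together with the standard PPO clipped surrogate that $L(\theta_a)$ and $\varepsilon$ refer to. All function names and numeric values are our own illustrative placeholders, not the paper's implementation.

```python
import numpy as np

def upload_rate(B_k, tau_k):
    # R_k(t) = B_k(t) * log2(1 + tau_k(t)); B_k in Hz, tau_k a linear SNR
    return B_k * np.log2(1.0 + tau_k)

def upload_delay(D_k, B_k, tau_k):
    # T_k^up(t) = D_k(t) / R_k(t); model size D_k in bits
    return D_k / upload_rate(B_k, tau_k)

def local_power(eps_k, f_com):
    # P_k^loc(t) = eps_k * (f_k^com(t))^3
    return eps_k * f_com ** 3

def aggregation_indicator(t_loc, t_up, A_mdl, T_a=30.0, A_a=150.0):
    # C_k(t) = 1 iff the model arrives within T_a and is fresher than A_a^mdl
    return ((t_loc + t_up <= T_a) & (A_mdl <= A_a)).astype(float)

def global_model_age(C_k, A_mdl):
    # A^g(t) = (1/K) * sum_k C_k(t) * A_k^mdl(t)
    return np.mean(C_k * A_mdl)

def reward(A_g, E_g, T_ser):
    # U(t) = A^g(t) + E^g(t) + T^ser(t); r(t) = -U(t) since U is minimized
    return -(A_g + E_g + T_ser)

def ppo_clipped_objective(ratio, advantage, eps=0.2):
    # L(theta_a) = E[min(ratio * A_hat, clip(ratio, 1-eps, 1+eps) * A_hat)]
    return np.mean(np.minimum(ratio * advantage,
                              np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage))

# Toy usage with K = 2 devices (illustrative numbers only)
B_k = np.array([50e6, 50e6])        # Hz
tau = np.array([125.0, 128.0])      # linear SNR
D_k = np.array([8e6, 8e6])          # bits
t_up = upload_delay(D_k, B_k, tau)
C = aggregation_indicator(t_loc=np.array([5.0, 40.0]), t_up=t_up,
                          A_mdl=np.array([80.0, 120.0]))
print(reward(global_model_age(C, np.array([80.0, 120.0])), E_g=1.2,
             T_ser=float(np.max(t_up) + 5.0)))
```

In this toy run, the second device misses the deadline $T_a$, so its indicator $C_k(t)$ is zero and it is excluded from the global age, mirroring how stragglers are discounted in the aggregation rule.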
Table 3. Main parameter settings for simulations.

| Parameter | Description | Value |
| --- | --- | --- |
| $K$ | Number of devices | 6 |
| $N$ | Local data buffer length | 1000 |
| $A_a^{mdl}$ | Threshold age of model | 150 |
| $T_a$ | Threshold delay for the server to receive a model | 30 s |
| $\varepsilon_k$ | Chip structure coefficient of device $k$ | [0.38988, 0.60998] |
| $C_k$ | FLOPs per CPU cycle of device $k$ | [20, 50] |
| $f_k^{m}(t)$ | Maximum available computation frequency of device $k$ at time $t$ | [2.0, 5.0] GHz |
| $\tau_k(t)$ | Signal-to-noise ratio between device $k$ and the server at time $t$ | [120, 130] |
| $B_m$ | Global maximum available bandwidth | 100 Mbps |
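For reproducibility, the settings in Table 3 can be captured in a small configuration object. The sketch below is one possible encoding: the field names are our own, and we assume (this is not stated in the paper) that per-device values are drawn uniformly from the listed intervals.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
K = 6  # number of devices

# Our own field names; ranges follow Table 3, sampled uniformly per device.
sim_config = {
    "K": K,
    "N_buffer": 1000,                            # local data buffer length
    "A_a_mdl": 150,                              # threshold age of model
    "T_a": 30.0,                                 # s, model-reception deadline
    "eps_k": rng.uniform(0.38988, 0.60998, K),   # chip structure coefficients
    "C_k": rng.uniform(20, 50, K),               # FLOPs per CPU cycle
    "f_k_max_hz": rng.uniform(2.0e9, 5.0e9, K),  # max computation frequency
    "tau_k": rng.uniform(120, 130, K),           # SNR to the central server
    "B_m": 100e6,                                # global bandwidth budget
}
```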